#620 Add support for shards - SolrSpout #1343
Unlike OpenSearch, in SolrSpout we do not call markQueryReceivedNow(). Should we add this?
Yes, and also set
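For context, in the OpenSearch spouts markQueryReceivedNow() roughly marks the end of an in-flight query and records when the response arrived, so that the next query can be throttled. Below is a simplified, hypothetical sketch of that bookkeeping, assuming a single in-flight query and a fixed minimum delay between queries. QueryState, markQuerySent, markQueryReceived and canSendQuery are illustrative names, not StormCrawler's API, and the clock is passed in explicitly to keep the sketch deterministic.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical, simplified stand-in for the query bookkeeping of a querying
 * spout. NOT StormCrawler's actual AbstractQueryingSpout code.
 */
class QueryState {
    private final AtomicBoolean inQuery = new AtomicBoolean(false);
    private final long minDelayBetweenQueriesMs;
    private long timeLastQueryReceived = 0L;

    QueryState(long minDelayBetweenQueriesMs) {
        this.minDelayBetweenQueriesMs = minDelayBetweenQueriesMs;
    }

    /** Called just before an asynchronous query is sent to the backend. */
    void markQuerySent() {
        inQuery.set(true);
    }

    /** Called from the response callback: clears the flag and records the time. */
    void markQueryReceived(long nowMs) {
        inQuery.set(false);
        timeLastQueryReceived = nowMs;
    }

    /** A new query may go out only if none is pending and the delay has elapsed. */
    boolean canSendQuery(long nowMs) {
        if (inQuery.get()) {
            return false;
        }
        return nowMs - timeLastQueryReceived >= minDelayBetweenQueriesMs;
    }
}
```

In this sketch, if the "received" call is never made, the in-query flag is never cleared and canSendQuery() keeps returning false, which is why the response callback needs it.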
external/solr/src/main/java/org/apache/stormcrawler/solr/persistence/SolrSpout.java
external/solr/src/main/java/org/apache/stormcrawler/solr/persistence/SolrSpout.java
```java
        componentToTasks,
        new HashMap<>(),
        null,
        null,
```
When writing the test with the 2 spouts, I manually created this Storm TopologyContext object with most of the parameters set to null. Is this OK? Should we set anything else?
No idea, to be honest, but it feels more complicated than what we've had to do for the other tests.
What about reusing TestUtil.getMockedTopologyContext()?
This didn't work, because what I wanted to test involves SolrSpout calling context.getComponentTasks(), which in turn reads componentToTasks, for example. I started from FileSpoutTopologyContextMock, and in principle we could have something similar if we want more such SolrSpout tests in the future.
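To illustrate why a generic mocked context is not enough here, the following is a hypothetical, minimal stand-in for the two TopologyContext lookups a multi-instance spout relies on. It assumes, as we understand Storm's behaviour, that getComponentTasks() returns the component's task IDs in sorted order and that a task's index is its position in that list. FakeTopologyContext is an illustrative name, not a class from Storm or StormCrawler, and getThisTaskIndex() is included as an assumption about what a per-shard spout would need.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for the parts of Storm's TopologyContext that a
 * multi-instance spout test needs: componentToTasks plus this task's identity.
 */
class FakeTopologyContext {
    private final Map<String, List<Integer>> componentToTasks;
    private final String thisComponentId;
    private final int thisTaskId;

    FakeTopologyContext(Map<String, List<Integer>> componentToTasks,
                        String thisComponentId, int thisTaskId) {
        this.componentToTasks = componentToTasks;
        this.thisComponentId = thisComponentId;
        this.thisTaskId = thisTaskId;
    }

    /** Sorted task IDs registered for a component (empty if unknown). */
    List<Integer> getComponentTasks(String componentId) {
        List<Integer> tasks = new ArrayList<>(
                componentToTasks.getOrDefault(componentId, Collections.emptyList()));
        Collections.sort(tasks);
        return tasks;
    }

    /** Position of this task among its component's tasks, or -1 if not registered. */
    int getThisTaskIndex() {
        return getComponentTasks(thisComponentId).indexOf(thisTaskId);
    }
}
```

With two spout tasks registered under the same component (e.g. IDs 7 and 3), the instances get distinct indices 0 and 1. If componentToTasks is left empty, every instance sees -1, which is roughly why the test context has to populate that map.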
@mvolikas Do you aim to include this PR in 3.1.1? If so, do you think you can work on the open comments, or would it be OK to move it to 3.1.2?
From my side, a safe estimate for having this ready and tested would be the first week of November. If the 3.1.1 release is planned earlier, please move this to 3.1.2. Thanks!
I am sure we can wait. This will be a great addition to the next release.
Currently doesn't pass the tests because of missing license headers.
Running the archetype generation with

```
mvn archetype:generate -DarchetypeGroupId=org.apache.stormcrawler -DarchetypeArtifactId=stormcrawler-solr-archetype -DarchetypeVersion=3.1.1-SNAPSHOT
```

fails:

```
Caused by: org.apache.maven.plugin.MojoFailureException: java.io.IOException: No such file or directory
    at org.apache.maven.archetype.mojos.CreateProjectFromArchetypeMojo.execute (CreateProjectFromArchetypeMojo.java:216)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
```
external/solr/archetype/src/main/resources/archetype-resources/pom.xml
external/solr/archetype/src/main/resources/archetype-resources/README.md
@jnioche This is strange; I cannot reproduce it locally. In my case, it runs as expected by first running:
Could it be a directory permissions issue?
Archetype generated successfully; no idea why it had failed.
@mvolikas Compiling the project generated from the archetype fails with
Yes, I have not yet tested with the Java topologies. One thing to note is that after generating the default StormCrawler archetype with
and then running
Can you confirm that?
external/solr/archetype/src/main/resources/archetype-resources/src/main/java/CrawlTopology.java
The actual issue is that the template is missing an import for
The fact that it has been broken forever and no one reported it suggests that hardly anyone uses the core archetype.
The compilation fails whether you use the Java topologies or not.
Basically, the Java topologies are only good for testing in local mode (IMHO), and are only usable if the IDE is configured to include the
OK, so I guess I will add this back.
Sorry if I wasn't clear: we need a Flux for the injection, not the Java topology.
An update from my side:
Still to do/decide:
No worries; I will add this too.
@mvolikas, latest comments
needs changing to
Thanks!
Hi there! @jnioche I think I have made the changes, and I also pushed some comments and minor fixes to the README files. I ran some more tests with a larger number of shards (e.g. 10), and everything seems OK.
Checked with a crawl in local mode; works fine.
Thanks a lot @mvolikas, this is a great contribution!
This PR (work in progress) employs the following strategy for supporting shards in the status collection:
- configures the status collection to use the number of shards defined in solr-conf.yaml (solr.status.routing.shards) and sets the sharding field to be the same as solr.status.routing.fieldname
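A minimal sketch of what this strategy could amount to on the Solr side, assuming the status collection is created through the Solr Collections API CREATE action, with numShards taken from solr.status.routing.shards and router.field taken from solr.status.routing.fieldname. The class and method names, the default of one shard, and the fixed collection name status are illustrative assumptions; the PR itself may apply these settings differently (for example, in a setup script).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/**
 * Hypothetical helper: turns the two status-routing settings into a
 * Solr Collections API CREATE request URL for the status collection.
 */
class StatusCollectionSetup {
    static final String SHARDS_KEY = "solr.status.routing.shards";
    static final String FIELD_KEY = "solr.status.routing.fieldname";

    static String createCollectionUrl(String solrBaseUrl, String configName,
                                      Map<String, Object> conf) {
        // Assumption: fall back to a single shard when the setting is absent
        int numShards = Integer.parseInt(String.valueOf(conf.getOrDefault(SHARDS_KEY, 1)));
        if (numShards < 1) {
            throw new IllegalArgumentException(SHARDS_KEY + " must be >= 1");
        }
        StringBuilder url = new StringBuilder(solrBaseUrl)
                .append("/admin/collections?action=CREATE&name=status")
                .append("&numShards=").append(numShards)
                .append("&collection.configName=").append(encode(configName));
        Object field = conf.get(FIELD_KEY);
        if (field != null) {
            // Route documents by this field's value instead of by the uniqueKey
            url.append("&router.field=").append(encode(field.toString()));
        }
        return url.toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

With router.field set, Solr's default compositeId router hashes that field rather than the uniqueKey, so all documents sharing a routing value end up on the same shard.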
Pending tasks: