Writing Data to Elasticsearch Storage Engine #225

Merged
merged 36 commits into from
Jan 13, 2022

Conversation

Contributor

@slhsxcmy slhsxcmy commented Mar 24, 2021

What changes were proposed in this pull request?

We have implemented the Factory Pattern to extract the storage components (Solr and Elasticsearch) from the Sparkler implementation. The Elasticsearch classes are currently placeholders, and we are starting to implement them. We are also testing to make sure Solr still runs through the factory.
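
To illustrate the factory approach described above, here is a minimal sketch. `StorageProxyFactory`, `StorageProxy`, and `SolrProxy` are names taken from the commit messages in this PR; `ElasticsearchProxy`, `getProxy`, `addResources`, and every signature shown are hypothetical and are not the actual Sparkler API.

```scala
// Hypothetical sketch of a storage factory. Signatures are assumptions,
// not the actual Sparkler implementation.
trait StorageProxy {
  // Short identifier of the backing engine.
  def name: String
  // Stand-in for an "add resources" operation; returns how many documents were accepted.
  def addResources(docs: Seq[Map[String, Any]]): Int
}

class SolrProxy(uri: String) extends StorageProxy {
  val name: String = "solr"
  // A real implementation would send the documents to Solr at `uri`.
  def addResources(docs: Seq[Map[String, Any]]): Int = docs.size
}

// Hypothetical Elasticsearch counterpart (a placeholder, as in the PR description).
class ElasticsearchProxy(uri: String) extends StorageProxy {
  val name: String = "elasticsearch"
  def addResources(docs: Seq[Map[String, Any]]): Int = docs.size
}

object StorageProxyFactory {
  // Selects the concrete proxy from a configuration value, so callers
  // depend only on the StorageProxy trait.
  def getProxy(engine: String, uri: String): StorageProxy =
    engine.toLowerCase match {
      case "solr"          => new SolrProxy(uri)
      case "elasticsearch" => new ElasticsearchProxy(uri)
      case other =>
        throw new IllegalArgumentException(s"Unknown storage engine: $other")
    }
}
```

With this shape, the crawler code would call `StorageProxyFactory.getProxy(engine, uri)` once and work against `StorageProxy` from then on, so adding a new engine touches only the factory and the new proxy class.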

We moved the Solr-related classes into sparkler-app/src/main/scala/edu/usc/irds/sparkler/storage/solr, including the original MemexDeepCrawlDbRDD and MemexCrawlDbRDD, which we renamed to SolrDeepRDD and SolrRDD to reflect that they are Solr-specific. Let us know if you think this naming deviates from their purpose and whether we should change it again.

Is this related to an already existing issue on sparkler?
#224
#229

slhsxcmy and others added 4 commits March 6, 2021 11:14
Using Either[SolrClient,RestHighLevelClient] leads to "Overriding type String => SolrClient does not conform to base type String => Either[SolrClient, RestHighLevelClient]"

Co-Authored-By: Kevin Yan <[email protected]>
Extract SolrRDD and SolrDeepRDD; Cast getClient() result to SolrClient in 2 RDDs and SolrUpsert; Add getRDD and getDeepRDD to StorageProxyFactory; Add 3 add resource methods to StorageProxy and cast parameter to SolrInputDocument in SolrProxy; Add 2 dummy Elasticsearch RDDs
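
The override error quoted in the first commit message, and the casts mentioned in the second, can be illustrated with a small sketch. The commits above resolve the problem by casting the `getClient()` result; the abstract type member shown here is only one possible alternative, not what this PR does. `SolrClient` and `RestHighLevelClient` below are stand-in classes so the example compiles without SolrJ or the Elasticsearch client, and `newClient`, `Client`, and `ElasticsearchProxy` are hypothetical names.

```scala
// Stand-ins for the real client classes (assumption: only a URL is needed here).
final class SolrClient(val url: String)
final class RestHighLevelClient(val url: String)

// Declaring the base method as
//   def newClient(uri: String): Either[SolrClient, RestHighLevelClient]
// and overriding it with a method that returns SolrClient does not compile:
// SolrClient is not a subtype of Either[SolrClient, RestHighLevelClient],
// so the overriding type does not conform to the base type.

// Alternative sketch: let each proxy fix its own client type.
trait StorageProxy {
  type Client
  def newClient(uri: String): Client
}

class SolrProxy extends StorageProxy {
  type Client = SolrClient
  def newClient(uri: String): SolrClient = new SolrClient(uri)
}

// Hypothetical Elasticsearch counterpart.
class ElasticsearchProxy extends StorageProxy {
  type Client = RestHighLevelClient
  def newClient(uri: String): RestHighLevelClient = new RestHighLevelClient(uri)
}
```

Code that holds a concrete `SolrProxy` then receives a `SolrClient` without a cast. Code that only sees the `StorageProxy` trait still gets an opaque `Client`, which is the trade-off compared with casting at the call site.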
@Kefaun2601 Kefaun2601 marked this pull request as draft March 24, 2021 16:50
@Kefaun2601
Contributor

@lewismc

We’ve been duplicating the *RDD.scala files (in this directory) and modifying them into Elasticsearch variants. Just confirming, is this the correct approach?

NOTE: this is still very much a work in progress. We would just like to confirm that we're working in the right direction and see if you have any suggestions. Thanks.

Member

@lewismc lewismc left a comment

This is looking great, folks. Please see my comments and keep up the good work.

Kefaun2601 and others added 21 commits April 2, 2021 13:20
[Docker] Update run script with relative paths and docker-compose file
Co-authored-by: Felix Loesing <[email protected]>
Co-authored-by: Mingyu Cui <[email protected]>
Co-authored-by: Nikhil Handyal <[email protected]>
Co-authored-by: Miles Phan <[email protected]>
Co-authored-by: Mingyu Cui <[email protected]>
…pdatetransformer classes. Update crawler as well
Co-authored-by: Felix Loesing <[email protected]>
Co-authored-by: Mingyu Cui <[email protected]>
Co-authored-by: Felix Loesing <[email protected]>
Co-authored-by: Mingyu Cui <[email protected]>
Co-authored-by: Miles Phan <[email protected]>
Co-authored-by: Nikhil Handyal <[email protected]>
@buggtb buggtb changed the base branch from master to mvn2sbt January 13, 2022 16:43
@buggtb buggtb changed the base branch from mvn2sbt to master January 13, 2022 16:46
@buggtb buggtb changed the title from "WIP: Writing Data to Elasticsearch Storage Engine" to "Writing Data to Elasticsearch Storage Engine" Jan 13, 2022
@buggtb buggtb marked this pull request as ready for review January 13, 2022 16:47
@buggtb
Collaborator

buggtb commented Jan 13, 2022

I need ES support, and I need to merge this into the mainline because GitHub is giving dodgy merge instructions and I'd rather not lose it. So I'm going to merge this in, sync it with my mammoth mvn2sbt dev branch, clean up the integration, and then merge the whole lot back into master.

@buggtb buggtb merged commit cbd31b1 into USCDataScience:master Jan 13, 2022
6 participants