feat: Introduce Nebula Metadata Proxy and Databuilder #1817

Closed
wants to merge 5 commits into from

wey-gu
@wey-gu wey-gu commented Apr 15, 2022

Metadata: Nebula Proxy
Databuilder:

  • Nebula Extractor
  • Nebula Search Data Extractor
  • Nebula CSV Loader
  • Nebula CSV Publisher
  • Nebula Serializer
  • Nebula Sample Data Loader

Summary of Changes

see #1816

Tests

All new components are covered by unit tests.

Documentation

docker-compose -f docker-amundsen-nebula.yml build
docker-compose -f docker-amundsen-nebula.yml up -d

cd databuilder
python3 -m venv venv
source venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
python3 setup.py install
python3 example/scripts/sample_data_loader_nebula.py # this is necessary to trigger schema creation

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

@feng-tao feng-tao added the keep fresh Disables stalebot from closing an issue label Apr 25, 2022
@wey-gu wey-gu force-pushed the amundsen_nebula_graph branch from 3143ad1 to ff1c42e Compare May 13, 2022 13:17
@wey-gu
Author

wey-gu commented May 15, 2022

Note: search is broken after the most recent rebase (ff1c42e). It may be related to the recently merged search change; I will look into it later.

2022-05-15T11:16:51+0000.802 [ERROR] es_proxy_v2.execute_queries:311 (1:Thread-21) - Failed to execute ES search queries. TransportError(N/A, 'index_not_found_exception')

@wey-gu wey-gu force-pushed the amundsen_nebula_graph branch 2 times, most recently from 4235894 to d372829 Compare May 15, 2022 15:40
@mgorsk1
Contributor

mgorsk1 commented May 16, 2022

is there a chance we could minimize this by reusing neo4j classes for some components? basically if neo4j speaks opencypher and nebula speaks it too can we reuse stuff like neo4j extractor, neo4j search data extractor, neo4j metadata proxy (but differently configured) etc? I understand how this might be difficult for write operations (but maybe not impossible) but read could surely reuse neo4j components?

I like the idea of nebula as alternative to neo4j especially that it also speaks opencypher but I'd like to know how this can be achieved with reusing neo4j stuff as much as possible. do we need so much new code or can this be avoided?

@wey-gu
Author

wey-gu commented May 17, 2022

> is there a chance we could minimize this by reusing neo4j classes for some components? basically if neo4j speaks opencypher and nebula speaks it too can we reuse stuff like neo4j extractor, neo4j search data extractor, neo4j metadata proxy (but differently configured) etc? I understand how this might be difficult for write operations (but maybe not impossible) but read could surely reuse neo4j components?
>
> I like the idea of nebula as alternative to neo4j especially that it also speaks opencypher but I'd like to know how this can be achieved with reusing neo4j stuff as much as possible. do we need so much new code or can this be avoided?

Thanks @mgorsk1 for taking the time to look into the proposal!

Indeed, while implementing the reference PR for the proposal, I also saw that yet another storage backend increases the burden of introducing new features. My plan had been to keep an eye on all PRs after this one is merged and carry that burden myself.

However, as you pointed out, that does not scale at all, and this is very likely a good opportunity to build the Cypher-based backends on some level of shared abstraction so that code is shared where possible.

I will keep this context and goal in mind and see what can be done in a refactor.

There are some challenges: Nebula supports openCypher only as a dialect, so reusing the query strings themselves is not directly possible (see here). The approach behind each read function is similar, though, so finding a way to move the Cypher-speaking DB specifics out of code and into configuration looks possible (and worth it).
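As a rough illustration of moving the per-database differences out of code and into configuration, a minimal sketch might look like the following. All names here (`QUERIES`, `CypherReadProxy`, `run_query`) are hypothetical and not part of Amundsen, and the query strings are illustrative only; they are not taken from the existing Neo4j proxy or from this PR and have not been verified against either database.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Mapping, Optional

Row = Mapping[str, Any]
# A runner takes a query string plus parameters and returns result rows.
# In practice it would wrap a Neo4j or Nebula client session (assumption).
QueryRunner = Callable[[str, Mapping[str, Any]], List[Row]]

# Hypothetical per-dialect query configuration. The query text is illustrative
# only; it shows that the two dialects differ while the read logic does not.
QUERIES: Dict[str, Dict[str, str]] = {
    "neo4j": {
        "get_table_description": (
            "MATCH (t:Table {key: $table_key})-[:DESCRIPTION]->(d:Description) "
            "RETURN d.description AS description"
        ),
    },
    "nebula": {
        "get_table_description": (
            "MATCH (t:Table)-[:DESCRIPTION]->(d:Description) "
            "WHERE id(t) == $table_key "
            "RETURN d.Description.description AS description"
        ),
    },
}


@dataclass
class CypherReadProxy:
    """Shared read logic; only the query text and the runner vary per backend."""

    dialect: str
    run_query: QueryRunner

    def _query(self, name: str) -> str:
        # Look up the dialect-specific query text from configuration.
        try:
            return QUERIES[self.dialect][name]
        except KeyError as exc:
            raise NotImplementedError(
                f"no query {name!r} configured for dialect {self.dialect!r}"
            ) from exc

    def get_table_description(self, table_key: str) -> Optional[str]:
        # The control flow is identical for every Cypher-speaking backend.
        rows = self.run_query(
            self._query("get_table_description"), {"table_key": table_key}
        )
        return rows[0]["description"] if rows else None
```

With this shape, a new Cypher-speaking backend would add an entry to the query configuration and supply its own runner, while the read functions themselves stay shared. Write paths would likely need more than this.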

Thanks.

@wey-gu wey-gu force-pushed the amundsen_nebula_graph branch from d372829 to 1e3de5d Compare May 17, 2022 16:20
wey-gu added 5 commits May 24, 2022 10:05
Metadata: Nebula Proxy
Databuilder:
- Nebula Extractor
- Nebula Search Data Extractor
- Nebula CSV Loader
- Nebula CSV Publisher
- Nebula Serializer
- Nebula Sample Data Loader

Signed-off-by: wey-gu <[email protected]>
Signed-off-by: wey-gu <[email protected]>
Signed-off-by: wey-gu <[email protected]>
note: still need amundsen-io#1856 to be included to work
as highlight_options introduced in frontend app
but not yet merged in search service

Signed-off-by: wey-gu <[email protected]>
@wey-gu wey-gu force-pushed the amundsen_nebula_graph branch from 1e3de5d to f96e7f9 Compare May 24, 2022 02:06
@Golodhros
Member

Closing as abandoned

@Golodhros Golodhros closed this Feb 6, 2023
@wey-gu
Author

wey-gu commented Feb 7, 2023

Thanks @Golodhros,
Once the RFC is settled, I'll reopen the PR.
amundsen-io/rfcs#48
BR//Wey

Labels
area:databuilder From databuilder folder keep fresh Disables stalebot from closing an issue