feat: Introduce Nebula Metadata Proxy and Databuilder #1817

wey-gu · 2022-04-15T10:23:33Z

Summary of Changes

Metadata:
- Nebula Proxy

Databuilder:
- Nebula Extractor
- Nebula Search Data Extractor
- Nebula CSV Loader
- Nebula CSV Publisher
- Nebula Serializer
- Nebula Sample Data Loader

Tests

All new components are covered by unit tests.

Documentation

docker-compose -f docker-amundsen-nebula.yml build
docker-compose -f docker-amundsen-nebula.yml up -d

cd databuilder
python3 -m venv venv
source venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
python3 setup.py install
python3 example/scripts/sample_data_loader_nebula.py # this is necessary to trigger schema creation

CheckList

Make sure you have checked all steps below to ensure a timely review.

- PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  - In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
- PR includes a summary of changes.
- PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
- In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain docstrings that explain what they do.

wey-gu · 2022-05-15T08:50:34Z

Note: search is broken after the recent rebase (ff1c42e). It may be related to the recently merged search change; I will look into it later.

2022-05-15T11:16:51+0000.802 [ERROR] es_proxy_v2.execute_queries:311 (1:Thread-21) - Failed to execute ES search queries. TransportError(N/A, 'index_not_found_exception')

mgorsk1 · 2022-05-16T18:02:15Z

Is there a chance we could minimize this by reusing Neo4j classes for some components? Basically, if Neo4j speaks openCypher and Nebula speaks it too, can we reuse things like the Neo4j extractor, Neo4j search data extractor, Neo4j metadata proxy (but differently configured), etc.? I understand how this might be difficult for write operations (but maybe not impossible), but reads could surely reuse the Neo4j components?

I like the idea of Nebula as an alternative to Neo4j, especially since it also speaks openCypher, but I'd like to know how this can be achieved while reusing the Neo4j code as much as possible. Do we need so much new code, or can this be avoided?

wey-gu · 2022-05-17T10:09:02Z

> Is there a chance we could minimize this by reusing Neo4j classes for some components? Basically, if Neo4j speaks openCypher and Nebula speaks it too, can we reuse things like the Neo4j extractor, Neo4j search data extractor, Neo4j metadata proxy (but differently configured), etc.? I understand how this might be difficult for write operations (but maybe not impossible), but reads could surely reuse the Neo4j components?
>
> I like the idea of Nebula as an alternative to Neo4j, especially since it also speaks openCypher, but I'd like to know how this can be achieved while reusing the Neo4j code as much as possible. Do we need so much new code, or can this be avoided?

Thanks @mgorsk1 for taking the time to look into the proposal!

Indeed, while implementing the reference PR for the proposal, I also noticed that yet another storage backend increases the burden of introducing new features. I had told myself to keep an eye on all PRs after this one is merged and to carry that burden myself.

But, as you pointed out, that doesn't scale at all, and this is very likely a good opportunity to give the Cypher-based backends some level of abstraction so that code can be shared where possible.

I will keep this context and purpose in mind and see what can be done in a refactor.

There are some challenges: Nebula supports openCypher only as a dialect, so reusing the query strings themselves isn't directly possible (see here). However, the approach for each read function is similar, so finding a way to move the Cypher-speaking DB implementation from code into configuration looks possible (and worth it).
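To make the idea of moving dialect differences from code into configuration more concrete, here is a rough sketch. Everything in it is an assumption for illustration only: `CypherDialect`, `CypherMetadataReader`, the `get_table` query name, and the two query strings are hypothetical and are not taken from the existing Neo4j proxy or from this PR. The query strings have not been verified against either database.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical executor signature: (query string, parameters) -> list of rows.
# A real backend would wrap its own driver or session behind this callable.
Executor = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


@dataclass(frozen=True)
class CypherDialect:
    """Per-backend query strings, kept as configuration instead of code."""
    name: str
    queries: Dict[str, str]

    def query(self, query_name: str) -> str:
        # Fail loudly when a backend has not configured a required query.
        if query_name not in self.queries:
            raise KeyError(f"dialect {self.name!r} has no query {query_name!r}")
        return self.queries[query_name]


# Illustrative query strings only (assumptions, not verified against either DB).
NEO4J_DIALECT = CypherDialect(
    name="neo4j",
    queries={"get_table": "MATCH (t:Table {key: $key}) RETURN t"},
)
NEBULA_DIALECT = CypherDialect(
    name="nebula",
    queries={"get_table": "MATCH (t:Table) WHERE id(t) == $key RETURN t"},
)


class CypherMetadataReader:
    """Shared read logic; only the dialect and the executor differ per backend."""

    def __init__(self, dialect: CypherDialect, execute: Executor) -> None:
        self._dialect = dialect
        self._execute = execute

    def get_table(self, table_key: str) -> List[Dict[str, Any]]:
        # The read function is written once; the query text comes from config.
        return self._execute(self._dialect.query("get_table"), {"key": table_key})
```

With this shape, each read function would be written once, and adding a backend would mostly mean supplying a new set of query strings plus an executor. Write paths would probably need more than a string swap.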

Thanks.

Golodhros · 2023-02-06T22:50:24Z

Closing as abandoned

wey-gu · 2023-02-07T01:34:43Z

Thanks @Golodhros,
Once the RFC is settled, I'll reopen the PR.
amundsen-io/rfcs#48
BR//Wey

wey-gu requested review from feng-tao, jinhyukchang, allisonsuarez, verdan, bolkedebruin, mgorsk1, dorianj, youngyjd, dechoma, sewardgw, dkunitsk and a team as code owners April 15, 2022 10:23

boring-cyborg bot added area:databuilder From databuilder folder category:models labels Apr 15, 2022

wey-gu mentioned this pull request Apr 15, 2022

RFC/Feature: Nebula Graph as Backend Storage #1816

Closed

feng-tao added the keep fresh Disables stalebot from closing an issue label Apr 25, 2022

wey-gu mentioned this pull request May 13, 2022

Support Nebula Graph amundsen-io/rfcs#48

Open

wey-gu force-pushed the amundsen_nebula_graph branch from 3143ad1 to ff1c42e Compare May 13, 2022 13:17

wey-gu force-pushed the amundsen_nebula_graph branch 2 times, most recently from 4235894 to d372829 Compare May 15, 2022 15:40

wey-gu force-pushed the amundsen_nebula_graph branch from d372829 to 1e3de5d Compare May 17, 2022 16:20

wey-gu added 5 commits May 24, 2022 10:05

feat: Introduce Nebula Metadata Proxy and Databuilder

e252acd

Metadata:
- Nebula Proxy

Databuilder:
- Nebula Extractor
- Nebula Search Data Extractor
- Nebula CSV Loader
- Nebula CSV Publisher
- Nebula Serializer
- Nebula Sample Data Loader

Signed-off-by: wey-gu <[email protected]>

Nebula Graph 3.1.0

2477e15

Signed-off-by: wey-gu <[email protected]>

Added Nebula Studio

9fe7d41

Signed-off-by: wey-gu <[email protected]>

SearchMetadatatoElasticasearchTask for elastic databuilder

eac29cf

Note: amundsen-io#1856 still needs to be included for this to work, because highlight_options was introduced in the frontend app but is not yet merged in the search service.

Signed-off-by: wey-gu <[email protected]>

Postgres/Dbt/Superset nebula sample added

f96e7f9

Signed-off-by: wey-gu <[email protected]>

wey-gu force-pushed the amundsen_nebula_graph branch from 1e3de5d to f96e7f9 Compare May 24, 2022 02:06

Golodhros removed the category:models label Dec 15, 2022

Golodhros closed this Feb 6, 2023

This was referenced Sep 21, 2023

Fix - Adds a guard clause to prevent crashing in case of missing es indices #2189

Merged

Fix - Handle ES missing index edge case (bug fix) #2193

Merged
