feat(llm): property embedding #240
base: main
Change-Id: I31e9532990c2e7e09ddf518ac9c35cdf996a5a25 doc: update repo & url link for inner repo Change-Id: Ieac00904c4ab793de87c1cae7324e3d2d7946884 doc: update repo & url link for inner repo (GraphPlatform-4190) doc: update the link info
…ph/hugegraph-ai into master-icode
sync Change-Id: Id2138dfecfbd377dd39ec51fa9d46b630e9ea88f
Change-Id: I8cb2139be06f63378f87bccd9035357a7cb7a91e
Change-Id: I07414cd8b418afabcb1f8a40454a679f10760bf8
Change-Id: I909b71df67bef316b8cef02010bcae9e91168f4e
Change-Id: I0d4ce8e5a227941ecb3e5972bec35a79e25a6263
Change-Id: I803b36fde6d0b0291dc6fe5cc40c37a946b354f0
Change-Id: I1e3f97afeaa853250966a66c096582339f3c51d6
Change-Id: I6107a5d63c9949ac5e47f93521aa248d0a8ec121
Change-Id: I56329c6aa219fd04198519ee1d1220e395fe82f2
Change-Id: Id93b59ca9a325f79623741d919a18d8f64c12ccd
Change-Id: I81c24fdabd84b92b7cc6e79e19ce3286700806f6
Change-Id: I799844222f194d54990b27d71770543ad6c707ca
Pull Request Overview
This PR introduces property embedding support and extends existing vector indexing and query functionality. Key changes include:
- Adding property vector indexing and cleaning in the vector and graph index utilities.
- Updating embedding retrieval and query generation to support property keywords and property embeddings.
- Integrating property-related logic into semantic queries, index building, API endpoints, and prompt configuration.
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py | Added property vector index loading and updated index info output. |
hugegraph-llm/src/hugegraph_llm/utils/graph_index_utils.py | Incorporated cleaning and update logging for property vector indexes. |
hugegraph-llm/src/hugegraph_llm/operators/llm_op/gremlin_generate.py | Passed an additional 'properties' parameter to LLM generation. |
hugegraph-llm/src/hugegraph_llm/operators/index_op/semantic_id_query.py | Integrated exact and fuzzy matching for property keys and values. |
hugegraph-llm/src/hugegraph_llm/operators/index_op/build_semantic_index.py | Added diffing and update logic for property embeddings with limit checks. |
hugegraph-llm/src/hugegraph_llm/operators/hugegraph_op/graph_rag_query.py | Updated gremlin queries to use property name/value filtering in fallback scenarios. |
hugegraph-llm/src/hugegraph_llm/operators/hugegraph_op/fetch_graph_data.py | Extended graph summary with extracted index label properties. |
hugegraph-llm/src/hugegraph_llm/operators/gremlin_generate_task.py | Passed property data to the gremlin generate operator. |
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/app.py | Registered and imported the new vector API endpoint. |
hugegraph-llm/src/hugegraph_llm/config/prompt_config.py | Updated prompt content to include property references for query generation. |
hugegraph-llm/src/hugegraph_llm/api/vector_api.py | Introduced new API endpoint for updating vector embeddings with rate limits. |
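The exact and fuzzy matching listed for semantic_id_query.py can be illustrated with a minimal sketch. This is not the PR's code: the function name `match_properties`, the `cutoff` parameter, and the sample data are hypothetical, and the fuzzy step here uses standard-library string similarity as a stand-in, since the summary does not show how the PR implements it.

```python
from difflib import get_close_matches


def match_properties(keywords, property_values, cutoff=0.8):
    """Split keywords into exact hits and fuzzy (approximate) hits.

    Hypothetical helper: `property_values` stands for the indexed
    property values known to the graph.
    """
    known = set(property_values)
    exact, fuzzy = [], []
    for kw in keywords:
        if kw in known:
            # Exact match: the keyword is literally a known property value.
            exact.append(kw)
            continue
        # Fuzzy match: take the single closest value above the cutoff, if any.
        close = get_close_matches(kw, property_values, n=1, cutoff=cutoff)
        if close:
            fuzzy.append((kw, close[0]))
    return exact, fuzzy


exact, fuzzy = match_properties(
    ["Alice", "Bejing"], ["Alice", "Beijing", "Shanghai"]
)
```

Keeping exact and fuzzy hits separate lets a caller prefer exact matches and treat fuzzy ones as lower-confidence candidates.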
Comments suppressed due to low confidence (1)
hugegraph-llm/src/hugegraph_llm/operators/hugegraph_op/graph_rag_query.py:349
- The variable 'use_id_to_match' is used without a visible definition in this context. Please ensure it is properly declared and initialized before use.
node_str = f"{item['id']}{{{props_str}}}" if use_id_to_match else f"{item['props']}{{{props_str}}}"
add_propsets.append(propset)
add_prop_values.append(prop_value)
if add_prop_values:
    if len(add_prop_values) > 100000:
[nitpick] Consider refactoring the hardcoded property update limit (100000) into a configurable parameter to improve maintainability and flexibility.
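One way to act on this suggestion is sketched below. The environment variable `MAX_PROP_EMBEDDING_UPDATES`, the helper names, and the choice to raise `ValueError` when the limit is exceeded are all assumptions for illustration; the diff does not show what the PR does after the check or how the project loads its configuration.

```python
import os

# Default mirrors the hardcoded value in the diff.
DEFAULT_MAX_PROP_UPDATES = 100000


def max_prop_updates():
    """Read the limit from a hypothetical environment variable."""
    return int(os.environ.get("MAX_PROP_EMBEDDING_UPDATES", DEFAULT_MAX_PROP_UPDATES))


def check_update_limit(add_prop_values, limit=None):
    """Return the number of pending updates, or raise if it exceeds the limit."""
    limit = max_prop_updates() if limit is None else limit
    if len(add_prop_values) > limit:
        # Raising is an assumption; the real code may log and skip instead.
        raise ValueError(
            f"{len(add_prop_values)} property updates exceed the limit of {limit}"
        )
    return len(add_prop_values)
```

Passing `limit` explicitly keeps the check easy to test without touching the environment.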
for keyword in keywords:
    keyword_vector = self.embedding.get_text_embedding(keyword)
    results = self.vector_index.search(keyword_vector, top_k=self.topk_per_keyword,
    keyword_vector = self.embedding.get_texts_embeddings([keyword])
Ensure that 'get_texts_embeddings' returns a non-empty list before accessing its first element to avoid potential IndexError.
keyword_vector = self.embedding.get_texts_embeddings([keyword])
keyword_vector = self.embedding.get_texts_embeddings([keyword])
if not keyword_vector:  # Ensure the list is non-empty
    log.warning("No embeddings found for keyword: %s", keyword)
    continue
Change-Id: Ie99469b659abe5efd53d8a52581c4ad91622ef6f
Change-Id: I04b1a71a5eda13c8369cd9c4b99e83f6c1373405
Change-Id: I8d9d5e3a46e16162c45b1b07a05bd187ebe0f4b1
Change-Id: Ibf83629dde9373398dac75b945a7fa19ae029a08
Change-Id: I8923df363da0ddc301b4a3c4833cec478a6c83f9
Change-Id: I84b2fdbfc0745d1556d5d29059ce5f9dfa311352
Change-Id: I79ef4dc7852788823b249a80d7979e5917d2a8c0
…arhugegraph/hugegraph-ai into property_embedding
Link to #228, V1 version.
- update, get and delete property embeddings
  - update_vid_embedding (embedding will only be performed for indexed properties).
  - clean_all_graph_index
- keywords match
  - Match properties according to keywords (exact + fuzzy).
- text2gremlin
  - Add properties in the prompt.
- subgraph_query
  - Give priority to executing the vid subgraph query. If it fails and properties have been matched based on keywords previously, then execute the property subgraph query.
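The subgraph_query fallback described above can be sketched roughly as follows. The function and parameter names are hypothetical, and treating both an exception and an empty result as "failure" is an assumption; the description does not say which condition the PR checks.

```python
def subgraph_query(run_vid_query, run_property_query, vids, matched_props):
    """Try the vid-based subgraph query first, then fall back to properties.

    `run_vid_query` and `run_property_query` are placeholder callables
    standing in for the real Gremlin executions.
    """
    try:
        result = run_vid_query(vids)
        if result:
            return result
    except Exception:
        # Broad catch for illustration only: any error counts as a failed query.
        pass
    # Fall back only when keyword matching previously produced properties.
    if matched_props:
        return run_property_query(matched_props)
    return []


def failing_vid_query(vids):
    raise RuntimeError("vid query failed")


primary = subgraph_query(lambda v: ["n1"], lambda p: ["p1"], ["1:a"], [("name", "Alice")])
fallback = subgraph_query(failing_vid_query, lambda p: ["p1"], ["1:a"], [("name", "Alice")])
```

Here `primary` comes from the vid query, while `fallback` comes from the property query because the vid query raised.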