
@MrJs133 (Contributor) commented May 19, 2025

Links to #228 (V1 version).

Update, get, and delete property embeddings

  • Added property embedding updates to update_vid_embedding (only indexed properties are embedded).
    • Uses incremental updates (supports adding and deleting prop_value entries as well as adding and deleting prop_key entries). If more than 100,000 properties need to be updated, a warning is issued and the update is rejected.
  • Added removal of property embeddings to clean_all_graph_index.
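The incremental update described above can be pictured as a set difference between the (prop_key, prop_value) pairs already in the index and the pairs currently in the graph. This is a minimal sketch under stated assumptions, not the PR's code: `diff_property_embeddings`, the `PropPair` tuple representation, and returning `None` on rejection are hypothetical. The limit counts only additions, mirroring the `len(add_prop_values) > 100000` check quoted in the review below.

```python
import logging
from typing import Iterable, Optional, Set, Tuple

log = logging.getLogger(__name__)

# Hypothetical representation of one indexed property: (prop_key, prop_value).
PropPair = Tuple[str, str]


def diff_property_embeddings(
    indexed: Iterable[PropPair],
    current: Iterable[PropPair],
    max_updates: int = 100_000,
) -> Optional[Tuple[Set[PropPair], Set[PropPair]]]:
    """Compute an incremental property-embedding update.

    Returns (to_add, to_remove), or None when the update is rejected
    because too many pairs would need to be embedded.
    """
    indexed_set = set(indexed)
    current_set = set(current)
    # New keys or new values exist only in the current graph data.
    to_add = current_set - indexed_set
    # Deleted keys or values exist only in the existing index.
    to_remove = indexed_set - current_set
    if len(to_add) > max_updates:
        log.warning(
            "%d property updates exceed the limit of %d; update rejected",
            len(to_add), max_updates,
        )
        return None
    return to_add, to_remove
```

Deleting a whole prop_key removes all of its values from the current data, so those pairs land in `to_remove` without any special-case handling.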

Keyword matching

Match properties against keywords (exact + fuzzy matching).
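A minimal sketch of "exact first, fuzzy second" matching is shown here. It is illustrative only: the PR's fuzzy step appears to use embedding search (see the `get_texts_embeddings` / `vector_index.search` lines in the review), whereas this sketch swaps in `difflib.get_close_matches` string similarity so it runs without a model. `match_properties`, `cutoff`, and `max_fuzzy` are hypothetical names.

```python
import difflib
from typing import Dict, List, Sequence


def match_properties(
    keywords: Sequence[str],
    prop_values: Sequence[str],
    cutoff: float = 0.8,
    max_fuzzy: int = 3,
) -> Dict[str, List[str]]:
    """Map each keyword to matching property values (exact first, then fuzzy)."""
    known = set(prop_values)
    matches: Dict[str, List[str]] = {}
    for keyword in keywords:
        if keyword in known:
            # Exact hit: skip the fuzzy step entirely.
            matches[keyword] = [keyword]
            continue
        # Fuzzy fallback: string similarity stands in for embedding search.
        close = difflib.get_close_matches(
            keyword, list(prop_values), n=max_fuzzy, cutoff=cutoff
        )
        if close:
            matches[keyword] = close
    return matches
```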

text2gremlin

Add properties to the prompt.
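One way to add matched properties to a text2gremlin prompt is to render them as a short key/value list next to the schema. The template below is hypothetical and is not the wording in prompt_config.py; `build_text2gremlin_prompt` is an illustrative name.

```python
from typing import Sequence, Tuple


def build_text2gremlin_prompt(
    question: str,
    schema: str,
    properties: Sequence[Tuple[str, str]],
) -> str:
    """Assemble a text2gremlin prompt that lists matched properties."""
    prop_lines = "\n".join(f"- {key}: {value}" for key, value in properties)
    return (
        f"Graph schema:\n{schema}\n\n"
        f"Matched properties (key: value):\n{prop_lines or '- (none)'}\n\n"
        f"Question: {question}\n"
        "Gremlin:"
    )
```

Listing the properties explicitly lets the model emit `has(key, value)` filters with values that are known to exist in the graph.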

subgraph_query

Execute the vid subgraph query first. If it fails and properties were previously matched from keywords, fall back to the property subgraph query.
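The fallback order can be sketched as follows, assuming "fails" covers both an exception and an empty result. The two query callables stand in for the real Gremlin calls in graph_rag_query.py; `subgraph_query`, `query_by_vid`, and `query_by_props` are hypothetical names.

```python
import logging
from typing import Any, Callable, Dict, List, Sequence

log = logging.getLogger(__name__)

Subgraph = List[Dict[str, Any]]


def subgraph_query(
    vids: Sequence[str],
    matched_props: Sequence[Any],
    query_by_vid: Callable[[Sequence[str]], Subgraph],
    query_by_props: Callable[[Sequence[Any]], Subgraph],
) -> Subgraph:
    """Run the vid subgraph query first, then fall back to the property query."""
    results: Subgraph = []
    if vids:
        try:
            results = query_by_vid(vids)
        except Exception as exc:  # a failed query is treated like an empty one
            log.warning("vid subgraph query failed: %s", exc)
            results = []
    # Fall back only when the vid query produced nothing and properties matched.
    if not results and matched_props:
        results = query_by_props(matched_props)
    return results
```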

Jin and others added 25 commits January 2, 2025 19:24
Change-Id: I31e9532990c2e7e09ddf518ac9c35cdf996a5a25

doc: update repo & url link for inner repo

Change-Id: Ieac00904c4ab793de87c1cae7324e3d2d7946884

doc: update repo & url link for inner repo (GraphPlatform-4190)

doc: update the link info
sync

Change-Id: Id2138dfecfbd377dd39ec51fa9d46b630e9ea88f
Change-Id: I8cb2139be06f63378f87bccd9035357a7cb7a91e
Change-Id: I07414cd8b418afabcb1f8a40454a679f10760bf8
Change-Id: I909b71df67bef316b8cef02010bcae9e91168f4e
Change-Id: I0d4ce8e5a227941ecb3e5972bec35a79e25a6263
Change-Id: I803b36fde6d0b0291dc6fe5cc40c37a946b354f0
Change-Id: I1e3f97afeaa853250966a66c096582339f3c51d6
Change-Id: I6107a5d63c9949ac5e47f93521aa248d0a8ec121
Change-Id: I56329c6aa219fd04198519ee1d1220e395fe82f2
Change-Id: Id93b59ca9a325f79623741d919a18d8f64c12ccd
Change-Id: I81c24fdabd84b92b7cc6e79e19ce3286700806f6
Change-Id: I799844222f194d54990b27d71770543ad6c707ca
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label May 19, 2025
@github-actions github-actions bot added the llm label May 19, 2025
@dosubot dosubot bot added the enhancement New feature or request label May 19, 2025
@dosubot dosubot bot removed the size:M This PR changes 30-99 lines, ignoring generated files. label May 20, 2025
@imbajin imbajin requested a review from Copilot May 26, 2025 11:15

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces property embedding support and extends existing vector indexing and query functionality. Key changes include:

  • Adding property vector indexing and cleaning in the vector and graph index utilities.
  • Updating embedding retrieval and query generation to support property keywords and property embeddings.
  • Integrating property-related logic into semantic queries, index building, API endpoints, and prompt configuration.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

File | Description
--- | ---
hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py | Added property vector index loading and updated index info output.
hugegraph-llm/src/hugegraph_llm/utils/graph_index_utils.py | Incorporated cleaning and update logging for property vector indexes.
hugegraph-llm/src/hugegraph_llm/operators/llm_op/gremlin_generate.py | Passed an additional 'properties' parameter to LLM generation.
hugegraph-llm/src/hugegraph_llm/operators/index_op/semantic_id_query.py | Integrated exact and fuzzy matching for property keys and values.
hugegraph-llm/src/hugegraph_llm/operators/index_op/build_semantic_index.py | Added diffing and update logic for property embeddings with limit checks.
hugegraph-llm/src/hugegraph_llm/operators/hugegraph_op/graph_rag_query.py | Updated gremlin queries to use property name/value filtering in fallback scenarios.
hugegraph-llm/src/hugegraph_llm/operators/hugegraph_op/fetch_graph_data.py | Extended graph summary with extracted index label properties.
hugegraph-llm/src/hugegraph_llm/operators/gremlin_generate_task.py | Passed property data to the gremlin generate operator.
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/app.py | Registered and imported the new vector API endpoint.
hugegraph-llm/src/hugegraph_llm/config/prompt_config.py | Updated prompt content to include property references for query generation.
hugegraph-llm/src/hugegraph_llm/api/vector_api.py | Introduced new API endpoint for updating vector embeddings with rate limits.
Comments suppressed due to low confidence (1)

hugegraph-llm/src/hugegraph_llm/operators/hugegraph_op/graph_rag_query.py:349

  • The variable 'use_id_to_match' is used without a visible definition in this context. Please ensure it is properly declared and initialized before use.
node_str = f"{item['id']}{{{props_str}}}" if use_id_to_match else f"{item['props']}{{{props_str}}}"
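One way to address this comment is to make the flag an explicit parameter of the formatting step, so it is always defined where it is used. This is a sketch only; `format_node` is a hypothetical helper, and the `item` dict shape (`"id"` and `"props"` keys) is inferred from the quoted line.

```python
from typing import Any, Dict


def format_node(item: Dict[str, Any], props_str: str, use_id_to_match: bool) -> str:
    """Render a vertex as '<key>{<props>}', keyed by its id or by its properties."""
    # Taking use_id_to_match as an argument guarantees it is defined here.
    key = item["id"] if use_id_to_match else item["props"]
    return f"{key}{{{props_str}}}"
```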

add_propsets.append(propset)
add_prop_values.append(prop_value)
if add_prop_values:
    if len(add_prop_values) > 100000:

Copilot AI May 26, 2025


[nitpick] Consider refactoring the hardcoded property update limit (100000) into a configurable parameter to improve maintainability and flexibility.

Copilot uses AI. Check for mistakes.
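As a sketch of what the nitpick suggests, the limit could live in a small settings object with an optional environment override. `PropertyIndexConfig`, the `MAX_PROPERTY_UPDATES` variable, and `within_update_limit` are all hypothetical; hugegraph-llm's actual config classes may expose settings differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyIndexConfig:
    """Hypothetical settings object for the property-embedding index."""

    max_property_updates: int = 100_000

    @classmethod
    def from_env(cls) -> "PropertyIndexConfig":
        # Read an optional override; fall back to the current hardcoded default.
        raw = os.environ.get("MAX_PROPERTY_UPDATES")
        return cls(max_property_updates=int(raw)) if raw else cls()


def within_update_limit(n_updates: int, config: PropertyIndexConfig) -> bool:
    """True if an update of n_updates properties should be accepted."""
    return n_updates <= config.max_property_updates
```

Keeping the default at 100,000 preserves current behavior while allowing larger graphs to raise the threshold deliberately.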

for keyword in keywords:
    keyword_vector = self.embedding.get_text_embedding(keyword)
    results = self.vector_index.search(keyword_vector, top_k=self.topk_per_keyword,
    keyword_vector = self.embedding.get_texts_embeddings([keyword])

Copilot AI May 26, 2025


Ensure that 'get_texts_embeddings' returns a non-empty list before accessing its first element to avoid potential IndexError.

Suggested change
keyword_vector = self.embedding.get_texts_embeddings([keyword])
keyword_vector = self.embedding.get_texts_embeddings([keyword])
if not keyword_vector: # Ensure the list is non-empty
    log.warning("No embeddings found for keyword: %s", keyword)
    continue
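A self-contained sketch of the suggested guard inside the keyword loop follows. It assumes, as the comment implies, that `get_texts_embeddings` takes a list of texts and returns a list of vectors, so the single keyword's vector is element `[0]`. The embedding function is passed in as a callable so the example runs without a model; `embed_keywords` is a hypothetical name.

```python
import logging
from typing import Callable, Dict, List, Sequence

log = logging.getLogger(__name__)

Vector = List[float]


def embed_keywords(
    keywords: Sequence[str],
    get_texts_embeddings: Callable[[List[str]], List[Vector]],
) -> Dict[str, Vector]:
    """Embed each keyword, skipping keywords whose embedding batch comes back empty."""
    vectors: Dict[str, Vector] = {}
    for keyword in keywords:
        keyword_vectors = get_texts_embeddings([keyword])
        if not keyword_vectors:  # guard before indexing [0] to avoid IndexError
            log.warning("No embeddings found for keyword: %s", keyword)
            continue
        vectors[keyword] = keyword_vectors[0]
    return vectors
```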


imbajin and others added 12 commits May 26, 2025 19:46
Change-Id: Ie99469b659abe5efd53d8a52581c4ad91622ef6f
Change-Id: I04b1a71a5eda13c8369cd9c4b99e83f6c1373405
Change-Id: I8d9d5e3a46e16162c45b1b07a05bd187ebe0f4b1
Change-Id: Ibf83629dde9373398dac75b945a7fa19ae029a08
Change-Id: Idb8c1476ebda521e195022f14f6733a33288f552
Change-Id: I8923df363da0ddc301b4a3c4833cec478a6c83f9
Change-Id: I84b2fdbfc0745d1556d5d29059ce5f9dfa311352
Change-Id: I79ef4dc7852788823b249a80d7979e5917d2a8c0
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels May 28, 2025
Labels
enhancement (New feature or request), llm, python-client, size:XL (This PR changes 500-999 lines, ignoring generated files.)

3 participants