
Commit 616eb08

prakriti-solankey, kartikpersistent and vasanthasaikalluri authored

Dev To STAGING (#532)

* format fixes and graph schema indication fix
* Update README.md
* added chat modes variable in env updated the readme
* spell fix
* added the chat mode in env table
* added the logos
* fixed the overflow issues
* removed the extra fix
* Fixed specific scenario "when the text from schema closes it should reopen the previous modal"
* readme changes
* removed dev console logs
* added new retrieval query (#533)
* format fixes and tab rendering fix
* fixed the setting modal reopen issue

---------

Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: vasanthasaikalluri <[email protected]>
1 parent e566862 commit 616eb08

26 files changed: +241 -100 lines

README.md (+15 -2)

@@ -40,7 +40,7 @@ DIFFBOT_API_KEY="your-diffbot-key"
 
 if you only want OpenAI:
 ```env
-LLM_MODELS="gpt-3.5,gpt-4o"
+LLM_MODELS="diffbot,openai-gpt-3.5,openai-gpt-4o"
 OPENAI_API_KEY="your-openai-key"
 ```
 

@@ -70,6 +70,18 @@ GOOGLE_CLIENT_ID="xxxx"
 
 You can of course combine all (local, youtube, wikipedia, s3 and gcs) or remove any you don't want/need.
 
+### Chat Modes
+
+By default, all of the chat modes will be available: vector, graph+vector and graph.
+If no mode is mentioned in the chat modes variable, all modes will be available:
+```env
+CHAT_MODES=""
+```
+
+If, however, you want to specify only the vector mode or only the graph mode, you can do that by specifying the mode in the env:
+```env
+CHAT_MODES="vector,graph+vector"
+```
 
 #### Running Backend and Frontend separately (dev environment)
 Alternatively, you can run the backend and frontend separately:

@@ -134,7 +146,8 @@ Allow unauthenticated request : Yes
 | BACKEND_API_URL | Optional | http://localhost:8000 | URL for backend API |
 | BLOOM_URL | Optional | https://workspace-preview.neo4j.io/workspace/explore?connectURL={CONNECT_URL}&search=Show+me+a+graph&featureGenAISuggestions=true&featureGenAISuggestionsInternal=true | URL for Bloom visualization |
 | REACT_APP_SOURCES | Optional | local,youtube,wiki,s3 | List of input sources that will be available |
-| LLM_MODELS | Optional | diffbot,gpt-3.5,gpt-4o | Models available for selection on the frontend, used for entities extraction and Q&A Chatbot |
+| LLM_MODELS | Optional | diffbot,openai-gpt-3.5,openai-gpt-4o | Models available for selection on the frontend, used for entities extraction and Q&A Chatbot |
+| CHAT_MODES | Optional | vector,graph+vector,graph | Chat modes available for Q&A |
 | ENV | Optional | DEV | Environment variable for the app |
 | TIME_PER_CHUNK | Optional | 4 | Time per chunk for processing |
 | CHUNK_SIZE | Optional | 5242880 | Size of each chunk of file for upload |
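To make the CHAT_MODES convention from the README hunk above concrete: an empty or unset value means every mode is offered, while a comma-separated value restricts the list. The following is a minimal, hypothetical sketch of that parsing rule, not code from this repository (the project's actual frontend is TypeScript; `resolve_chat_modes` and `ALL_CHAT_MODES` are illustrative names):

```python
import os
from typing import List, Optional

# All modes the README documents: vector, graph+vector and graph.
ALL_CHAT_MODES: List[str] = ["vector", "graph+vector", "graph"]

def resolve_chat_modes(raw: Optional[str]) -> List[str]:
    """Hypothetical helper: empty/unset CHAT_MODES -> all modes; otherwise keep the listed ones."""
    if raw is None or not raw.strip():
        return ALL_CHAT_MODES
    requested = [mode.strip() for mode in raw.split(",") if mode.strip()]
    # Drop anything that is not a known mode, preserving the order given in the env var.
    return [mode for mode in requested if mode in ALL_CHAT_MODES]

if __name__ == "__main__":
    print(resolve_chat_modes(os.environ.get("CHAT_MODES")))  # unset or "" -> all three modes
    print(resolve_chat_modes("vector,graph+vector"))          # -> ['vector', 'graph+vector']
```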

backend/src/QA_integration_new.py (+2 -2)

@@ -322,7 +322,7 @@ def QA_RAG(graph, model, question, document_names,session_id, mode):
     if mode == "graph":
         graph_chain, qa_llm,model_version = create_graph_chain(model,graph)
         graph_response = get_graph_response(graph_chain,question)
-        ai_response = AIMessage(content=graph_response["response"])
+        ai_response = AIMessage(content=graph_response["response"]) if graph_response["response"] else AIMessage(content="Something went wrong")
         messages.append(ai_response)
         summarize_and_log(history, messages, qa_llm)
 

@@ -342,7 +342,7 @@ def QA_RAG(graph, model, question, document_names,session_id, mode):
     elif mode == "vector":
         retrieval_query = VECTOR_SEARCH_QUERY
     else:
-        retrieval_query = VECTOR_GRAPH_SEARCH_QUERY
+        retrieval_query = VECTOR_GRAPH_SEARCH_QUERY.format(no_of_entites=VECTOR_GRAPH_SEARCH_ENTITY_LIMIT)
 
     llm, doc_retriever, model_version = setup_chat(model, graph, session_id, document_names,retrieval_query)
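The first hunk guards against an empty answer from the graph chain before it enters the chat history. A minimal sketch of the same fallback pattern in isolation, assuming `AIMessage` from `langchain_core.messages` (the surrounding `QA_RAG` plumbing is omitted and `to_ai_message` is an illustrative name, not a function in this repository):

```python
from langchain_core.messages import AIMessage

def to_ai_message(graph_response: dict) -> AIMessage:
    # Fall back to a generic error message when the graph chain returns no answer,
    # so the message history and summarization step never see empty content.
    content = graph_response.get("response")
    return AIMessage(content=content) if content else AIMessage(content="Something went wrong")

print(to_ai_message({"response": "Neo4j stores data as nodes and relationships."}).content)
print(to_ai_message({"response": ""}).content)  # -> Something went wrong
```

The second hunk feeds the new entity limit into the retrieval query template; see the note after the constants.py diff below for how that substitution works.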

backend/src/shared/constants.py (+93 -33)

@@ -111,38 +111,102 @@
 # """
 
 
+# VECTOR_GRAPH_SEARCH_QUERY = """
+# WITH node as chunk, score
+# // find the document of the chunk
+# MATCH (chunk)-[:PART_OF]->(d:Document)
+# // fetch entities
+# CALL { WITH chunk
+# // entities connected to the chunk
+# // todo only return entities that are actually in the chunk, remember we connect all extracted entities to all chunks
+# MATCH (chunk)-[:HAS_ENTITY]->(e)
+
+# // depending on match to query embedding either 1 or 2 step expansion
+# WITH CASE WHEN true // vector.similarity.cosine($embedding, e.embedding ) <= 0.95
+# THEN
+# collect { MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){0,1}(:!Chunk&!Document) RETURN path }
+# ELSE
+# collect { MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){0,2}(:!Chunk&!Document) RETURN path }
+# END as paths
+
+# RETURN collect{ unwind paths as p unwind relationships(p) as r return distinct r} as rels,
+# collect{ unwind paths as p unwind nodes(p) as n return distinct n} as nodes
+# }
+# // aggregate chunk-details and de-duplicate nodes and relationships
+# WITH d, collect(DISTINCT {chunk: chunk, score: score}) AS chunks, avg(score) as avg_score, apoc.coll.toSet(apoc.coll.flatten(collect(rels))) as rels,
+
+# // TODO sort by relevancy (embeddding comparision?) cut off after X (e.g. 25) nodes?
+# apoc.coll.toSet(apoc.coll.flatten(collect(
+# [r in rels |[startNode(r),endNode(r)]]),true)) as nodes
+
+# // generate metadata and text components for chunks, nodes and relationships
+# WITH d, avg_score,
+# [c IN chunks | c.chunk.text] AS texts,
+# [c IN chunks | {id: c.chunk.id, score: c.score}] AS chunkdetails,
+# apoc.coll.sort([n in nodes |
+
+# coalesce(apoc.coll.removeAll(labels(n),['__Entity__'])[0],"") +":"+
+# n.id + (case when n.description is not null then " ("+ n.description+")" else "" end)]) as nodeTexts,
+# apoc.coll.sort([r in rels
+# // optional filter if we limit the node-set
+# // WHERE startNode(r) in nodes AND endNode(r) in nodes
+# |
+# coalesce(apoc.coll.removeAll(labels(startNode(r)),['__Entity__'])[0],"") +":"+
+# startNode(r).id +
+# " " + type(r) + " " +
+# coalesce(apoc.coll.removeAll(labels(endNode(r)),['__Entity__'])[0],"") +":" +
+# endNode(r).id
+# ]) as relTexts
+
+# // combine texts into response-text
+# WITH d, avg_score,chunkdetails,
+# "Text Content:\n" +
+# apoc.text.join(texts,"\n----\n") +
+# "\n----\nEntities:\n"+
+# apoc.text.join(nodeTexts,"\n") +
+# "\n----\nRelationships:\n"+
+# apoc.text.join(relTexts,"\n")
+
+# as text
+# RETURN text, avg_score as score, {length:size(text), source: COALESCE( CASE WHEN d.url CONTAINS "None" THEN d.fileName ELSE d.url END, d.fileName), chunkdetails: chunkdetails} AS metadata
+# """
+
+VECTOR_GRAPH_SEARCH_ENTITY_LIMIT = 25
+
 VECTOR_GRAPH_SEARCH_QUERY = """
 WITH node as chunk, score
 // find the document of the chunk
 MATCH (chunk)-[:PART_OF]->(d:Document)
+
+// aggregate chunk-details
+WITH d, collect(DISTINCT {{chunk: chunk, score: score}}) AS chunks, avg(score) as avg_score
 // fetch entities
-CALL { WITH chunk
+CALL {{ WITH chunks
+UNWIND chunks as chunkScore
+WITH chunkScore.chunk as chunk
 // entities connected to the chunk
 // todo only return entities that are actually in the chunk, remember we connect all extracted entities to all chunks
-MATCH (chunk)-[:HAS_ENTITY]->(e)
-
+// todo sort by relevancy (embeddding comparision?) cut off after X (e.g. 25) nodes?
+OPTIONAL MATCH (chunk)-[:HAS_ENTITY]->(e)
+WITH e, count(*) as numChunks
+ORDER BY numChunks DESC LIMIT {no_of_entites}
 // depending on match to query embedding either 1 or 2 step expansion
 WITH CASE WHEN true // vector.similarity.cosine($embedding, e.embedding ) <= 0.95
 THEN
-collect { MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){0,1}(:!Chunk&!Document) RETURN path }
+collect {{ OPTIONAL MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){{0,1}}(:!Chunk&!Document) RETURN path }}
 ELSE
-collect { MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){0,2}(:!Chunk&!Document) RETURN path }
-END as paths
-
-RETURN collect{ unwind paths as p unwind relationships(p) as r return distinct r} as rels,
-collect{ unwind paths as p unwind nodes(p) as n return distinct n} as nodes
-}
-// aggregate chunk-details and de-duplicate nodes and relationships
-WITH d, collect(DISTINCT {chunk: chunk, score: score}) AS chunks, avg(score) as avg_score, apoc.coll.toSet(apoc.coll.flatten(collect(rels))) as rels,
-
-// TODO sort by relevancy (embeddding comparision?) cut off after X (e.g. 25) nodes?
-apoc.coll.toSet(apoc.coll.flatten(collect(
-[r in rels |[startNode(r),endNode(r)]]),true)) as nodes
+collect {{ OPTIONAL MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){{0,2}}(:!Chunk&!Document) RETURN path }}
+END as paths, e
+WITH apoc.coll.toSet(apoc.coll.flatten(collect(distinct paths))) as paths, collect(distinct e) as entities
+// de-duplicate nodes and relationships across chunks
+RETURN collect{{ unwind paths as p unwind relationships(p) as r return distinct r}} as rels,
+collect{{ unwind paths as p unwind nodes(p) as n return distinct n}} as nodes, entities
+}}
 
 // generate metadata and text components for chunks, nodes and relationships
 WITH d, avg_score,
 [c IN chunks | c.chunk.text] AS texts,
-[c IN chunks | {id: c.chunk.id, score: c.score}] AS chunkdetails,
+[c IN chunks | {{id: c.chunk.id, score: c.score}}] AS chunkdetails,
 apoc.coll.sort([n in nodes |
 
 coalesce(apoc.coll.removeAll(labels(n),['__Entity__'])[0],"") +":"+

@@ -154,24 +218,20 @@
 coalesce(apoc.coll.removeAll(labels(startNode(r)),['__Entity__'])[0],"") +":"+
 startNode(r).id +
 " " + type(r) + " " +
-coalesce(apoc.coll.removeAll(labels(endNode(r)),['__Entity__'])[0],"") +":" +
-endNode(r).id
+coalesce(apoc.coll.removeAll(labels(endNode(r)),['__Entity__'])[0],"") +":" + endNode(r).id
 ]) as relTexts
-
+, entities
 // combine texts into response-text
-WITH d, avg_score,chunkdetails,
-"Text Content:\n" +
-apoc.text.join(texts,"\n----\n") +
-"\n----\nEntities:\n"+
-apoc.text.join(nodeTexts,"\n") +
-"\n----\nRelationships:\n"+
-apoc.text.join(relTexts,"\n")
-
-as text
-RETURN text, avg_score as score, {length:size(text), source: COALESCE( CASE WHEN d.url CONTAINS "None" THEN d.fileName ELSE d.url END, d.fileName), chunkdetails: chunkdetails} AS metadata
-"""
-
-
 
+WITH d, avg_score,chunkdetails,
+"Text Content:\\n" +
+apoc.text.join(texts,"\\n----\\n") +
+"\\n----\\nEntities:\\n"+
+apoc.text.join(nodeTexts,"\\n") +
+"\\n----\\nRelationships:\\n" +
+apoc.text.join(relTexts,"\\n")
 
+as text,entities
 
+RETURN text, avg_score as score, {{length:size(text), source: COALESCE( CASE WHEN d.url CONTAINS "None" THEN d.fileName ELSE d.url END, d.fileName), chunkdetails: chunkdetails}} AS metadata
+"""

docker-compose.yml (+2 -1)

@@ -51,13 +51,14 @@ services:
       args:
         - BACKEND_API_URL=${BACKEND_API_URL-http://localhost:8000}
         - REACT_APP_SOURCES=${REACT_APP_SOURCES-local,youtube,wiki,s3}
-        - LLM_MODELS=${LLM_MODELS-diffbot,gpt-3.5,gpt-4o}
+        - LLM_MODELS=${LLM_MODELS-diffbot,openai-gpt-3.5,openai-gpt-4o}
        - GOOGLE_CLIENT_ID=${GOOGLE_CLIENT_ID-""}
        - BLOOM_URL=${BLOOM_URL-https://workspace-preview.neo4j.io/workspace/explore?connectURL={CONNECT_URL}&search=Show+me+a+graph&featureGenAISuggestions=true&featureGenAISuggestionsInternal=true}
        - TIME_PER_CHUNK=${TIME_PER_CHUNK-4}
        - TIME_PER_PAGE=${TIME_PER_PAGE-50}
        - CHUNK_SIZE=${CHUNK_SIZE-5242880}
        - ENV=${ENV-DEV}
+       - CHAT_MODES=${CHAT_MODES-""}
     volumes:
       - ./frontend:/app
       - /app/node_modules

example.env (+2 -1)

@@ -28,9 +28,10 @@ ENTITY_EMBEDDING=True
 BACKEND_API_URL="http://localhost:8000"
 BLOOM_URL="https://workspace-preview.neo4j.io/workspace/explore?connectURL={CONNECT_URL}&search=Show+me+a+graph&featureGenAISuggestions=true&featureGenAISuggestionsInternal=true"
 REACT_APP_SOURCES="local,youtube,wiki,s3,web"
-LLM_MODELS="diffbot,gpt-3.5,gpt-4o" # ",ollama_llama3"
+LLM_MODELS="diffbot,openai-gpt-3.5,openai-gpt-4o" # ",ollama_llama3"
 ENV="DEV"
 TIME_PER_CHUNK=4
 TIME_PER_PAGE=50
 CHUNK_SIZE=5242880
 GOOGLE_CLIENT_ID=""
+CHAT_MODES=""

frontend/README.md (+1 -1)

@@ -1,6 +1,6 @@
 # Neo4j Knowledge Graph Builder
 
-Reactjs Responsive app for building an knowledge graph using [Neo4j Needle](https://www.neo4j.design/).
+Reactjs app for building a knowledge graph using [Neo4j Needle](https://www.neo4j.design/).
 
 ## Features
 - 🚀 Responsive: Adapts to different screen sizes for optimal user experience.

frontend/src/assets/images/web-search-darkmode-final.svg (-1)

This file was deleted.
