Commit 557393a
add: deployable adaptive retriever examples & developments
GitOrigin-RevId: a3396333df66778b7a14649ede3fb0a9e4300ebe
1 parent: 5e20cee
Showing 7 changed files with 275 additions and 1 deletion.
@@ -0,0 +1,60 @@
## End-to-end Adaptive RAG with Pathway

This is the accompanying code for deploying the `adaptive RAG` technique with Pathway.

To learn more about building & deploying RAG applications with Pathway, including containerization, refer to [demo question answering](../demo-question-answering/README.md).

## Introduction
This app relies on modules provided under `pathway.xpacks.llm`.

`BaseRAGQuestionAnswerer` is the base class for building RAG applications with the Pathway vector store and Pathway xpack components.
It is meant to get you started with your RAG application right away.

Here, we extend the `BaseRAGQuestionAnswerer` to implement adaptive retrieval and to reply to requests at the `/v1/pw_ai_answer` endpoint.
Since we are only interested in changing the behavior and logic of the RAG, we modify just the `pw_ai_query` function, which handles this logic and then replies to the POST request.

The `pw_ai_query` function takes the `pw_ai_queries` table as input. This table contains the prompt and the other arguments coming from the POST request; see the `BaseRAGQuestionAnswerer` class and the defined schemas to learn more about receiving inputs via POST requests.
We use the data in this table to call our adaptive retrieval logic.

To do that, we use the `answer_with_geometric_rag_strategy_from_index` implementation provided under `pathway.xpacks.llm.question_answering`.
This function takes an index, an LLM, a prompt, and adaptive parameters such as the starting number of documents. It then iteratively asks the question to the LLM with an increasing number of context documents retrieved from the index.
We also set `strict_prompt=True`. This adjusts the prompt with additional instructions and adds extra rails for parsing the response.
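
As an illustration (not the library implementation), the following minimal Python sketch shows the document counts such a geometric strategy would try, using the parameter values set in the accompanying `app.py` (`n_starting_documents=2`, `factor=2`, `max_iterations=4`):

```python
# Illustration only: the number of context documents requested at each retry,
# assuming a geometric schedule with the parameters set in app.py.
n_starting_documents = 2  # documents retrieved for the first attempt
factor = 2                # geometric growth factor between attempts
max_iterations = 4        # upper bound on the number of attempts

schedule = [n_starting_documents * factor**i for i in range(max_iterations)]
print(schedule)  # [2, 4, 8, 16]
```

The strategy can stop at the first step where the LLM answers from the provided context, so in the common case only the small-context call is made.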

We encourage you to check the implementation of `answer_with_geometric_rag_strategy_from_index`.

## Modifying the code

Under the main function, we define:
- input folders
- LLM
- embedder
- index
- host and port to run the app
- run options (caching, cache folder)
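
For orientation, here is a condensed sketch of how these pieces are wired together; it mirrors the full `app.py` included in this commit (some arguments are omitted for brevity):

```python
import pathway as pw
from pathway.udfs import DiskCache
from pathway.xpacks.llm import embedders, llms, parsers, splitters
from pathway.xpacks.llm.question_answering import AdaptiveRAGQuestionAnswerer
from pathway.xpacks.llm.vector_store import VectorStoreServer

# input folders
sources = [pw.io.fs.read("./data", format="binary", with_metadata=True)]

# LLM and embedder
chat = llms.OpenAIChat(model="gpt-3.5-turbo", cache_strategy=DiskCache())
embedder = embedders.OpenAIEmbedder(cache_strategy=DiskCache())

# index
vector_server = VectorStoreServer(
    *sources,
    embedder=embedder,
    splitter=splitters.TokenCountSplitter(max_tokens=400),
    parser=parsers.ParseUnstructured(),
)

# host, port and run options (caching)
app = AdaptiveRAGQuestionAnswerer(llm=chat, indexer=vector_server, strict_prompt=True)
app.build_server(host="0.0.0.0", port=8000)
app.run_server(with_cache=True)
```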

By default, we use OpenAI's `gpt-3.5-turbo`. However, as done in the showcase, it is possible to use any LLM, including locally deployed LLMs.

If you are interested in building this app in a fully private & local setup, check out the [private RAG example](../private-rag/README.md) that uses `Mistral 7B` as the LLM with a local embedding model.

You can modify any of the used components by checking the options from the imported modules: `from pathway.xpacks.llm import embedders, llms, parsers, splitters`.
It is also possible to easily create new components by extending the [`pw.UDF`](https://pathway.com/developers/user-guide/data-transformation/user-defined-functions) class and implementing the `__wrapped__` function.
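
As a rough, hypothetical sketch of that pattern (the class below and its splitting logic are made up for illustration and are not part of Pathway): a custom component subclasses `pw.UDF` and implements `__wrapped__`, and could then be passed in place of one of the stock components.

```python
import pathway as pw


class NaiveSentenceSplitter(pw.UDF):
    """Hypothetical splitter-like component: chops text at full stops."""

    def __wrapped__(self, text: str) -> list[tuple[str, dict]]:
        # Each chunk is returned together with an (empty) metadata dict,
        # following the (text, metadata) convention assumed here.
        sentences = [part.strip() for part in text.split(".") if part.strip()]
        return [(sentence, {}) for sentence in sentences]
```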

To see the setup used in our work, check [the showcase](https://pathway.com/developers/showcases/private-rag-ollama-mistral).

## Running the app
If you are using the OpenAI modules, create a `.env` file in this directory and store your API key in it as `OPENAI_API_KEY=sk-...`, or add the `api_key` argument to `OpenAIChat` and `OpenAIEmbedder`.
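
For example, passing the key explicitly instead of relying on the `.env` file (a minimal sketch; here the key is simply read from an environment variable):

```python
import os

from pathway.xpacks.llm import embedders, llms

api_key = os.environ["OPENAI_API_KEY"]  # or fetch it from your secret store

chat = llms.OpenAIChat(model="gpt-3.5-turbo", api_key=api_key)
embedder = embedders.OpenAIEmbedder(api_key=api_key)
```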

Then, simply run with `python app.py` in this directory.
If you are interested in the Docker option, refer to [demo question answering Docker guide](../demo-question-answering/README.md#With-Docker).

## Using the app

Finally, query the application with:

```bash
curl -X 'POST' 'http://0.0.0.0:8000/v1/pw_ai_answer' -H 'accept: */*' -H 'Content-Type: application/json' -d '{
  "prompt": "What is the start date of the contract?"
}'
```
> `December 21, 2015 [6]`
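
The same query can also be sent from Python, for example with the `requests` library (a minimal sketch, assuming the app is running on the host and port configured in `app.py`):

```python
import requests

response = requests.post(
    "http://0.0.0.0:8000/v1/pw_ai_answer",
    json={"prompt": "What is the start date of the contract?"},
    timeout=120,
)
print(response.json())
```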
@@ -0,0 +1,67 @@
import logging

import pathway as pw
from dotenv import load_dotenv
from pathway.udfs import DiskCache, ExponentialBackoffRetryStrategy
from pathway.xpacks.llm import embedders, llms, parsers, splitters
from pathway.xpacks.llm.question_answering import AdaptiveRAGQuestionAnswerer
from pathway.xpacks.llm.vector_store import VectorStoreServer

load_dotenv()

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)


if __name__ == "__main__":
    path = "./data"

    my_folder = pw.io.fs.read(
        path=path,
        format="binary",
        with_metadata=True,
    )

    sources = [
        my_folder
    ]  # define the inputs (local folders, google drive, sharepoint, ...)

    DEFAULT_GPT_MODEL = "gpt-3.5-turbo"

    chat = llms.OpenAIChat(
        model=DEFAULT_GPT_MODEL,
        retry_strategy=ExponentialBackoffRetryStrategy(max_retries=6),
        cache_strategy=DiskCache(),
        temperature=0.0,
    )

    app_host = "0.0.0.0"
    app_port = 8000

    parser = parsers.ParseUnstructured()
    text_splitter = splitters.TokenCountSplitter(max_tokens=400)
    embedder = embedders.OpenAIEmbedder(cache_strategy=DiskCache())

    vector_server = VectorStoreServer(
        *sources,
        embedder=embedder,
        splitter=text_splitter,
        parser=parser,
    )

    app = AdaptiveRAGQuestionAnswerer(
        llm=chat,
        indexer=vector_server,
        default_llm_name=DEFAULT_GPT_MODEL,
        n_starting_documents=2,
        factor=2,
        max_iterations=4,
        strict_prompt=True,
    )

    app.build_server(host=app_host, port=app_port)

    app.run_server(with_cache=True)
Binary file added (+261 KB, not shown): .../data/IdeanomicsInc_20160330_10-K_EX-10.26_9512211_EX-10.26_Content License Agreement.pdf
@@ -0,0 +1,75 @@
## Fully private RAG with Pathway

This is the accompanying code for deploying the `adaptive RAG` technique with Pathway.

To learn more about building & deploying RAG applications with Pathway, including containerization, refer to [demo question answering](../demo-question-answering/README.md).

## Introduction
This app relies on modules provided under `pathway.xpacks.llm`.

`BaseRAGQuestionAnswerer` is the base class for building RAG applications with the Pathway vector store and Pathway xpack components.
It is meant to get you started with your RAG application right away.

This example uses the `AdaptiveRAGQuestionAnswerer`, which extends the `BaseRAGQuestionAnswerer` with the adaptive retrieval technique.
It replies to requests at the `/v1/pw_ai_answer` endpoint.

The `pw_ai_query` function takes the `pw_ai_queries` table as input. This table contains the prompt and the other arguments coming from the POST request; see the `BaseRAGQuestionAnswerer` class and the defined schemas to learn more about receiving inputs via POST requests.
We use the data in this table to call our adaptive retrieval logic.

To do that, we use the `answer_with_geometric_rag_strategy_from_index` implementation provided under `pathway.xpacks.llm.question_answering`.
This function takes an index, an LLM, a prompt, and adaptive parameters such as the starting number of documents. It then iteratively asks the question to the LLM with an increasing number of context documents retrieved from the index.
We also set `strict_prompt=True`. This adjusts the prompt with additional instructions and adds extra rails for parsing the response.

We encourage you to check the implementation of `answer_with_geometric_rag_strategy_from_index`.

## Modifying the code

Under the main function, we define:
- input folders
- LLM
- embedder
- index
- host and port to run the app
- run options (caching, cache folder)

By default, we use a locally deployed `Mistral 7B`. The app is LLM-agnostic, so it is possible to use any LLM.
You can modify any of the components by checking the options from the imported modules: `from pathway.xpacks.llm import embedders, llms, parsers, splitters`.
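
For instance, the chat component can be pointed at the locally served model; this condensed sketch mirrors the `LiteLLMChat` configuration from the accompanying `app.py`:

```python
from pathway.xpacks.llm import llms

chat = llms.LiteLLMChat(
    model="ollama/mistral",             # routed by LiteLLM to the local Ollama server
    api_base="http://localhost:11434",  # default address of `ollama serve`
    temperature=0,
    top_p=1,
    format="json",                      # available in the local Ollama deployment
)
```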

It is also possible to easily create new components by extending the [`pw.UDF`](https://pathway.com/developers/user-guide/data-transformation/user-defined-functions) class and implementing the `__wrapped__` function.

## Deploying and using a local LLM
Due to its popularity and ease of use, we decided to run `Mistral 7B` on `Ollama`.

To run the local LLM, follow these steps:
- Download Ollama from [ollama.com/download](https://ollama.com/download)
- In your terminal, run `ollama serve`
- In another terminal, run `ollama run mistral`

You can now test it with the following request:

```bash
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Here is a story about llamas eating grass"
}'
```
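
The same test can be done from Python (a minimal sketch; `"stream": False` is assumed here so that Ollama returns a single JSON object instead of a token stream):

```python
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",
        "prompt": "Here is a story about llamas eating grass",
        "stream": False,  # one JSON reply instead of streamed chunks
    },
    timeout=300,
)
print(response.json()["response"])
```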

## Running the app
First, make sure your local LLM is up and running.

Then, simply run with `python app.py` in this directory.
If you are interested in the Docker option, refer to [demo question answering Docker guide](../demo-question-answering/README.md#With-Docker).

## Using the app

Finally, query the application with:

```bash
curl -X 'POST' 'http://0.0.0.0:8000/v1/pw_ai_answer' -H 'accept: */*' -H 'Content-Type: application/json' -d '{
  "prompt": "What is the start date of the contract?"
}'
```
> `December 21, 2015 [6]`
@@ -0,0 +1,72 @@
import logging

import pathway as pw
from dotenv import load_dotenv
from pathway.xpacks.llm import embedders, llms, parsers, splitters
from pathway.xpacks.llm.question_answering import AdaptiveRAGQuestionAnswerer
from pathway.xpacks.llm.vector_store import VectorStoreServer

load_dotenv()

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)


if __name__ == "__main__":
    path = "./data"

    my_folder = pw.io.fs.read(
        path=path,
        format="binary",
        with_metadata=True,
    )

    sources = [
        my_folder
    ]  # define the inputs (local folders, google drive, sharepoint, ...)

    DEFAULT_MODEL = "ollama/mistral"

    chat = llms.LiteLLMChat(
        model=DEFAULT_MODEL,
        temperature=0,
        top_p=1,
        api_base="http://localhost:11434",  # local deployment
        format="json",  # only available in Ollama local deploy, not usable in Mistral API
    )

    app_host = "0.0.0.0"
    app_port = 8000

    parser = parsers.ParseUnstructured()
    text_splitter = splitters.TokenCountSplitter(max_tokens=400)

    embedding_model = "avsolatorio/GIST-small-Embedding-v0"

    embedder = embedders.SentenceTransformerEmbedder(
        embedding_model, call_kwargs={"show_progress_bar": False}
    )

    vector_server = VectorStoreServer(
        *sources,
        embedder=embedder,
        splitter=text_splitter,
        parser=parser,
    )

    app = AdaptiveRAGQuestionAnswerer(
        llm=chat,
        indexer=vector_server,
        default_llm_name=DEFAULT_MODEL,
        n_starting_documents=2,
        factor=2,
        max_iterations=4,
        strict_prompt=True,
    )

    app.build_server(host=app_host, port=app_port)

    app.run_server(with_cache=True)
Binary file added (+261 KB, not shown): .../data/IdeanomicsInc_20160330_10-K_EX-10.26_9512211_EX-10.26_Content License Agreement.pdf