Commit 990a172

Refactor and improve
1 parent 2482d53 commit 990a172

5 files changed, +198 -145 lines changed

.env.sample

+1 -1
@@ -5,7 +5,7 @@ LANGUAGE_MODEL="gpt-3.5-turbo" # gpt-4 gpt-3.5-turbo
 
 # Deeplake vector DB
 ACTIVELOOP_TOKEN=""
-DATASET_PATH="./local_vector_db" # "hub://USER_ID/custom_dataset" # Edit with your user id if you want to use the cloud db.
+DATASET_PATH="./chainstack_docs" # "hub://USER_ID/custom_dataset" # Edit with your user id if you want to use the cloud db.
 
 # Scrape settings
 SITE_MAP="https://docs.chainstack.com/sitemap.xml"
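
The comment on `DATASET_PATH` allows either a local directory or a `hub://` cloud path. The following is a minimal sketch, not code from this commit, of how a script might read these variables and tell the two cases apart. `Settings` and `load_settings` are hypothetical names, and the rule that only a `hub://` prefix means a cloud dataset is an assumption drawn from the comment above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Values read from the variables named in .env.sample (hypothetical container)."""
    language_model: str
    dataset_path: str
    site_map: Optional[str]

    @property
    def uses_cloud_dataset(self) -> bool:
        # Assumption: only paths starting with "hub://" point at the cloud DB.
        return self.dataset_path.startswith("hub://")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, failing early if the path is missing."""
    dataset_path = env.get("DATASET_PATH", "")
    if not dataset_path:
        raise ValueError("DATASET_PATH is not set")
    return Settings(
        # The default mirrors the value shown in .env.sample.
        language_model=env.get("LANGUAGE_MODEL", "gpt-3.5-turbo"),
        dataset_path=dataset_path,
        site_map=env.get("SITE_MAP"),
    )


if __name__ == "__main__":
    settings = load_settings({"DATASET_PATH": "./chainstack_docs"})
    print(settings.dataset_path, settings.uses_cloud_dataset)
```

Passing the mapping explicitly keeps the helper easy to check without touching the real environment.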

.gitignore

+2
@@ -1,8 +1,10 @@
 # Virtual env
 docs-chat/
 
+.DS_store
 # Vector DB
 local_vector_db/
+chainstack_docs/
 
 # Byte-compiled / optimized / DLL files
 __pycache__/

README.md

+60 -49
@@ -1,50 +1,62 @@
 # Chainstack docs chat bot
 
-MVP for docs specific chat bot.
+Simple CLI implementation of a docs specific AI assistant. This project guides you to build an AI assistant for the Chainstack documentation using LangChain and Activeloop.
+
+Read the article for an in-depth guide:
+
+- []()
 
 ## Requirements
 
 Before getting started, ensure you have the following:
 
-* [Python](https://www.python.org/downloads/) - Version 3.7 or newer is required.
-* An active account on OpenAI, along with an [OpenAI API key](https://platform.openai.com/account/api-keys).
-* A Deep Lake account, complete with a [Deep Lake API key](https://app.activeloop.ai/?utm_source=referral&utm_medium=platform&utm_campaign=signup_promo_settings&utm_id=plg).
+- [Python](https://www.python.org/downloads/) - Version 3.7 or newer is required.
+- An active account on OpenAI, along with an [OpenAI API key](https://platform.openai.com/account/api-keys).
+- An Activeloop account, complete with an [Activeloop API key](https://app.activeloop.ai/?utm_source=referral&utm_medium=platform&utm_campaign=signup_promo_settings&utm_id=plg).
 
 ## Project structure
 
-* `docs-chat` directory is a Python virtual environment
-* `main.py` scrapes pages and creates vector database.
-* `chat.py` accepts users queries.
+- `.env` stores secrets and configuration as environment variables.
+- `main.py` scrapes pages and creates vector database.
+- `chat.py` accepts users queries.
 
 ## Quickstart
 
 - Clone the repository
 
-- Create new Python virtual environment
+```sh
+git clone https://github.com/soos3d/chainstack-docs-chat.git
+```
 
 - Get into the project's directory
 
 ```sh
 cd chainstack-docs-chat
 ```
 
+- Create new Python virtual environment
+
+```sh
+python3 -m venv docs-chat
+```
+
 - Install dependencies
 
 ```sh
 pip install -r requirements.txt
 ```
 
-- Add API keys to `.env`
+- Add API keys and config to the `.env` file:
 
 ```env
-# OpenAI
+# OpenAI
 OPENAI_API_KEY=""
 EMBEDDINGS_MODEL="text-embedding-ada-002"
 LANGUAGE_MODEL="gpt-3.5-turbo" # gpt-4 gpt-3.5-turbo
 
 # Deeplake vector DB
 ACTIVELOOP_TOKEN=""
-DATASET_PATH="./local_vector_db" # "hub://USER_ID/custom_dataset" # Edit with your user id if you want to use the cloud db.
+DATASET_PATH="./chainstack_docs" # "hub://USER_ID/custom_dataset" # Edit with your user id if you want to use the cloud db.
 
 # Scrape settings
 SITE_MAP="https://docs.chainstack.com/sitemap.xml"
@@ -69,55 +81,54 @@ Pyrhon3 chat.py
 Sample interaction using the GPT3.5-turbo model. Use GPT4 for better responses.
 
 ```
-Please enter your question (or 'quit' to stop): what are chainstack core pillars?
+Please enter your question (or 'quit' to stop): What are Chainstack core pillars?
+Chainstack's core pillars are:
+
+1. Unbeatable pricing: Chainstack offers competitive pricing options for its services. You can check their pricing options and use the pricing calculator on their website or contact them for more information.
+
+2. Unbounded performance: Chainstack does not impose rate limiting or hard caps on its services. This means that you can enjoy high-performance and scalability without any restrictions.
+
+3. Unlimited flexibility: Chainstack provides unlimited flexibility to its users. You can customize your node settings, such as the txpool.pricebump, and access additional node resources. They also offer load balancing and other customization options to meet your specific needs.
+
+++source++: https://docs.chainstack.com/docs/platform-introduction
+```
+
+```
+Please enter your question (or 'quit' to stop): How can I start using the Ethereum API with Chainstack?
+To start using the Ethereum API with Chainstack, you can follow these steps:
 
-Question: what are chainstack core pillars?
-Answer: Chainstack's core pillars are unbeatable pricing, unbounded performance, and unlimited flexibility.
+1. Sign up with Chainstack: Visit the Chainstack website and sign up for an account.
 
-Tokens Used: 1178
-Prompt Tokens: 1160
-Completion Tokens: 18
-Successful Requests: 1
-Total Cost (USD): $0.002356
+2. Deploy a node: Once you have signed up, deploy an Ethereum node on Chainstack. You can choose the network you want to deploy (such as the Ethereum Sepolia testnet) and configure the node according to your needs.
 
-Please enter your question (or 'quit' to stop): how can i start using the Ethereum API with chainstack?
+3. View node access and credentials: After deploying the node, you will be able to view the access details and credentials for your node. This includes the RPC URL, which you will need to connect to the network.
 
-Question: how can i start using the Ethereum API with chainstack?
-Answer: To use the Ethereum API with Chainstack, you need to follow these steps:
+4. Connect to the network: Use the RPC URL provided by Chainstack to connect to the Ethereum network. You can use libraries like web3.js to interact with the network and perform various operations such as reading the latest block number or sending transactions.
+
+By following these steps, you will be able to start using the Ethereum API with Chainstack and interact with the Ethereum network.
+
+++source++: https://docs.chainstack.com/docs/ethereum-tutorial-trust-fund-account-with-remix
+```
+
+```
+Please enter your question (or 'quit' to stop): What methods can I use to get ethereum blocks information?
+To get Ethereum blocks information, you can use the following methods:
 
-1. Sign up with Chainstack.
-2. Deploy an Ethereum RPC node.
-3. View your node access and credentials.
-4. Create an API key to authenticate your requests to the Chainstack API.
-5. Use tools such as curl or Postman to make manual requests to the Ethereum RPC node using JSON-RPC and the command line.
+1. eth_blockNumber: This method returns the number of the most recent block on the Ethereum blockchain.
 
-Once you have completed these steps, you can start using the Ethereum API to interact with the Ethereum blockchain and build your applications.
+2. eth_getBlockByHash: This method retrieves a block by its hash.
 
-Tokens Used: 1389
-Prompt Tokens: 1266
-Completion Tokens: 123
-Successful Requests: 2
-Total Cost (USD): $0.0027780000000000005
+3. eth_getBlockByNumber: This method retrieves a block by its number.
 
-Please enter your question (or 'quit' to stop): what methods can i use to get ethereum blocks information?
+4. eth_getBlockTransactionCountByHash: This method returns the number of transactions in a block given its hash.
 
-Question: what methods can i use to get ethereum blocks information?
-Answer: The following methods can be used to retrieve Ethereum block information when using the Ethereum API with Chainstack:
+5. eth_getBlockTransactionCountByNumber: This method returns the number of transactions in a block given its number.
 
-- eth_blockNumber
-- eth_getBlockByHash
-- eth_getBlockByNumber
-- eth_getBlockTransactionCountByHash
-- eth_getBlockTransactionCountByNumber
-- eth_newBlockFilter
+6. eth_newBlockFilter: This method creates a new filter that notifies you when a new block is added to the Ethereum blockchain.
 
-These methods allow developers to access specific block details such as the block's transactions, timestamp, height, header, and more.
+These methods allow you to access specific block details such as transactions, timestamp, height, header, and more.
 
-Tokens Used: 1846
-Prompt Tokens: 1739
-Completion Tokens: 107
-Successful Requests: 2
-Total Cost (USD): $0.0036920000000000004
+++source++: https://docs.chainstack.com/reference/ethereum-blocks-rpc-methods
 ```
 
-
+> Note that this is a basic app and many improvements can be made.
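
Each sample answer in the README ends with a `++source++` line. In `chat.py` that value is read from `result['source_documents'][0]`, which would raise an `IndexError` if the retriever returned no documents. The following is a minimal defensive sketch, not code from this commit. `first_source` and `format_source_line` are hypothetical helpers, and they assume only that each document exposes a `metadata` dict with an optional `source` key.

```python
from types import SimpleNamespace
from typing import Any, Mapping, Optional


def first_source(result: Mapping[str, Any]) -> Optional[str]:
    """Return the 'source' of the first retrieved document, or None if unavailable."""
    docs = result.get("source_documents") or []
    if not docs:
        return None
    # Assumption: documents carry a `metadata` dict, as the commit's chat.py relies on.
    metadata = getattr(docs[0], "metadata", None) or {}
    return metadata.get("source")


def format_source_line(result: Mapping[str, Any]) -> str:
    """Build the line printed after a streamed answer."""
    source = first_source(result)
    return f"++source++: {source}" if source else "++source++: (no source found)"


if __name__ == "__main__":
    # Stand-in for a retrieved document; real results come from the chain.
    doc = SimpleNamespace(metadata={"source": "https://docs.chainstack.com/docs/platform-introduction"})
    print(format_source_line({"source_documents": [doc]}))
    print(format_source_line({"source_documents": []}))
```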

chat.py

+57 -42
@@ -4,71 +4,86 @@
 from langchain.chat_models import ChatOpenAI
 from langchain.chains import ConversationalRetrievalChain
 from langchain.embeddings import OpenAIEmbeddings
-from langchain.callbacks import get_openai_callback
+from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
 
-# Load environment variables from .env file
-load_dotenv()
 
-# Set environment variables
-os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY')
-os.environ['ACTIVELOOP_TOKEN'] = os.getenv('ACTIVELOOP_TOKEN')
-language_model = os.getenv('LANGUAGE_MODEL')
+def load_environment_variables():
+    """Load environment variables from .env file."""
+    load_dotenv()
+    os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY')
+    os.environ['ACTIVELOOP_TOKEN'] = os.getenv('ACTIVELOOP_TOKEN')
 
-# Set DeepLake dataset path
-DEEPLAKE_PATH = os.getenv('DATASET_PATH')
+def initialize_embeddings():
+    """Initialize OpenAI embeddings and disallow special tokens."""
+    return OpenAIEmbeddings(disallowed_special=())
 
-# Initialize OpenAI embeddings and disallow special tokens
-EMBEDDINGS = OpenAIEmbeddings(disallowed_special=())
+def initialize_deeplake(embeddings):
+    """Initialize DeepLake vector store with OpenAI embeddings."""
+    return DeepLake(
+        dataset_path=os.getenv('DATASET_PATH'),
+        read_only=True,
+        embedding=embeddings,
+    )
 
-# Initialize DeepLake vector store with OpenAI embeddings
-deep_lake = DeepLake(
-    dataset_path=DEEPLAKE_PATH,
-    read_only=True,
-    embedding_function=EMBEDDINGS,
-)
+def initialize_retriever(deep_lake):
+    """Initialize retriever and set search parameters."""
+    retriever = deep_lake.as_retriever()
+    retriever.search_kwargs.update({
+        'distance_metric': 'cos',
+        'fetch_k': 100,
+        'maximal_marginal_relevance': True,
+        'k': 10,
+    })
+    return retriever
 
-# Initialize retriever and set search parameters
-retriever = deep_lake.as_retriever()
-retriever.search_kwargs.update({
-    'distance_metric': 'cos',
-    'fetch_k': 100,
-    'maximal_marginal_relevance': True,
-    'k': 10,
-})
+def initialize_chat_model():
+    """Initialize ChatOpenAI model."""
+    return ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], model_name=os.getenv('LANGUAGE_MODEL'), temperature=0.0)
 
-# Initialize ChatOpenAI model
-model = ChatOpenAI(model_name=language_model, temperature=0.2) # gpt-3.5-turbo by default. Use gpt-4 for better and more accurate responses
-
-# Initialize ConversationalRetrievalChain
-qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
-
-# Initialize chat history
-chat_history = []
+def initialize_conversational_chain(model, retriever):
+    """Initialize ConversationalRetrievalChain."""
+    return ConversationalRetrievalChain.from_llm(model, retriever=retriever, return_source_documents=True)
 
 def get_user_input():
     """Get user input and handle 'quit' command."""
     question = input("\nPlease enter your question (or 'quit' to stop): ")
-    if question.lower() == 'quit':
-        return None
-    return question
+    return None if question.lower() == 'quit' else question
 
+# In case you want to format the result.
 def print_answer(question, answer):
     """Format and print question and answer."""
     print(f"\nQuestion: {question}\nAnswer: {answer}\n")
 
 def main():
     """Main program loop."""
+    load_environment_variables()
+    embeddings = initialize_embeddings()
+    deep_lake = initialize_deeplake(embeddings)
+    retriever = initialize_retriever(deep_lake)
+    model = initialize_chat_model()
+    qa = initialize_conversational_chain(model, retriever)
+
+    # In this case the chat history is stored in memory only
+    chat_history = []
+
     while True:
         question = get_user_input()
         if question is None:  # User has quit
             break
+
+        # Get results based on question
+        result = qa({"question": question, "chat_history": chat_history})
+        chat_history.append((question, result['answer']))
+
+        # Take the first source to display
+        first_document = result['source_documents'][0]
+        metadata = first_document.metadata
+        source = metadata['source']
 
-        # Display token usage and approximate costs
-        with get_openai_callback() as tokens_usage:
-            result = qa({"question": question, "chat_history": chat_history})
-            chat_history.append((question, result['answer']))
-            print_answer(question, result['answer'])
-            print(tokens_usage)
+        # We are streaming the response so no need to print those
+        #print(f"-> **Question**: {question}\n")
+        #print(f"**Answer**: {result['answer']}\n")
+        print(f"\n\n++source++: {source}")
 
 if __name__ == "__main__":
     main()
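
`initialize_retriever` enables `maximal_marginal_relevance` with `fetch_k` set to 100 and `k` set to 10. As MMR is commonly described, a larger candidate pool is fetched first, and a smaller subset is then picked greedily, trading relevance to the query against redundancy with what is already selected. The sketch below is a plain-Python illustration of that idea under those assumptions. It is not LangChain's or Deep Lake's implementation, and `mmr_select`, `lambda_mult`, and the toy vectors are made up for the example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick up to k candidate indices by maximal marginal relevance."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already chosen (0.0 for the first pick).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 1.0]
    # Candidates 0 and 1 are near-duplicates; candidate 2 points elsewhere.
    pool = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]
    print(mmr_select(query, pool, k=2))                   # [0, 2]
    print(mmr_select(query, pool, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to plain similarity ranking, which is why the near-duplicate is returned in the second call.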
