Commit 446cd85

feat: add new notebook collections
1 parent 20c088f commit 446cd85

9 files changed

+4767
-0
lines changed

data_engineer/duckdb.ipynb

Lines changed: 1207 additions & 0 deletions
Large diffs are not rendered by default.

data_engineering/from_pandas_to_production_delta_rs.ipynb

Lines changed: 579 additions & 0 deletions
Large diffs are not rendered by default.

llm/Pinecone_Ollama.ipynb

Lines changed: 508 additions & 0 deletions
Large diffs are not rendered by default.

llm/from_messy_pdfs_to_rag_ready_data_complete_document_processing_with_docling.ipynb

Lines changed: 804 additions & 0 deletions
Large diffs are not rendered by default.

llm/langraph.ipynb

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Building Coordinated AI Agents with LangGraph\n",
8+
"\n",
9+
"LangGraph provides a framework for building coordinated multi-agent systems with specialized agents working together through structured communication.\n",
10+
"\n",
11+
"## Getting Started with LangGraph\n",
12+
"\n",
13+
"### Environment Setup\n",
14+
"\n",
15+
"First, set up your environment with the following packages:"
16+
]
17+
},
18+
{
19+
"cell_type": "code",
20+
"execution_count": null,
21+
"metadata": {},
22+
"outputs": [],
23+
"source": [
24+
"# Install required packages\n",
25+
"# pip install langgraph langgraph-supervisor langchain langchain-core langchain-tavily langchain-openai python-dotenv\n",
26+
"\n",
27+
"from dotenv import load_dotenv\n",
28+
"from langchain_openai import ChatOpenAI\n",
29+
"from langchain_tavily import TavilySearch\n",
30+
"from langgraph.prebuilt import create_react_agent\n",
31+
"\n",
32+
"load_dotenv()"
33+
]
34+
},
35+
{
36+
"cell_type": "markdown",
37+
"metadata": {},
38+
"source": [
39+
"### Creating Agents in LangGraph\n",
40+
"\n",
41+
"Creating a first agent in LangGraph takes only a few lines of code. Let's build a general-purpose assistant with web search."
42+
]
43+
},
44+
{
45+
"cell_type": "code",
46+
"execution_count": null,
47+
"metadata": {},
48+
"outputs": [],
49+
"source": [
50+
"# Initialize the search tool\n",
51+
"web_search = TavilySearch(max_results=3)\n",
52+
"\n",
53+
"# Create the agent using create_react_agent()\n",
54+
"agent = create_react_agent(\n",
55+
" model=ChatOpenAI(model=\"gpt-4o\"),\n",
56+
" tools=[web_search],\n",
57+
" prompt=\"You are a helpful assistant that can search the web for information and summarize the results to enhance your output.\",\n",
58+
")"
59+
]
60+
},
61+
{
62+
"cell_type": "code",
63+
"execution_count": null,
64+
"metadata": {},
65+
"outputs": [],
66+
"source": [
67+
"# Invoke the agent with a question\n",
68+
"response = agent.invoke(\n",
69+
" {\n",
70+
" \"messages\": [\n",
71+
" {\n",
72+
" \"role\": \"user\",\n",
73+
" \"content\": \"Find the open and close prices of Apple stock for June 2, 2025.\"\n",
74+
" }\n",
75+
" ]\n",
76+
" }\n",
77+
")\n",
78+
"\n",
79+
"print(response[\"messages\"][-1].content)"
80+
]
81+
},
82+
{
83+
"cell_type": "markdown",
84+
"metadata": {},
85+
"source": [
86+
"### Creating a Supervisor Multi-Agent System\n",
87+
"\n",
88+
"A supervisor is a special agent that manages the workflow between multiple agents. It is responsible for:\n",
89+
"\n",
90+
"- Routing the workflow between agents\n",
91+
"- Managing the conversation history\n",
92+
"- Ensuring the agents are working together to achieve the goal"
93+
]
94+
},
95+
{
96+
"cell_type": "code",
97+
"execution_count": null,
98+
"metadata": {},
99+
"outputs": [],
100+
"source": [
"from langgraph.checkpoint.memory import MemorySaver\n",
"from langgraph_supervisor import create_supervisor\n",
"\n",
"# Define domain-specific agents with their own tools and prompts.\n",
"# create_supervisor() expects every agent to have a unique name.\n",
"agent1 = create_react_agent(\n",
"    model=ChatOpenAI(model=\"gpt-4o\"),\n",
"    tools=[web_search],\n",
"    name=\"bull_agent\",\n",
"    prompt=\"You are a bull agent - find positive investment indicators\",\n",
")\n",
"agent2 = create_react_agent(\n",
"    model=ChatOpenAI(model=\"gpt-4o\"),\n",
"    tools=[web_search],\n",
"    name=\"bear_agent\",\n",
"    prompt=\"You are a bear agent - find negative investment indicators\",\n",
")\n",
"agent3 = create_react_agent(\n",
"    model=ChatOpenAI(model=\"gpt-4o\"),\n",
"    tools=[web_search],\n",
"    name=\"chairman\",\n",
"    prompt=\"You are a chairman - make final investment decisions\",\n",
")\n",
"\n",
"# Create a memory checkpointer to persist conversation history for the supervisor\n",
"memory = MemorySaver()\n",
"\n",
"supervisor = create_supervisor(\n",
"    model=ChatOpenAI(model=\"gpt-4o\"),\n",
"    agents=[agent1, agent2, agent3],\n",
"    prompt=\"Detailed system prompt instructing the model how to route the workflow.\",\n",
").compile(checkpointer=memory)"
129+
]
130+
},
131+
{
132+
"cell_type": "code",
133+
"execution_count": null,
134+
"metadata": {},
135+
"outputs": [],
136+
"source": [
"# Call the supervisor with a user query.\n",
"# A graph compiled with a checkpointer needs a thread_id to store its state under;\n",
"# the value here is arbitrary.\n",
"config = {\"configurable\": {\"thread_id\": \"nvda-demo\"}}\n",
"\n",
"response = supervisor.invoke(\n",
"    {\n",
"        \"messages\": [\n",
"            {\n",
"                \"role\": \"user\",\n",
"                \"content\": \"Should I invest in NVDA stock? I want to hear both bullish and bearish arguments.\"\n",
"            }\n",
"        ]\n",
"    },\n",
"    config=config,\n",
")\n",
"\n",
"print(response[\"messages\"][-1].content)"
148+
]
149+
}
150+
],
151+
"metadata": {
152+
"kernelspec": {
153+
"display_name": "Python 3 (ipykernel)",
154+
"language": "python",
155+
"name": "python3",
156+
"path": "/Users/khuyentran/codecut_content/codecut_articles/.venv/share/jupyter/kernels/python3"
157+
}
158+
},
159+
"nbformat": 4,
160+
"nbformat_minor": 4
161+
}
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
"Have you ever wanted to ask a question about your PySpark data in plain English instead of writing SQL?\n",
"\n",
"[LangChain's Spark SQL Toolkit](https://python.langchain.com/docs/integrations/tools/spark_sql/) enables natural language data querying by:\n",
"\n",
"- Translating your requests into SQL\n",
"- Executing them against your Spark cluster\n",
"- Returning the results in a readable format\n",
"\n",
"This makes it much easier to work with large-scale data while still leveraging Spark's distributed computing capabilities.\n",
"\n",
"To demonstrate, we'll create a simple DataFrame and use LangChain's Spark SQL tool to query it.\n",
"\n",
"```python\n",
"from pyspark.sql import SparkSession, Row\n",
"\n",
"# Create sample data and DataFrame\n",
"data = [Row(name=\"Alice\", age=30), Row(name=\"Bob\", age=25)]\n",
"spark = SparkSession.builder.getOrCreate()\n",
"df = spark.createDataFrame(data)\n",
"\n",
"# Overwrite so the cell can be re-run without a \"table already exists\" error\n",
"df.write.mode(\"overwrite\").saveAsTable(\"people\")\n",
"df.show()\n",
"```\n",
"\n",
"This creates a table `people` accessible via SQL.\n",
"\n",
"Next, we'll set up the key components that enable natural language querying of our Spark data. Here are the steps:\n",
"\n",
"1. Initialize the Spark SQL tool, which provides the interface to our Spark database.\n",
"2. Initialize a language model.\n",
"3. Initialize the Spark SQL toolkit, which connects the language model with the Spark database.\n",
"4. Create an agent executor that combines the language model with the Spark SQL toolkit.\n",
"\n",
"```python\n",
"from langchain_community.agent_toolkits import SparkSQLToolkit, create_spark_sql_agent\n",
"from langchain_community.utilities.spark_sql import SparkSQL\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"# Initialize Spark SQL tool\n",
"spark_sql = SparkSQL(schema=\"default\")\n",
"# Initialize LLM\n",
"llm = ChatOpenAI(temperature=0)\n",
"# Initialize toolkit\n",
"toolkit = SparkSQLToolkit(db=spark_sql, llm=llm)\n",
"# Create agent executor\n",
"agent_executor = create_spark_sql_agent(llm=llm, toolkit=toolkit, verbose=True)\n",
"```\n",
"\n",
"For a hands-on guide on how to build coordinated AI agents with LangGraph, check out [Building Coordinated AI Agents with LangGraph: A Hands-On Tutorial](https://codecut.ai/building-multi-agent-ai-langgraph-tutorial/).\n",
"\n",
"Now we can ask the agent to query the data.\n",
41+
"```python\n",
42+
"agent_executor.run(\"What is the average age of people in the table?\")\n",
43+
"```\n",
44+
"```\n",
45+
"> Entering new AgentExecutor chain...\n",
46+
"Action: list_tables_sql_db\n",
47+
"Action Input:\n",
48+
"Observation: people\n",
49+
"Thought:I can query the \"people\" table for the average age.\n",
50+
"Action: query_sql_db\n",
51+
"Action Input: SELECT AVG(age) FROM people\n",
52+
"Observation: [('27.5',)]\n",
53+
"Thought:The average age of people in the table is 27.5.\n",
54+
"Final Answer: 27.5\n",
55+
"> Finished chain.\n",
56+
"```\n",
"\n",
"The answer is correct: (30 + 25) / 2 = 27.5.\n",
"\n",
"The output shows that the agent:\n",
"\n",
"- Looked up the available tables\n",
"- Queried the `people` table for the average age\n",
"- Got the result\n",
"- Answered the question with the result\n",
"\n",
"Let's try another question.\n",
"\n",
64+
"```python\n",
65+
"agent_executor.run(\"Who is the oldest person in the table?\")\n",
66+
"```\n",
67+
"```\n",
68+
"> Entering new AgentExecutor chain...\n",
69+
"Action: list_tables_sql_db\n",
70+
"Action Input:\n",
71+
"Observation: people\n",
72+
"Thought:I should query the \"people\" table to find the oldest person.\n",
73+
"Action: schema_sql_db\n",
74+
"Action Input: people\n",
75+
"Observation: CREATE TABLE spark_catalog.default.people (\n",
76+
" name STRING,\n",
77+
" age BIGINT)\n",
78+
";\n",
79+
"/*\n",
80+
"3 rows from people table:\n",
81+
"name age\n",
82+
"Alice 30\n",
83+
"Bob 25\n",
84+
"*/\n",
85+
"Thought:I should write a query to select the oldest person from the \"people\" table.\n",
86+
"Action: query_sql_db\n",
87+
"Action Input: SELECT name, age FROM people ORDER BY age DESC LIMIT 1\n",
88+
"Observation: [('Alice', '30')]\n",
89+
"Thought:I now know the final answer\n",
90+
"Final Answer: Alice\n",
91+
"> Finished chain.\n",
92+
"```\n",
"\n",
"This answer is also correct: Alice (30) is older than Bob (25).\n",
"\n",
"## Related Resources\n",
"\n",
"For deeper exploration of LangChain and Spark integrations:\n",
"\n",
"- **Multi-Agent Systems**: [Building Coordinated AI Agents with LangGraph: A Hands-On Tutorial](https://codecut.ai/building-multi-agent-ai-langgraph-tutorial/) for advanced AI agent coordination with LangChain\n",
"- **Data Science Workflows**: [Build Data Science Workflows with DeepSeek and LangChain](https://codecut.ai/build-data-science-workflows-deepseek-langchain/) for comprehensive data analysis pipelines\n",
"- **Private AI Solutions**: [Private AI Workflows with LangChain and Ollama](https://codecut.ai/private-ai-workflows-langchain-ollama/) for local model deployment and privacy-focused implementations\n",
"- **Official Documentation**: [LangChain Spark SQL Tool](https://python.langchain.com/docs/integrations/tools/spark_sql/) for complete API reference and advanced configuration options"
100+
]
101+
}
102+
],
103+
"metadata": {
104+
"kernelspec": {
105+
"display_name": "Python 3 (ipykernel)",
106+
"language": "python",
107+
"name": "python3",
108+
"path": "/Users/khuyentran/codecut_content/codecut_articles/.venv/share/jupyter/kernels/python3"
109+
}
110+
},
111+
"nbformat": 4,
112+
"nbformat_minor": 4
113+
}
