CodeCutTech
diff --git a/‎data_engineer/duckdb.ipynb
Lines changed: 1207 additions & 0 deletions b/‎data_engineer/duckdb.ipynb
Lines changed: 1207 additions & 0 deletions
diff --git a/‎data_engineering/from_pandas_to_production_delta_rs.ipynb
Lines changed: 579 additions & 0 deletions b/‎data_engineering/from_pandas_to_production_delta_rs.ipynb
Lines changed: 579 additions & 0 deletions
diff --git a/‎llm/Pinecone_Ollama.ipynb
Lines changed: 508 additions & 0 deletions b/‎llm/Pinecone_Ollama.ipynb
Lines changed: 508 additions & 0 deletions
diff --git a/‎llm/from_messy_pdfs_to_rag_ready_data_complete_document_processing_with_docling.ipynb
Lines changed: 804 additions & 0 deletions b/‎llm/from_messy_pdfs_to_rag_ready_data_complete_document_processing_with_docling.ipynb
Lines changed: 804 additions & 0 deletions
diff --git a/‎llm/langraph.ipynb
Lines changed: 161 additions & 0 deletions b/‎llm/langraph.ipynb
Lines changed: 161 additions & 0 deletions
diff --git a/‎machine_learning/pyspark_langchain.ipynb
Lines changed: 113 additions & 0 deletions b/‎machine_learning/pyspark_langchain.ipynb
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,161 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Building Coordinated AI Agents with LangGraph\n",
+    "\n",
+    "LangGraph provides a framework for building coordinated multi-agent systems with specialized agents working together through structured communication.\n",
+    "\n",
+    "## Getting Started with LangGraph\n",
+    "\n",
+    "### Environment Setup\n",
+    "\n",
+    "First, you should setup your environment with the following packages:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Install required packages\n",
+    "# pip install langgraph langgraph-supervisor langchain langchain-core langchain-tavily langchain-openai python-dotenv\n",
+    "\n",
+    "from dotenv import load_dotenv\n",
+    "from langchain_openai import ChatOpenAI\n",
+    "from langchain_tavily import TavilySearch\n",
+    "from langgraph.prebuilt import create_react_agent\n",
+    "\n",
+    "load_dotenv()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Creating Agents in LangGraph\n",
+    "\n",
+    "LangGraph makes it very easy to create your first agent. Let's walk through how to create a general purpose assistant with web search functionality in a few lines of code."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Initialize the search tool\n",
+    "web_search = TavilySearch(max_results=3)\n",
+    "\n",
+    "# Create the agent using create_react_agent()\n",
+    "agent = create_react_agent(\n",
+    "    model=ChatOpenAI(model=\"gpt-4o\"),\n",
+    "    tools=[web_search],\n",
+    "    prompt=\"You are a helpful assistant that can search the web for information and summarize the results to enhance your output.\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Invoke the agent with a question\n",
+    "response = agent.invoke(\n",
+    "    {\n",
+    "        \"messages\": [\n",
+    "            {\n",
+    "                \"role\": \"user\",\n",
+    "                \"content\": \"Find the open and close prices of Apple's stocks for June 1, 2025.\"\n",
+    "            }\n",
+    "        ]\n",
+    "    }\n",
+    ")\n",
+    "\n",
+    "print(response[\"messages\"][-1].content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Creating a Supervisor Multi-Agent System\n",
+    "\n",
+    "A supervisor is a special agent that manages the workflow between multiple agents. It is responsible for:\n",
+    "\n",
+    "- Routing the workflow between agents\n",
+    "- Managing the conversation history\n",
+    "- Ensuring the agents are working together to achieve the goal"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langgraph.checkpoint.memory import MemorySaver\n",
+    "from langgraph_supervisor import create_supervisor\n",
+    "\n",
+    "# Define domain-specific agents with their own tools and prompts\n",
+    "agent1 = create_react_agent(\n",
+    "    model=ChatOpenAI(model=\"gpt-4o\"),\n",
+    "    tools=[web_search],\n",
+    "    prompt=\"You are a bull agent - find positive investment indicators\",\n",
+    ")\n",
+    "agent2 = create_react_agent(\n",
+    "    model=ChatOpenAI(model=\"gpt-4o\"), \n",
+    "    tools=[web_search],\n",
+    "    prompt=\"You are a bear agent - find negative investment indicators\",\n",
+    ")\n",
+    "agent3 = create_react_agent(\n",
+    "    model=ChatOpenAI(model=\"gpt-4o\"),\n",
+    "    tools=[web_search], \n",
+    "    prompt=\"You are a chairman - make final investment decisions\",\n",
+    ")\n",
+    "\n",
+    "# Create a memory checkpointer to persist conversation history for the supervisor\n",
+    "memory = MemorySaver()\n",
+    "\n",
+    "supervisor = create_supervisor(\n",
+    "    model=ChatOpenAI(model=\"gpt-4o\"),\n",
+    "    agents=[agent1, agent2, agent3],\n",
+    "    prompt=\"Detailed system prompt instructing the model how to route the workflow.\",\n",
+    ").compile(checkpointer=memory)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Call the supervisor with a user query\n",
+    "response = supervisor.invoke({\n",
+    "    \"messages\": [\n",
+    "        {\n",
+    "            \"role\": \"user\",\n",
+    "            \"content\": \"Should I invest in NVDA stock? I want to hear both bullish and bearish arguments.\"\n",
+    "        }\n",
+    "    ]\n",
+    "})\n",
+    "\n",
+    "print(response[\"messages\"][-1].content)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3",
+   "path": "/Users/khuyentran/codecut_content/codecut_articles/.venv/share/jupyter/kernels/python3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
@@ -0,0 +1,113 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Have you ever wanted to ask a question about your PySpark data in plain English instead of writing SQL?\n",
+    "[LangChain's Spark SQL Toolkit](https://python.langchain.com/docs/integrations/tools/spark_sql/) enables natural language data querying by:\n",
+    "- Translating your requests into SQL\n",
+    "- Executing them against your Spark cluster\n",
+    "- Returning the results in a readable format\n",
+    "This makes it much easier to work with large-scale data while still leveraging Spark's powerful distributed computing capabilities.\n",
+    "To demonstrate, we'll create a simple DataFrame and use LangChain's Spark SQL tool to query it.\n",
+    "```python\n",
+    "from pyspark.sql import SparkSession, Row\n",
+    "# Create sample data and DataFrame\n",
+    "data = [Row(name=\"Alice\", age=30), Row(name=\"Bob\", age=25)]\n",
+    "spark = SparkSession.builder.getOrCreate()\n",
+    "df = spark.createDataFrame(data)\n",
+    "df.write.saveAsTable(\"people\")\n",
+    "df.show()\n",
+    "```\n",
+    "This creates a table `people` accessible via SQL.\n",
+    "Next, we'll set up the key components that enable natural language querying of our Spark data. Here are the steps:\n",
+    "1. Initialize the Spark SQL tool which provides the interface to our Spark database.\n",
+    "2. Initialize a language model.\n",
+    "3. Initialize the Spark SQL toolkit, which connects the language model with the Spark database.\n",
+    "4. Create an agent executor that combines a language model with the Spark SQL toolkit.\n",
+    "```python\n",
+    "# Initialize Spark SQL tool\n",
+    "spark_sql = SparkSQL(schema=\"default\")\n",
+    "# Initialize LLM\n",
+    "llm = ChatOpenAI(temperature=0)\n",
+    "# Initialize toolkit\n",
+    "toolkit = SparkSQLToolkit(db=spark_sql, llm=llm)\n",
+    "# Create agent executor\n",
+    "agent_executor = create_spark_sql_agent(llm=llm, toolkit=toolkit, verbose=True)\n",
+    "```\n",
+    "For a hands-on guide on how to build coordinated AI agents with LangGraph, check out [Building Coordinated AI Agents with LangGraph: A Hands-On Tutorial](https://codecut.ai/building-multi-agent-ai-langgraph-tutorial/).\n",
+    "Now we can ask the agent to query the data.\n",
+    "```python\n",
+    "agent_executor.run(\"What is the average age of people in the table?\")\n",
+    "```\n",
+    "```\n",
+    "> Entering new AgentExecutor chain...\n",
+    "Action: list_tables_sql_db\n",
+    "Action Input:\n",
+    "Observation: people\n",
+    "Thought:I can query the \"people\" table for the average age.\n",
+    "Action: query_sql_db\n",
+    "Action Input: SELECT AVG(age) FROM people\n",
+    "Observation: [('27.5',)]\n",
+    "Thought:The average age of people in the table is 27.5.\n",
+    "Final Answer: 27.5\n",
+    "> Finished chain.\n",
+    "```\n",
+    "The answer for the average age is correct.\n",
+    "The output shows that the agent:\n",
+    "- Looked up the available tables\n",
+    "- Queried the `people` table for the average age\n",
+    "- Got the result\n",
+    "- Answered the question with the result\n",
+    "Let's try another question.\n",
+    "```python\n",
+    "agent_executor.run(\"Who is the oldest person in the table?\")\n",
+    "```\n",
+    "```\n",
+    "> Entering new AgentExecutor chain...\n",
+    "Action: list_tables_sql_db\n",
+    "Action Input:\n",
+    "Observation: people\n",
+    "Thought:I should query the \"people\" table to find the oldest person.\n",
+    "Action: schema_sql_db\n",
+    "Action Input: people\n",
+    "Observation: CREATE TABLE spark_catalog.default.people (\n",
+    "  name STRING,\n",
+    "  age BIGINT)\n",
+    ";\n",
+    "/*\n",
+    "3 rows from people table:\n",
+    "name    age\n",
+    "Alice   30\n",
+    "Bob     25\n",
+    "*/\n",
+    "Thought:I should write a query to select the oldest person from the \"people\" table.\n",
+    "Action: query_sql_db\n",
+    "Action Input: SELECT name, age FROM people ORDER BY age DESC LIMIT 1\n",
+    "Observation: [('Alice', '30')]\n",
+    "Thought:I now know the final answer\n",
+    "Final Answer: Alice\n",
+    "> Finished chain.\n",
+    "```\n",
+    "The answer for the oldest person is also correct.\n",
+    "## Related Resources\n",
+    "For deeper exploration of LangChain and Spark integrations:\n",
+    "- **Multi-Agent Systems**: [Building Coordinated AI Agents with LangGraph: A Hands-On Tutorial](https://codecut.ai/building-multi-agent-ai-langgraph-tutorial/) for advanced AI agent coordination with LangChain\n",
+    "- **Data Science Workflows**: [Build Data Science Workflows with DeepSeek and LangChain](https://codecut.ai/build-data-science-workflows-deepseek-langchain/) for comprehensive data analysis pipelines\n",
+    "- **Private AI Solutions**: [Private AI Workflows with LangChain and Ollama](https://codecut.ai/private-ai-workflows-langchain-ollama/) for local model deployment and privacy-focused implementations\n",
+    "- **Official Documentation**: [LangChain Spark SQL Tool](https://python.langchain.com/docs/integrations/tools/spark_sql/) for complete API reference and advanced configuration options"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3",
+   "path": "/Users/khuyentran/codecut_content/codecut_articles/.venv/share/jupyter/kernels/python3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}