Deep Research Agent is an end-to-end, production-style research assistant built entirely with LangGraph. It converts open-ended questions into scoped research briefs, orchestrates parallel sub-agents to gather evidence, and delivers publication-quality reports complete with citations.
- Scoping Before Search – Structured clarification step that decides when to question the user and produces an actionable brief.
- Supervisor + Specialists – A LangGraph supervisor coordinates dedicated researcher agents that can run in parallel with independent context windows.
- Tool-Driven Evidence Gathering – Research agents rely on Tavily web search, automatic summarisation, and reflective thinking loops (`think_tool`) to avoid hallucination.
- Deterministic Outputs – Pydantic schemas enforce consistent state transitions; reports are generated from cleaned research notes and reference every source (a sketch of this structured-output pattern follows the list).
- Multiple Entry Points – Run from the CLI or Streamlit UI. Rich terminal output keeps the CLI readable.
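
The scoping decision is easier to trust because the model is forced into a structured output rather than free text. A minimal sketch of the pattern, assuming an illustrative `ClarifyWithUser` schema (the project's actual schema names and fields may differ):

```python
# Illustrative sketch – schema and field names are assumptions, not the exact project code.
from pydantic import BaseModel, Field
from langchain.chat_models import init_chat_model


class ClarifyWithUser(BaseModel):
    """Decision on whether the user's query needs clarification before research starts."""
    need_clarification: bool = Field(description="True if the query is too vague to research as-is.")
    question: str = Field(description="Follow-up question to ask the user, if clarification is needed.")


model = init_chat_model("openai:gpt-4.1-mini")          # requires OPENAI_API_KEY in the environment
structured_model = model.with_structured_output(ClarifyWithUser)

decision = structured_model.invoke("Research question: 'Tell me about AI.'")
if decision.need_clarification:
    print(decision.question)  # ask the user before writing the research brief
```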
```
User Query
   │
   ▼
Scoping Graph (`clarify_with_user` → `write_research_brief`)
   │  produces research brief + supervisor kickoff message
   ▼
Supervisor Graph (`supervisor` ⇄ `supervisor_tools`)
   │  delegates ConductResearch tasks, records notes
   ▼
Research Agents (`llm_call` ⇄ `tool_node` → `compress_research`)
   │  each agent performs Tavily searches + summarises raw notes
   ▼
Final Report Generation (`final_report_generation`)
   │  synthesises brief + aggregated notes into markdown report
   ▼
Formatted Output + Sources
```
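
The diagram corresponds to a top-level LangGraph `StateGraph` whose nodes are the stage sub-graphs. The sketch below shows the shape of that composition, with placeholder node functions standing in for the real compiled sub-graphs; the top-level node names and state fields are assumptions, not the exact code in `research_agent_full.py`:

```python
# Sketch only – node names follow the diagram; the placeholder bodies stand in for the
# compiled sub-graphs built in research_agent_scope.py and multi_agent_supervisor.py.
from typing import Annotated
from typing_extensions import TypedDict
from langchain_core.messages import AnyMessage
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages


class AgentState(TypedDict, total=False):
    messages: Annotated[list[AnyMessage], add_messages]
    research_brief: str      # written by the scoping graph
    notes: list[str]         # compressed findings from the researcher agents
    final_report: str


def scoping(state: AgentState) -> dict:
    # Stand-in for the scoping sub-graph (clarify_with_user -> write_research_brief).
    return {"research_brief": "Scoped brief for: " + str(state.get("messages", ""))}


def research_supervisor(state: AgentState) -> dict:
    # Stand-in for the supervisor sub-graph that fans out ConductResearch tasks.
    return {"notes": ["finding 1", "finding 2"]}


def final_report_generation(state: AgentState) -> dict:
    # Stand-in for the report writer that synthesises brief + notes into markdown.
    return {"final_report": "# Report\n" + "\n".join(state.get("notes", []))}


builder = StateGraph(AgentState)
builder.add_node("scoping", scoping)
builder.add_node("research_supervisor", research_supervisor)
builder.add_node("final_report_generation", final_report_generation)
builder.add_edge(START, "scoping")
builder.add_edge("scoping", "research_supervisor")
builder.add_edge("research_supervisor", "final_report_generation")
builder.add_edge("final_report_generation", END)

full_agent = builder.compile()
```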
```
deep_research_from_scratch/
├─ research_agent_full.py       # Top-level graph builder
├─ research_agent_scope.py      # Clarification + brief generation graph
├─ multi_agent_supervisor.py    # Supervisor that launches researcher agents
├─ research_agent.py            # Worker agent (search + summarise loop)
├─ utils.py                     # Tavily integration, summarisation helpers, tools
├─ prompts.py                   # Prompt templates for every stage
└─ state_*.py                   # Pydantic schemas + LangGraph state definitions
run_research.py                 # CLI entry point
streamlit_app.py                # Streamlit UI with session persistence
full_agent.ipynb / research_agent.ipynb / research_supervisor.ipynb / scoping.ipynb
example_query.txt               # Sample deep-research prompt
requirements.txt                # Python dependencies
```
- Python 3.11 or newer is recommended (LangGraph currently targets modern CPython releases).
- API access for:
- OpenAI (GPT-4.1 / GPT-4.1-mini are the default chat models)
- Tavily Search
- LangSmith (optional, for tracing)
```bash
python -m venv .venv
.venv\Scripts\activate          # Windows
# source .venv/bin/activate     # macOS / Linux
pip install -U pip
pip install -r requirements.txt
```

All runtime components read API keys from the environment. You can either export them in your shell or create a `keys.py` module that sets `os.environ[...]` before the agents initialise.
```python
# keys.py (example stub – do not commit real keys)
import os

os.environ["OPENAI_API_KEY"] = "sk-...."
os.environ["TAVILY_API_KEY"] = "tvly-...."
os.environ["LANGSMITH_API_KEY"] = "lsv2_...."  # optional
```

```bash
python run_research.py "Compare Gemini to OpenAI Deep Research agents."
python run_research.py "What are the latest developments in AI?" research_1
python run_research.py "Analyse the impact of LLMs on software development" research_2 75
```

Arguments:

- `query` (required) – the research question.
- `thread_id` (optional, default `"1"`) – persisted LangGraph thread identifier.
- `recursion_limit` (optional, default `50`) – hard cap on graph iterations.
The CLI prints workflow progress, conversation history, and the final markdown report (rendered with Rich when available).
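
Those CLI arguments map onto a standard LangGraph invocation: `thread_id` lives under `configurable` and `recursion_limit` is a top-level config key. A rough sketch, reusing the `full_agent` compiled in the earlier sketch (the state keys here are assumptions):

```python
# Sketch of how the CLI arguments map onto a LangGraph invocation.
from langchain_core.messages import HumanMessage

query = "Analyse the impact of LLMs on software development"
config = {
    "configurable": {"thread_id": "research_2"},  # persists state across turns
    "recursion_limit": 75,                        # hard cap on graph iterations
}

# Thread persistence only takes effect if the graph was compiled with a checkpointer.
result = full_agent.invoke({"messages": [HumanMessage(content=query)]}, config=config)
print(result["final_report"])
```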
```bash
streamlit run streamlit_app.py
```

Features:
- Chat-style input with live clarification prompts.
- Session persistence via LangGraph checkpoints (see the sketch below).
- Downloadable markdown report once research completes.
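
Session persistence boils down to compiling the graph with a checkpointer and reusing one `thread_id` per browser session. A minimal sketch of that pattern, where `build_full_agent` is a hypothetical placeholder for however `streamlit_app.py` actually constructs the compiled graph:

```python
# Minimal sketch of checkpoint-backed session persistence in Streamlit.
# `build_full_agent()` is a hypothetical placeholder for the app's graph construction.
import uuid

import streamlit as st
from langgraph.checkpoint.memory import MemorySaver

if "thread_id" not in st.session_state:
    st.session_state.thread_id = str(uuid.uuid4())   # one LangGraph thread per browser session

if "agent" not in st.session_state:
    # Compile once per session; MemorySaver keeps checkpoints in process memory.
    st.session_state.agent = build_full_agent(checkpointer=MemorySaver())

query = st.chat_input("What should I research?")
if query:
    config = {"configurable": {"thread_id": st.session_state.thread_id}}
    result = st.session_state.agent.invoke({"messages": [("user", query)]}, config=config)
    st.markdown(result.get("final_report", ""))
```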
- Model selection – all models are initialised via `langchain.chat_models.init_chat_model`. Swap providers or models by changing constructor parameters or by reading from environment variables.
- Prompts – tweak high-level behaviour in `deep_research_from_scratch/prompts.py`. The prompts already enforce explicit citation handling and structured outputs.
- Tooling – add new LangChain tools in `deep_research_from_scratch/utils.py` and register them with `tools` in `research_agent.py`. Remember to update `tools_by_name` (a short sketch follows this list).
- Supervisor policy – adjust maximum iterations or parallelism in `multi_agent_supervisor.py` (`max_researcher_iterations`, `max_concurrent_researchers`).
- Output formatting – `final_report_generation_prompt` governs the final markdown structure. Modify it to emit JSON, tables, or other formats.
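
For the tooling point, a new tool is just a decorated function appended to the researcher's tool list. A short sketch, where the example tool is invented for illustration (only the `tools` and `tools_by_name` names come from the project):

```python
# Sketch of adding a custom tool; the example tool is made up for illustration.
from langchain_core.tools import tool


@tool
def fetch_arxiv_abstract(arxiv_id: str) -> str:
    """Return the abstract for an arXiv paper ID (illustrative stub)."""
    return f"Abstract for {arxiv_id} would be fetched here."


# In research_agent.py the researcher binds its tools roughly like this:
tools = [fetch_arxiv_abstract]              # plus the existing search and think_tool entries
tools_by_name = {t.name: t for t in tools}  # used by the tool node to dispatch tool calls
```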
Happy researching!