The Clinical ChatBot is a full-stack application that combines modern web technologies with AI/ML capabilities to provide evidence-based clinical information through a conversational interface.
┌─────────────────────────────────────────────────────────────┐
│ Frontend │
│ (Next.js + MUI) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ ChatContainer│ │ DocumentUpload│ │ State Mgmt │ │
│ │ │ │ │ │ (Zustand) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
REST API (Axios)
│
┌─────────────────────────────────────────────────────────────┐
│ Backend │
│ (FastAPI + Python) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Chat Routes │ │Document Routes│ │Health Routes │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Service Layer │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ RAG Engine │ │ Document │ │ Pinecone │ │ │
│ │ │ │ │ Processor │ │ Service │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│ │
┌──────┴──────┐ ┌──────┴──────┐
│ OpenAI │ │ Pinecone │
│ GPT-4 │ │ Vector DB │
└─────────────┘ └─────────────┘
- index.tsx: Main entry point, renders ChatContainer
- _app.tsx: Global app wrapper with theme provider
- _document.tsx: Custom HTML document structure
ChatContainer.tsx:
- Main orchestration component
- Manages message display and user interactions
- Handles conversation state
ChatMessage.tsx:
- Renders individual messages
- Supports markdown formatting
- Role-based styling (user vs assistant)
ChatInput.tsx:
- Message input field
- Send button with loading state
- Keyboard shortcuts (Enter to send)
DocumentUpload.tsx:
- File upload interface
- Progress tracking
- Success/error feedback
api.ts:
- Centralized API client
- Axios configuration
- Error handling
- Request/response interceptors
chatStore.ts:
- Zustand state management
- Message history
- Loading states
- Error handling
- Actions (sendMessage, clearConversation, etc.)
theme.ts:
- Material-UI theme configuration
- Color palette (clinical/professional)
- Typography settings
- Component customization
chat.py:
- POST /api/chat/message: Send message and get response
- GET /api/chat/history/{conversation_id}: Retrieve conversation history
- DELETE /api/chat/conversation/{conversation_id}: Clear conversation
- POST /api/chat/conversations/clear-all: Clear all conversations
documents.py:
- POST /api/documents/upload: Upload PDF document
- POST /api/documents/upload-text: Upload text content
- GET /api/documents/stats: Get index statistics
- DELETE /api/documents/namespace/{namespace}: Delete namespace
health.py:
- GET /api/health: System health check
- GET /api/health/ping: Simple connectivity check
rag_engine.py:
- Core RAG implementation
- Document retrieval from vector store
- Context formatting
- LLM response generation
- Conversation memory management
- Supports both RAG and non-RAG modes
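The engine's flow can be sketched in plain Python with the vector store and LLM stubbed out; the function names and the sample chunk below are assumptions, not the actual API.

```python
def retrieve(query: str, top_k: int = 5) -> list[dict]:
    # Stub for the Pinecone similarity search.
    return [{"text": "Metformin is first-line therapy...",
             "filename": "diabetes_guidelines.pdf", "score": 0.91}]

def format_context(docs: list[dict]) -> str:
    # Each retrieved chunk is labelled with its source document.
    return "\n\n".join(f"[{d['filename']}] {d['text']}" for d in docs)

def call_llm(prompt: str) -> str:
    # Stub for the GPT-4 call.
    return "Based on the guidelines, metformin is first-line therapy."

def answer(query: str, use_rag: bool = True) -> dict:
    # Non-RAG mode skips retrieval and sends the bare query to the LLM.
    docs = retrieve(query) if use_rag else []
    context = format_context(docs)
    prompt = f"Context:\n{context}\n\nQuestion: {query}" if docs else query
    return {"response": call_llm(prompt),
            "sources": [d["filename"] for d in docs]}
```

Keeping retrieval and generation behind separate functions is what makes the RAG/non-RAG toggle a one-line branch.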
document_processor.py:
- PDF loading and parsing
- Text chunking with overlap
- Metadata enrichment
- Document indexing pipeline
- Text content processing
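The chunking strategy can be illustrated with a simplified character-window splitter. The service itself uses LangChain's RecursiveCharacterTextSplitter, which additionally prefers splitting at paragraph and sentence boundaries; this sketch only shows the size/overlap arithmetic.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    # Each chunk starts (chunk_size - overlap) characters after the
    # previous one, so consecutive chunks share `overlap` characters.
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

The 200-character overlap keeps sentences that straddle a chunk boundary retrievable from both neighbours.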
pinecone_service.py:
- Pinecone client initialization
- Vector store operations
- Similarity search
- Document addition/deletion
- Index management
- Health checking
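Conceptually, the similarity search scores every stored vector against the query and returns the best matches; in the service this is a single Pinecone query over 1536-dimensional embeddings rather than the plain-Python scan below.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product normalised by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 5):
    # Score every stored vector, then keep the k highest-scoring ids.
    scored = [(vid, cosine(query, v)) for vid, v in vectors.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]
```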
schemas.py:
- Pydantic models for data validation
- Request/response schemas
- Type safety
- API documentation
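Hypothetical request/response schemas of the kind schemas.py defines; the field names here are illustrative assumptions, not the project's actual models.

```python
from typing import Optional
from pydantic import BaseModel

class ChatMessageRequest(BaseModel):
    message: str
    conversation_id: Optional[str] = None
    use_rag: bool = True

class SourceDocument(BaseModel):
    filename: str
    page: Optional[int] = None
    score: float

class ChatMessageResponse(BaseModel):
    response: str
    conversation_id: str
    sources: list[SourceDocument] = []
```

Because FastAPI reads these models, validation errors and the OpenAPI docs come for free.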
config.py:
- Environment variable management
- Settings validation
- Default values
- CORS configuration
- User Input: User types message in ChatInput component
- State Update: Zustand store adds message to local state
- API Request: API service sends POST request to /api/chat/message
- Backend Processing:
- Route handler receives request
- RAG Engine retrieves relevant documents from Pinecone
- Documents are formatted as context
- LLM generates response with context
- Conversation memory is updated
- Response: Backend returns response with sources
- UI Update: Zustand store updates with assistant message
- Render: ChatContainer displays new message
- File Selection: User selects PDF file
- Upload Request: API service sends multipart/form-data to /api/documents/upload
- Backend Processing:
- Save temporary file
- Load PDF with PyPDFLoader
- Split into chunks with RecursiveCharacterTextSplitter
- Enrich with metadata
- Generate embeddings with OpenAI
- Store in Pinecone vector database
- Response: Return document ID and stats
- UI Update: Show success message and chunk count
- Query Embedding: User query is embedded using OpenAI embeddings (1536 dimensions)
- Similarity Search: Pinecone performs cosine similarity search
- Top-K Retrieval: Returns top 5 most relevant document chunks
- Score Filtering: Results include relevance scores
- Document Formatting: Retrieved chunks are formatted with metadata
- Context Assembly: Multiple documents are combined into context string
- Source Tracking: Maintains references to original documents
- Prompt Construction: System prompt + context + conversation history + user query
- LLM Invocation: GPT-4 generates response using context
- Memory Update: Conversation history is updated for continuity
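The prompt-construction step above can be sketched as string assembly; the wording of the system prompt is an assumption.

```python
SYSTEM_PROMPT = ("You are a clinical assistant. Answer using only the "
                 "provided context and cite your sources.")

def build_prompt(context: str, history: list, query: str) -> str:
    # Order: system prompt, retrieved context, prior turns, new query.
    lines = [SYSTEM_PROMPT, "", "Context:", context, ""]
    for role, content in history:
        lines.append(f"{role}: {content}")
    lines.append(f"user: {query}")
    return "\n".join(lines)
```

In practice each part goes into a separate chat message rather than one flat string, but the ordering is the same.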
Vector Entry:
{
"id": "chunk_<uuid>",
"values": [1536-dimensional embedding],
"metadata": {
"document_id": "doc_<uuid>",
"filename": "diabetes_guidelines.pdf",
"document_type": "clinical_guideline",
"page": 5,
"chunk_id": "chunk_42",
"chunk_index": 42,
"total_chunks": 100,
"indexed_at": "2025-01-15T10:30:00",
"text": "actual chunk content..."
}
}
Stored in-memory (server-side):
{
"conv_<uuid>": ConversationBufferMemory(
messages=[
HumanMessage(content="..."),
AIMessage(content="...")
]
)
}
- CORS configuration for allowed origins
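A plain-Python stand-in for that server-side store (the service keeps LangChain ConversationBufferMemory objects per id); the function names mirror the clear endpoints and are otherwise assumptions.

```python
_conversations: dict = {}

def add_message(conv_id: str, role: str, content: str) -> None:
    # Append a turn, creating the conversation on first use.
    _conversations.setdefault(conv_id, []).append(
        {"role": role, "content": content})

def get_history(conv_id: str) -> list:
    return _conversations.get(conv_id, [])

def clear_conversation(conv_id: str) -> None:
    # Backs DELETE /api/chat/conversation/{conversation_id}.
    _conversations.pop(conv_id, None)

def clear_all() -> None:
    # Backs POST /api/chat/conversations/clear-all.
    _conversations.clear()
```

Because this state lives in process memory, it is lost on restart and is not shared across workers, which is why the scaling notes below mention Redis.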
- Input validation with Pydantic
- Request size limits
- Rate limiting (to be implemented)
- Environment variables for sensitive data
- No credential storage in code
- Secure API key management
- XSS prevention with React
- Content Security Policy (to be implemented)
- HTTPS in production
- Async FastAPI endpoints
- Connection pooling
- Efficient vector search with Pinecone
- Chunk size optimization (1000 chars with 200 overlap)
- React component memoization
- Lazy loading
- Optimistic UI updates
- Debounced search (if implemented)
- Stateless API design (except conversation memory)
- Load balancer compatible
- Docker containerization
- Redis for conversation memory (distributed)
- Queue system for document processing
- CDN for static assets
- Database for conversation persistence
- Unit tests for services (RAG Engine, Document Processor)
- Integration tests for API endpoints
- Mock external services (Pinecone, OpenAI)
- 80%+ code coverage target
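Mocking the external services can look like the standard-library sketch below; `rag_answer` and its injected collaborators are hypothetical stand-ins for the real service functions.

```python
from unittest.mock import Mock

def rag_answer(query: str, retriever, llm) -> dict:
    # Hypothetical service function with injectable dependencies.
    docs = retriever.search(query, top_k=5)
    response = llm.complete(f"Context: {docs}\nQ: {query}")
    return {"response": response, "n_sources": len(docs)}

# In a unit test, Pinecone and OpenAI are replaced with mocks so no
# network calls (or API keys) are needed:
fake_retriever = Mock()
fake_retriever.search.return_value = [{"text": "chunk", "score": 0.9}]
fake_llm = Mock()
fake_llm.complete.return_value = "mocked answer"

result = rag_answer("test query", fake_retriever, fake_llm)
```

Dependency injection keeps the mocks explicit; `unittest.mock.patch` works just as well when the services are module-level singletons.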
- Component unit tests
- Integration tests for user flows
- E2E tests (to be implemented)
- Local backend: http://localhost:8000
- Local frontend: http://localhost:3000
- Hot reloading enabled
- Docker containers
- Reverse proxy (nginx)
- HTTPS with SSL certificates
- Environment-based configuration
- Health check endpoints
- Logging and monitoring
FastAPI:
- Modern Python web framework
- Automatic API documentation
- High performance (async)
- Type safety with Pydantic
Next.js:
- React with SSR/SSG support
- Excellent developer experience
- Production optimizations
- TypeScript support
Pinecone:
- Managed service (no infrastructure)
- Fast similarity search
- Scalable
- Built for ML embeddings
OpenAI GPT-4:
- State-of-the-art language model
- Good for medical/clinical domain
- Reliable API
- Strong reasoning capabilities
Zustand:
- Lightweight (~1 KB bundle)
- Simple API
- TypeScript support
- No boilerplate
Material-UI (MUI):
- Comprehensive component library
- Professional design
- Accessibility built-in
- Theming support