
Clinical ChatBot Architecture

System Overview

The Clinical ChatBot is a full-stack application that combines modern web technologies with AI/ML capabilities to provide evidence-based clinical information through a conversational interface.

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                          Frontend                           │
│                       (Next.js + MUI)                       │
│  ┌───────────────┐  ┌────────────────┐  ┌───────────────┐   │
│  │ ChatContainer │  │ DocumentUpload │  │  State Mgmt   │   │
│  │               │  │                │  │   (Zustand)   │   │
│  └───────────────┘  └────────────────┘  └───────────────┘   │
└─────────────────────────────────────────────────────────────┘
                               │
                       REST API (Axios)
                               │
┌─────────────────────────────────────────────────────────────┐
│                           Backend                           │
│                      (FastAPI + Python)                     │
│  ┌───────────────┐  ┌────────────────┐  ┌───────────────┐   │
│  │  Chat Routes  │  │ Document Routes│  │ Health Routes │   │
│  └───────────────┘  └────────────────┘  └───────────────┘   │
│          │                   │                              │
│  ┌──────────────────────────────────────────────────┐       │
│  │                  Service Layer                   │       │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐  │       │
│  │  │ RAG Engine │  │  Document  │  │  Pinecone  │  │       │
│  │  │            │  │ Processor  │  │  Service   │  │       │
│  │  └────────────┘  └────────────┘  └────────────┘  │       │
│  └──────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘
              │                       │
       ┌──────┴──────┐         ┌──────┴──────┐
       │   OpenAI    │         │  Pinecone   │
       │    GPT-4    │         │  Vector DB  │
       └─────────────┘         └─────────────┘

Component Architecture

Frontend Layer

1. Pages (src/pages/)

  • index.tsx: Main entry point, renders ChatContainer
  • _app.tsx: Global app wrapper with theme provider
  • _document.tsx: Custom HTML document structure

2. Components (src/components/)

  • ChatContainer.tsx:

    • Main orchestration component
    • Manages message display and user interactions
    • Handles conversation state
  • ChatMessage.tsx:

    • Renders individual messages
    • Supports markdown formatting
    • Role-based styling (user vs assistant)
  • ChatInput.tsx:

    • Message input field
    • Send button with loading state
    • Keyboard shortcuts (Enter to send)
  • DocumentUpload.tsx:

    • File upload interface
    • Progress tracking
    • Success/error feedback

3. Services (src/services/)

  • api.ts:
    • Centralized API client
    • Axios configuration
    • Error handling
    • Request/response interceptors

4. Store (src/store/)

  • chatStore.ts:
    • Zustand state management
    • Message history
    • Loading states
    • Error handling
    • Actions (sendMessage, clearConversation, etc.)

5. Theme (src/theme/)

  • theme.ts:
    • Material-UI theme configuration
    • Color palette (clinical/professional)
    • Typography settings
    • Component customization

Backend Layer

1. API Routes (app/api/routes/)

chat.py:

  • POST /api/chat/message: Send message and get response
  • GET /api/chat/history/{conversation_id}: Retrieve conversation history
  • DELETE /api/chat/conversation/{conversation_id}: Clear conversation
  • POST /api/chat/conversations/clear-all: Clear all conversations
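
The core of the message endpoint can be sketched as plain handler logic (a minimal sketch with the FastAPI decorator and dependency wiring omitted; the function and field names here are assumptions, not the app's actual code):

```python
import uuid

# Hypothetical sketch of the logic behind POST /api/chat/message.
# `rag_engine` stands in for the injected RAG Engine service and is
# assumed to return an (answer, sources) tuple.
def handle_chat_message(payload: dict, rag_engine) -> dict:
    """Validate the request, run the RAG engine, and shape the response."""
    message = payload.get("message", "").strip()
    if not message:
        raise ValueError("message must be non-empty")
    # Reuse the caller's conversation or start a new one.
    conversation_id = payload.get("conversation_id") or f"conv_{uuid.uuid4()}"
    answer, sources = rag_engine(message, conversation_id)
    return {
        "response": answer,
        "sources": sources,
        "conversation_id": conversation_id,
    }
```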

documents.py:

  • POST /api/documents/upload: Upload PDF document
  • POST /api/documents/upload-text: Upload text content
  • GET /api/documents/stats: Get index statistics
  • DELETE /api/documents/namespace/{namespace}: Delete namespace

health.py:

  • GET /api/health: System health check
  • GET /api/health/ping: Simple connectivity check

2. Services (app/services/)

rag_engine.py:

  • Core RAG implementation
  • Document retrieval from vector store
  • Context formatting
  • LLM response generation
  • Conversation memory management
  • Supports both RAG and non-RAG modes
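
The retrieve → format → generate → remember pipeline can be sketched as one function (a simplified sketch: `retrieve` and `generate` are assumed stand-ins for the Pinecone search and the OpenAI chat call, and `memory` is a plain list rather than a LangChain memory object):

```python
def rag_answer(query, retrieve, generate, memory, k=5):
    """Retrieval-augmented answer: fetch top-k chunks, then prompt the LLM."""
    docs = retrieve(query, k)                       # top-k chunks from the vector store
    context = "\n\n".join(d["text"] for d in docs)  # assemble the context string
    history = list(memory)                          # prior turns for continuity
    answer = generate(context, history, query)      # LLM call with context
    # Update conversation memory so follow-up questions stay coherent.
    memory.append(("human", query))
    memory.append(("ai", answer))
    return answer, [d.get("filename") for d in docs]
```

Non-RAG mode corresponds to calling `generate` with an empty context and skipping retrieval.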

document_processor.py:

  • PDF loading and parsing
  • Text chunking with overlap
  • Metadata enrichment
  • Document indexing pipeline
  • Text content processing
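
Chunking with overlap can be illustrated with a fixed character window (a simplified sketch: the real `RecursiveCharacterTextSplitter` also prefers paragraph and sentence boundaries, which this deliberately omits):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into fixed-size chunks, each overlapping the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # advance 800 chars per chunk with the defaults
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With the defaults, each chunk repeats the last 200 characters of its predecessor, so facts that straddle a chunk boundary still appear intact in at least one chunk.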

pinecone_service.py:

  • Pinecone client initialization
  • Vector store operations
  • Similarity search
  • Document addition/deletion
  • Index management
  • Health checking
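
What "similarity search" means here can be shown in miniature (a pure-Python sketch of cosine-similarity top-k over an in-memory list; Pinecone performs the same ranking server-side over its index):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, index, k=5):
    """index: list of (id, vector, metadata) entries, mirroring the vector DB.

    Returns the k entries most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), id_, meta) for id_, vec, meta in index]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]
```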

3. Models (app/models/)

schemas.py:

  • Pydantic models for data validation
  • Request/response schemas
  • Type safety
  • API documentation
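
The request/response schemas might look roughly like this (a sketch with assumed field names, not the app's actual models):

```python
from typing import List, Optional
from pydantic import BaseModel

class ChatRequest(BaseModel):
    """Body of POST /api/chat/message (field names are assumptions)."""
    message: str
    conversation_id: Optional[str] = None
    use_rag: bool = True

class Source(BaseModel):
    """A reference back to an indexed document chunk."""
    filename: str
    page: Optional[int] = None

class ChatResponse(BaseModel):
    response: str
    conversation_id: str
    sources: List[Source] = []
```

FastAPI validates incoming JSON against these models automatically and uses them to generate the interactive API docs.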

4. Configuration (app/config.py)

  • Environment variable management
  • Settings validation
  • Default values
  • CORS configuration

Data Flow

Chat Message Flow

  1. User Input: User types message in ChatInput component
  2. State Update: Zustand store adds message to local state
  3. API Request: API service sends POST request to /api/chat/message
  4. Backend Processing:
    • Route handler receives request
    • RAG Engine retrieves relevant documents from Pinecone
    • Documents are formatted as context
    • LLM generates response with context
    • Conversation memory is updated
  5. Response: Backend returns response with sources
  6. UI Update: Zustand store updates with assistant message
  7. Render: ChatContainer displays new message

Document Upload Flow

  1. File Selection: User selects PDF file
  2. Upload Request: API service sends multipart/form-data to /api/documents/upload
  3. Backend Processing:
    • Save temporary file
    • Load PDF with PyPDFLoader
    • Split into chunks with RecursiveCharacterTextSplitter
    • Enrich with metadata
    • Generate embeddings with OpenAI
    • Store in Pinecone vector database
  4. Response: Return document ID and stats
  5. UI Update: Show success message and chunk count

RAG Implementation Details

Retrieval Process

  1. Query Embedding: User query is embedded using OpenAI embeddings (1536 dimensions)
  2. Similarity Search: Pinecone performs cosine similarity search
  3. Top-K Retrieval: Returns top 5 most relevant document chunks
  4. Score Filtering: Results include relevance scores

Context Generation

  1. Document Formatting: Retrieved chunks are formatted with metadata
  2. Context Assembly: Multiple documents are combined into context string
  3. Source Tracking: Maintains references to original documents
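
The three steps above can be sketched together (a minimal sketch; the chunk dict keys match the Pinecone metadata shown later, but the formatting itself is an assumption):

```python
def build_context(chunks):
    """Format retrieved chunks into one context string, keeping source refs."""
    parts, sources = [], []
    for i, chunk in enumerate(chunks, start=1):
        ref = f"{chunk['filename']} (p. {chunk['page']})"
        parts.append(f"[{i}] {ref}\n{chunk['text']}")  # numbered, source-labelled block
        sources.append(ref)                            # parallel list for the response
    return "\n\n".join(parts), sources
```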

Response Generation

  1. Prompt Construction: System prompt + context + conversation history + user query
  2. LLM Invocation: GPT-4 generates response using context
  3. Memory Update: Conversation history is updated for continuity
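
Prompt construction in step 1 amounts to assembling a message list in the chat-completions format (a sketch: the system prompt wording here is an assumption, not the app's actual prompt):

```python
SYSTEM_PROMPT = (
    "You are a clinical assistant. Answer only from the provided context "
    "and cite your sources."  # illustrative wording only
)

def build_messages(context, history, query):
    """Assemble system prompt + context + conversation history + user query."""
    messages = [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\n\nContext:\n{context}"}
    ]
    # history: list of (role, text) pairs from conversation memory
    messages.extend({"role": role, "content": text} for role, text in history)
    messages.append({"role": "user", "content": query})
    return messages
```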

Database Schema

Pinecone Index Structure

Vector Entry:
{
  "id": "chunk_<uuid>",
  "values": [1536-dimensional embedding],
  "metadata": {
    "document_id": "doc_<uuid>",
    "filename": "diabetes_guidelines.pdf",
    "document_type": "clinical_guideline",
    "page": 5,
    "chunk_id": "chunk_42",
    "chunk_index": 42,
    "total_chunks": 100,
    "indexed_at": "2025-01-15T10:30:00",
    "text": "actual chunk content..."
  }
}

Conversation Memory

Stored in-memory (server-side):

{
  "conv_<uuid>": ConversationBufferMemory(
    messages=[
      HumanMessage(content="..."),
      AIMessage(content="...")
    ]
  )
}
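
A plain-Python stand-in for this in-memory store (a sketch without LangChain's `ConversationBufferMemory`; the class name and methods are assumptions) also shows why the two clear routes are cheap:

```python
from collections import defaultdict

class ConversationStore:
    """Minimal in-memory conversation store keyed by conversation_id."""
    def __init__(self):
        self._store = defaultdict(list)  # conversation_id -> [(role, text), ...]

    def append(self, conversation_id, role, text):
        self._store[conversation_id].append((role, text))

    def history(self, conversation_id):
        return list(self._store.get(conversation_id, []))

    def clear(self, conversation_id=None):
        """Clear one conversation, or all of them (the two clear endpoints)."""
        if conversation_id is None:
            self._store.clear()
        else:
            self._store.pop(conversation_id, None)
```

Because this state lives in process memory, it is lost on restart and is not shared across instances, which is exactly why Redis is listed under Future Enhancements.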

Security Considerations

API Security

  • CORS configuration for allowed origins
  • Input validation with Pydantic
  • Request size limits
  • Rate limiting (to be implemented)

Data Security

  • Environment variables for sensitive data
  • No credential storage in code
  • Secure API key management

Frontend Security

  • XSS prevention with React
  • Content Security Policy (to be implemented)
  • HTTPS in production

Performance Optimizations

Backend

  • Async FastAPI endpoints
  • Connection pooling
  • Efficient vector search with Pinecone
  • Chunk size optimization (1000 chars with 200 overlap)

Frontend

  • React component memoization
  • Lazy loading
  • Optimistic UI updates
  • Debounced search (if implemented)

Scalability

Horizontal Scaling

  • Stateless API design (except conversation memory)
  • Load balancer compatible
  • Docker containerization

Future Enhancements

  • Redis for conversation memory (distributed)
  • Queue system for document processing
  • CDN for static assets
  • Database for conversation persistence

Testing Strategy

Backend Tests

  • Unit tests for services (RAG Engine, Document Processor)
  • Integration tests for API endpoints
  • Mock external services (Pinecone, OpenAI)
  • 80%+ code coverage target

Frontend Tests

  • Component unit tests
  • Integration tests for user flows
  • E2E tests (to be implemented)

Deployment Architecture

Development

  • Local backend: http://localhost:8000
  • Local frontend: http://localhost:3000
  • Hot reloading enabled

Production

  • Docker containers
  • Reverse proxy (nginx)
  • HTTPS with SSL certificates
  • Environment-based configuration
  • Health check endpoints
  • Logging and monitoring

Technology Justification

Backend: FastAPI

  • Modern Python web framework
  • Automatic API documentation
  • High performance (async)
  • Type safety with Pydantic

Frontend: Next.js

  • React with SSR/SSG support
  • Excellent developer experience
  • Production optimizations
  • TypeScript support

Vector DB: Pinecone

  • Managed service (no infrastructure)
  • Fast similarity search
  • Scalable
  • Built for ML embeddings

LLM: OpenAI GPT-4

  • State-of-the-art language model
  • Good for medical/clinical domain
  • Reliable API
  • Strong reasoning capabilities

State Management: Zustand

  • Lightweight (~1 kB)
  • Simple API
  • TypeScript support
  • No boilerplate

UI Library: Material-UI

  • Comprehensive component library
  • Professional design
  • Accessibility built-in
  • Theming support