
Clinical ChatBot Architecture

System Overview

The Clinical ChatBot is a full-stack application that combines modern web technologies with AI/ML capabilities to provide evidence-based clinical information through a conversational interface.

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                          Frontend                           │
│                       (Next.js + MUI)                       │
│  ┌───────────────┐  ┌────────────────┐  ┌───────────────┐   │
│  │ ChatContainer │  │ DocumentUpload │  │  State Mgmt   │   │
│  │               │  │                │  │   (Zustand)   │   │
│  └───────────────┘  └────────────────┘  └───────────────┘   │
└─────────────────────────────────────────────────────────────┘
                               │
                       REST API (Axios)
                               │
┌─────────────────────────────────────────────────────────────┐
│                           Backend                           │
│                      (FastAPI + Python)                     │
│  ┌───────────────┐  ┌────────────────┐  ┌───────────────┐   │
│  │  Chat Routes  │  │ Document Routes│  │ Health Routes │   │
│  └───────────────┘  └────────────────┘  └───────────────┘   │
│          │                   │                              │
│  ┌──────────────────────────────────────────────────┐       │
│  │                  Service Layer                   │       │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐  │       │
│  │  │ RAG Engine │  │  Document  │  │  Pinecone  │  │       │
│  │  │            │  │ Processor  │  │  Service   │  │       │
│  │  └────────────┘  └────────────┘  └────────────┘  │       │
│  └──────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘
              │                       │
       ┌──────┴──────┐         ┌──────┴──────┐
       │   OpenAI    │         │  Pinecone   │
       │    GPT-4    │         │  Vector DB  │
       └─────────────┘         └─────────────┘

Component Architecture

Frontend Layer

1. Pages (src/pages/)

  • index.tsx: Main entry point, renders ChatContainer
  • _app.tsx: Global app wrapper with theme provider
  • _document.tsx: Custom HTML document structure

2. Components (src/components/)

  • ChatContainer.tsx:

    • Main orchestration component
    • Manages message display and user interactions
    • Handles conversation state
  • ChatMessage.tsx:

    • Renders individual messages
    • Supports markdown formatting
    • Role-based styling (user vs assistant)
  • ChatInput.tsx:

    • Message input field
    • Send button with loading state
    • Keyboard shortcuts (Enter to send)
  • DocumentUpload.tsx:

    • File upload interface
    • Progress tracking
    • Success/error feedback

3. Services (src/services/)

  • api.ts:
    • Centralized API client
    • Axios configuration
    • Error handling
    • Request/response interceptors

4. Store (src/store/)

  • chatStore.ts:
    • Zustand state management
    • Message history
    • Loading states
    • Error handling
    • Actions (sendMessage, clearConversation, etc.)

5. Theme (src/theme/)

  • theme.ts:
    • Material-UI theme configuration
    • Color palette (clinical/professional)
    • Typography settings
    • Component customization

Backend Layer

1. API Routes (app/api/routes/)

chat.py:

  • POST /api/chat/message: Send message and get response
  • GET /api/chat/history/{conversation_id}: Retrieve conversation history
  • DELETE /api/chat/conversation/{conversation_id}: Clear conversation
  • POST /api/chat/conversations/clear-all: Clear all conversations
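
The core of the message endpoint can be sketched as plain handler logic (a minimal sketch with the FastAPI decorator and dependency wiring omitted; the function and field names here are assumptions, not the app's actual code):

```python
import uuid

# Hypothetical sketch of the logic behind POST /api/chat/message.
# `rag_engine` stands in for the injected RAG Engine service and is
# assumed to return an (answer, sources) tuple.
def handle_chat_message(payload: dict, rag_engine) -> dict:
    """Validate the request, run the RAG engine, and shape the response."""
    message = payload.get("message", "").strip()
    if not message:
        raise ValueError("message must be non-empty")
    # Reuse the caller's conversation or start a new one.
    conversation_id = payload.get("conversation_id") or f"conv_{uuid.uuid4()}"
    answer, sources = rag_engine(message, conversation_id)
    return {
        "response": answer,
        "sources": sources,
        "conversation_id": conversation_id,
    }
```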

documents.py:

  • POST /api/documents/upload: Upload PDF document
  • POST /api/documents/upload-text: Upload text content
  • GET /api/documents/stats: Get index statistics
  • DELETE /api/documents/namespace/{namespace}: Delete namespace

health.py:

  • GET /api/health: System health check
  • GET /api/health/ping: Simple connectivity check

2. Services (app/services/)

rag_engine.py:

  • Core RAG implementation
  • Document retrieval from vector store
  • Context formatting
  • LLM response generation
  • Conversation memory management
  • Supports both RAG and non-RAG modes
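
The retrieve → format → generate → remember pipeline can be sketched as one function (a simplified sketch: `retrieve` and `generate` are assumed stand-ins for the Pinecone search and the OpenAI chat call, and `memory` is a plain list rather than a LangChain memory object):

```python
def rag_answer(query, retrieve, generate, memory, k=5):
    """Retrieval-augmented answer: fetch top-k chunks, then prompt the LLM."""
    docs = retrieve(query, k)                       # top-k chunks from the vector store
    context = "\n\n".join(d["text"] for d in docs)  # assemble the context string
    history = list(memory)                          # prior turns for continuity
    answer = generate(context, history, query)      # LLM call with context
    # Update conversation memory so follow-up questions stay coherent.
    memory.append(("human", query))
    memory.append(("ai", answer))
    return answer, [d.get("filename") for d in docs]
```

Non-RAG mode corresponds to calling `generate` with an empty context and skipping retrieval.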

document_processor.py:

  • PDF loading and parsing
  • Text chunking with overlap
  • Metadata enrichment
  • Document indexing pipeline
  • Text content processing
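
Chunking with overlap can be illustrated with a fixed character window (a simplified sketch: the real `RecursiveCharacterTextSplitter` also prefers paragraph and sentence boundaries, which this deliberately omits):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into fixed-size chunks, each overlapping the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # advance 800 chars per chunk with the defaults
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With the defaults, each chunk repeats the last 200 characters of its predecessor, so facts that straddle a chunk boundary still appear intact in at least one chunk.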

pinecone_service.py:

  • Pinecone client initialization
  • Vector store operations
  • Similarity search
  • Document addition/deletion
  • Index management
  • Health checking
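
What "similarity search" means here can be shown in miniature (a pure-Python sketch of cosine-similarity top-k over an in-memory list; Pinecone performs the same ranking server-side over its index):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, index, k=5):
    """index: list of (id, vector, metadata) entries, mirroring the vector DB.

    Returns the k entries most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), id_, meta) for id_, vec, meta in index]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]
```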

3. Models (app/models/)

schemas.py:

  • Pydantic models for data validation
  • Request/response schemas
  • Type safety
  • API documentation
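
The request/response schemas might look roughly like this (a sketch with assumed field names, not the app's actual models):

```python
from typing import List, Optional
from pydantic import BaseModel

class ChatRequest(BaseModel):
    """Body of POST /api/chat/message (field names are assumptions)."""
    message: str
    conversation_id: Optional[str] = None
    use_rag: bool = True

class Source(BaseModel):
    """A reference back to an indexed document chunk."""
    filename: str
    page: Optional[int] = None

class ChatResponse(BaseModel):
    response: str
    conversation_id: str
    sources: List[Source] = []
```

FastAPI validates incoming JSON against these models automatically and uses them to generate the interactive API docs.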

4. Configuration (app/config.py)

  • Environment variable management
  • Settings validation
  • Default values
  • CORS configuration

Data Flow

Chat Message Flow

  1. User Input: User types message in ChatInput component
  2. State Update: Zustand store adds message to local state
  3. API Request: API service sends POST request to /api/chat/message
  4. Backend Processing:
    • Route handler receives request
    • RAG Engine retrieves relevant documents from Pinecone
    • Documents are formatted as context
    • LLM generates response with context
    • Conversation memory is updated
  5. Response: Backend returns response with sources
  6. UI Update: Zustand store updates with assistant message
  7. Render: ChatContainer displays new message

Document Upload Flow

  1. File Selection: User selects PDF file
  2. Upload Request: API service sends multipart/form-data to /api/documents/upload
  3. Backend Processing:
    • Save temporary file
    • Load PDF with PyPDFLoader
    • Split into chunks with RecursiveCharacterTextSplitter
    • Enrich with metadata
    • Generate embeddings with OpenAI
    • Store in Pinecone vector database
  4. Response: Return document ID and stats
  5. UI Update: Show success message and chunk count

RAG Implementation Details

Retrieval Process

  1. Query Embedding: User query is embedded using OpenAI embeddings (1536 dimensions)
  2. Similarity Search: Pinecone performs cosine similarity search
  3. Top-K Retrieval: Returns top 5 most relevant document chunks
  4. Score Filtering: Results include relevance scores

Context Generation

  1. Document Formatting: Retrieved chunks are formatted with metadata
  2. Context Assembly: Multiple documents are combined into context string
  3. Source Tracking: Maintains references to original documents
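
The three steps above can be sketched together (a minimal sketch; the chunk dict keys match the Pinecone metadata shown later, but the formatting itself is an assumption):

```python
def build_context(chunks):
    """Format retrieved chunks into one context string, keeping source refs."""
    parts, sources = [], []
    for i, chunk in enumerate(chunks, start=1):
        ref = f"{chunk['filename']} (p. {chunk['page']})"
        parts.append(f"[{i}] {ref}\n{chunk['text']}")  # numbered, source-labelled block
        sources.append(ref)                            # parallel list for the response
    return "\n\n".join(parts), sources
```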

Response Generation

  1. Prompt Construction: System prompt + context + conversation history + user query
  2. LLM Invocation: GPT-4 generates response using context
  3. Memory Update: Conversation history is updated for continuity
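
Prompt construction in step 1 amounts to assembling a message list in the chat-completions format (a sketch: the system prompt wording here is an assumption, not the app's actual prompt):

```python
SYSTEM_PROMPT = (
    "You are a clinical assistant. Answer only from the provided context "
    "and cite your sources."  # illustrative wording only
)

def build_messages(context, history, query):
    """Assemble system prompt + context + conversation history + user query."""
    messages = [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\n\nContext:\n{context}"}
    ]
    # history: list of (role, text) pairs from conversation memory
    messages.extend({"role": role, "content": text} for role, text in history)
    messages.append({"role": "user", "content": query})
    return messages
```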

Database Schema

Pinecone Index Structure

Vector Entry:
{
  "id": "chunk_<uuid>",
  "values": [1536-dimensional embedding],
  "metadata": {
    "document_id": "doc_<uuid>",
    "filename": "diabetes_guidelines.pdf",
    "document_type": "clinical_guideline",
    "page": 5,
    "chunk_id": "chunk_42",
    "chunk_index": 42,
    "total_chunks": 100,
    "indexed_at": "2025-01-15T10:30:00",
    "text": "actual chunk content..."
  }
}

Conversation Memory

Stored in-memory (server-side):

{
  "conv_<uuid>": ConversationBufferMemory(
    messages=[
      HumanMessage(content="..."),
      AIMessage(content="...")
    ]
  )
}
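
A plain-Python stand-in for this in-memory store (a sketch without LangChain's `ConversationBufferMemory`; the class name and methods are assumptions) also shows why the two clear routes are cheap:

```python
from collections import defaultdict

class ConversationStore:
    """Minimal in-memory conversation store keyed by conversation_id."""
    def __init__(self):
        self._store = defaultdict(list)  # conversation_id -> [(role, text), ...]

    def append(self, conversation_id, role, text):
        self._store[conversation_id].append((role, text))

    def history(self, conversation_id):
        return list(self._store.get(conversation_id, []))

    def clear(self, conversation_id=None):
        """Clear one conversation, or all of them (the two clear endpoints)."""
        if conversation_id is None:
            self._store.clear()
        else:
            self._store.pop(conversation_id, None)
```

Because this state lives in process memory, it is lost on restart and is not shared across instances, which is exactly why Redis is listed under Future Enhancements.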

Security Considerations

API Security

  • CORS configuration for allowed origins
  • Input validation with Pydantic
  • Request size limits
  • Rate limiting (to be implemented)

Data Security

  • Environment variables for sensitive data
  • No credential storage in code
  • Secure API key management

Frontend Security

  • XSS prevention with React
  • Content Security Policy (to be implemented)
  • HTTPS in production

Performance Optimizations

Backend

  • Async FastAPI endpoints
  • Connection pooling
  • Efficient vector search with Pinecone
  • Chunk size optimization (1000 chars with 200 overlap)

Frontend

  • React component memoization
  • Lazy loading
  • Optimistic UI updates
  • Debounced search (if implemented)

Scalability

Horizontal Scaling

  • Stateless API design (except conversation memory)
  • Load balancer compatible
  • Docker containerization

Future Enhancements

  • Redis for conversation memory (distributed)
  • Queue system for document processing
  • CDN for static assets
  • Database for conversation persistence

Testing Strategy

Backend Tests

  • Unit tests for services (RAG Engine, Document Processor)
  • Integration tests for API endpoints
  • Mock external services (Pinecone, OpenAI)
  • 80%+ code coverage target

Frontend Tests

  • Component unit tests
  • Integration tests for user flows
  • E2E tests (to be implemented)

Deployment Architecture

Development

  • Local backend: http://localhost:8000
  • Local frontend: http://localhost:3000
  • Hot reloading enabled

Production

  • Docker containers
  • Reverse proxy (nginx)
  • HTTPS with SSL certificates
  • Environment-based configuration
  • Health check endpoints
  • Logging and monitoring

Technology Justification

Backend: FastAPI

  • Modern Python web framework
  • Automatic API documentation
  • High performance (async)
  • Type safety with Pydantic

Frontend: Next.js

  • React with SSR/SSG support
  • Excellent developer experience
  • Production optimizations
  • TypeScript support

Vector DB: Pinecone

  • Managed service (no infrastructure)
  • Fast similarity search
  • Scalable
  • Built for ML embeddings

LLM: OpenAI GPT-4

  • State-of-the-art language model
  • Good for medical/clinical domain
  • Reliable API
  • Strong reasoning capabilities

State Management: Zustand

  • Lightweight (~1 kB)
  • Simple API
  • TypeScript support
  • No boilerplate

UI Library: Material-UI

  • Comprehensive component library
  • Professional design
  • Accessibility built-in
  • Theming support