BiocBot is an AI-powered study assistant platform that enables students to interact with course material in a chat-based format. Instructors can upload documents (PDF, DOCX, or TXT), which are automatically parsed, chunked, and embedded into a vector database (Qdrant) for semantic search. When a student asks a question, the system retrieves the relevant chunks and generates a response grounded in course content.
- Document Management: Upload and organize course materials
- Vector Search: Semantic search across documents using Qdrant
- AI Chat Interface: Student interaction with course content
- Per-Course Retrieval Mode: Instructor-controlled additive vs single-unit retrieval for chat
- Assessment Questions: Create and manage course assessments
- Course Structure: Organize content by units/lectures
- User Management: Separate interfaces for instructors and students
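
The chunking step mentioned in the overview can be illustrated with a minimal sketch. In BiocBot the actual chunking is handled by the UBC GenAI Toolkit modules, so the function name `chunkText`, the character-based splitting, and the default sizes below are assumptions for illustration only.

```javascript
/**
 * Split text into overlapping chunks of at most `chunkSize` characters.
 * Hypothetical helper: the toolkit's real chunker may split differently
 * (for example by sentence or token count).
 */
function chunkText(text, chunkSize = 500, overlap = 50) {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("Require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

const exampleChunks = chunkText("a".repeat(1200), 500, 50);
console.log(exampleChunks.map((chunk) => chunk.length)); // [500, 500, 300]
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.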
BiocBot follows a split architecture with a public frontend and a private backend, adhering to clear separation of concerns for maintainability and security.
- Frontend: HTML + Vanilla JS (no frameworks), styled via separate CSS files
- Backend: Node.js (Express), built with modular architecture
- Database: MongoDB (for documents, user sessions, analytics)
- Vector Database: Qdrant for semantic search and similarity retrieval
- Embeddings: Ollama with nomic-embed-text model
- Document Processing: UBC GenAI Toolkit modules
- Node.js v18.x or higher
- MongoDB instance
- Qdrant vector database (Docker recommended)
- Ollama with nomic-embed-text model
git clone <repository-url>
cd tlef-biocbot
npm install

Create a .env file in the root directory with the following variables:
# MongoDB Connection
MONGO_URI=mongodb://localhost:27017/biocbot
# Server Port
TLEF_BIOCBOT_PORT=8080
# Qdrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=super-secret-dev-key
# Embeddings Provider Configuration
EMBEDDING_PROVIDER=ubc-genai-toolkit-llm
# LLM Provider Settings (for Embeddings)
LLM_PROVIDER=ollama
LLM_API_KEY=nokey
LLM_ENDPOINT=http://localhost:11434
LLM_EMBEDDING_MODEL=nomic-embed-text
LLM_DEFAULT_MODEL=llama3.1

docker run -p 6333:6333 qdrant/qdrant

ollama pull nomic-embed-text
ollama serve

npm run dev

BiocBot now includes advanced vector search capabilities through Qdrant integration:
- Automatic Document Processing: Documents are automatically chunked, embedded, and stored
- Semantic Search: Find relevant content using natural language queries
- Course-Aware Search: Filter results by course and lecture
- Real-time Indexing: New documents are immediately searchable
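
To make the idea of course-aware semantic search concrete, here is a small brute-force sketch that ranks stored chunks by cosine similarity to a query vector. It is not how Qdrant works internally (Qdrant uses an approximate nearest-neighbour index), and the names `rankChunks` and `courseId`, the toy 3-dimensional vectors, and the choice of cosine similarity as the metric are all assumptions.

```javascript
/** Cosine similarity between two equal-length numeric vectors. */
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Rank chunks against a query vector, optionally restricted to one course. */
function rankChunks(queryVector, chunks, { courseId, limit = 3 } = {}) {
  return chunks
    .filter((c) => courseId === undefined || c.courseId === courseId)
    .map((c) => ({ ...c, score: cosineSimilarity(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, limit);
}

// Toy vectors stand in for real nomic-embed-text embeddings.
const storedChunks = [
  { id: "c1", courseId: "BIOC-101", vector: [1, 0, 0] },
  { id: "c2", courseId: "BIOC-101", vector: [0.6, 0.8, 0] },
  { id: "c3", courseId: "BIOC-202", vector: [1, 0, 0] },
];
const ranked = rankChunks([1, 0, 0], storedChunks, { courseId: "BIOC-101" });
console.log(ranked.map((r) => r.id)); // ["c1", "c2"]
```

Chunk `c3` matches the query exactly but is excluded because it belongs to a different course, which is the effect the course filter is meant to have.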
- GET /api/qdrant/status - Check Qdrant service status
- POST /api/qdrant/process-document - Process and store document
- POST /api/qdrant/search - Semantic search across documents
- DELETE /api/qdrant/document/:id - Delete document chunks
- GET /api/qdrant/collection-stats - Get collection statistics
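
As a rough illustration of calling the search endpoint from a script, the sketch below builds the request for POST /api/qdrant/search. The request body fields (`query`, `courseId`, `lectureName`, `limit`) are assumptions, since the payload shape is not documented here; check the route handlers in src/routes/ for the real field names. The base URL assumes the default TLEF_BIOCBOT_PORT of 8080.

```javascript
/**
 * Build the URL and fetch options for POST /api/qdrant/search.
 * The body field names are hypothetical, not taken from the BiocBot source.
 */
function buildSearchRequest(baseUrl, { query, courseId, lectureName, limit = 5 }) {
  if (typeof query !== "string" || query.trim() === "") {
    throw new TypeError("query must be a non-empty string");
  }
  const body = { query: query.trim(), limit };
  if (courseId) body.courseId = courseId; // optional course filter
  if (lectureName) body.lectureName = lectureName; // optional lecture filter
  return {
    url: new URL("/api/qdrant/search", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

const searchRequest = buildSearchRequest("http://localhost:8080", {
  query: "How does glycolysis produce ATP?",
  courseId: "BIOC-101",
});
console.log(searchRequest.url); // http://localhost:8080/api/qdrant/search

// To send it (Node.js 18+ provides a global fetch):
// const response = await fetch(searchRequest.url, searchRequest.options);
// const results = await response.json();
```

Separating request construction from the network call keeps the example testable without a running server.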
Visit /qdrant-test to test the Qdrant functionality:
- Process test documents
- Perform semantic searches
- View collection statistics
- Access: Navigate to /instructor
- Onboarding: Complete course setup
- Upload Documents: Add course materials to units
- Create Questions: Build assessments for students
- Publish Units: Make content available to students
- Retrieval Mode: On the course Home page, toggle “Use additive retrieval” to allow chat to include earlier published units in addition to the selected unit. When off, chat uses only the selected unit.
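
The retrieval-mode rule described above can be sketched as a small function. This is an illustration of the described behaviour, not BiocBot's actual implementation: the function name `selectUnitsForRetrieval` and the unit shape `{ name, published }` are assumptions, and units are assumed to be listed in course order.

```javascript
/**
 * Decide which units chat retrieval should search.
 * Additive mode: earlier published units plus the selected unit.
 * Single-unit mode: the selected unit only.
 */
function selectUnitsForRetrieval(units, selectedUnitName, additive) {
  const selectedIndex = units.findIndex((u) => u.name === selectedUnitName);
  if (selectedIndex === -1) {
    throw new Error(`Unknown unit: ${selectedUnitName}`);
  }
  if (!additive) return [selectedUnitName];
  const earlierPublished = units
    .slice(0, selectedIndex) // units that come before the selected one
    .filter((u) => u.published)
    .map((u) => u.name);
  return [...earlierPublished, selectedUnitName];
}

const courseUnits = [
  { name: "Unit 1", published: true },
  { name: "Unit 2", published: false },
  { name: "Unit 3", published: true },
];
console.log(selectUnitsForRetrieval(courseUnits, "Unit 3", true)); // ["Unit 1", "Unit 3"]
console.log(selectUnitsForRetrieval(courseUnits, "Unit 3", false)); // ["Unit 3"]
```
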
- Access: Navigate to /student
- Course Selection: Choose your course
- Assessment: Complete calibration questions
- Chat Interface: Select a unit, then ask questions about course material. Chat retrieval respects the course’s retrieval mode.
- Semantic Search: Find relevant content using natural language
tlef-biocbot/
├── public/                 # Frontend assets
│   ├── instructor/        # Instructor interface
│   ├── student/          # Student interface
│   └── qdrant-test.html  # Qdrant testing page
├── src/                   # Backend source
│   ├── models/           # Data models
│   ├── routes/           # API routes
│   ├── services/         # Business logic
│   └── server.js         # Main server file
└── documents/            # Course documentation
- QdrantService: Handles vector database operations
- Document Processing: Automatic chunking and embedding
- Semantic Search: Vector similarity search
- Course Management: Structured content organization
- ✅ Phase 1: Backend pipeline with Qdrant integration
- ✅ Document Upload: File and text document support
- ✅ Vector Search: Semantic document retrieval
- 🔄 Assessment System: Question creation and management
- 🔄 Student Interface: Chat-based learning experience
This project follows clean architecture principles optimized for clarity, maintainability, and junior developer readability. All code should be:
- Modular: Single responsibility functions and classes
- Documented: Comprehensive docblocks and inline comments
- Accessible: Clear variable names and logical flow
- Secure: Input validation and error handling
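
As one possible reading of these guidelines, the sketch below shows a single-purpose, documented function with input validation. The function name `isSupportedDocument` is hypothetical and does not refer to existing BiocBot code; the extension list follows the upload formats named in the overview.

```javascript
// File extensions accepted for upload (PDF, DOCX, or TXT).
const SUPPORTED_EXTENSIONS = [".pdf", ".docx", ".txt"];

/**
 * Check whether an uploaded file has a supported document extension.
 *
 * @param {string} filename - Original name of the uploaded file.
 * @returns {boolean} True for .pdf, .docx, or .txt (case-insensitive).
 * @throws {TypeError} If filename is not a non-empty string.
 */
function isSupportedDocument(filename) {
  if (typeof filename !== "string" || filename.trim() === "") {
    throw new TypeError("filename must be a non-empty string");
  }
  const normalizedName = filename.trim().toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => normalizedName.endsWith(ext));
}

console.log(isSupportedDocument("Lecture-3.PDF")); // true
console.log(isSupportedDocument("slides.pptx")); // false
```

An extension check like this is only a first filter; a production upload handler would typically also inspect the file contents and size.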
ISC License