Add chat-llama-nemotron example as a regular directory #307

Open · wants to merge 4 commits into base: main
6 changes: 5 additions & 1 deletion community/README.md
@@ -82,4 +82,8 @@ Community examples are sample code and deployments for RAG pipelines that are no

* [AI Podcast Assistant](./ai-podcast-assistant/)

This example demonstrates a comprehensive workflow for processing podcast audio using the Phi-4-Multimodal LLM through NVIDIA NIM Microservices. It includes functionality for generating detailed notes from audio content, creating concise summaries, and translating both transcriptions and summaries into different languages. The implementation handles long audio files by automatically chunking them for efficient processing and preserves formatting during translation.

* [Chat with LLM Llama 3.1 Nemotron Nano 4B](./chat-llama-nemotron/)

This is a React-based conversational UI designed for interacting with a powerful local LLM. It incorporates RAG to enhance contextual understanding and is backed by an NVIDIA Dynamo inference server running the NVIDIA Llama-3.1-Nemotron-Nano-4B-v1.1 model. The setup enables low-latency, cloud-free AI assistant capabilities, with live document search and reasoning, all deployable on local or edge infrastructure.
156 changes: 156 additions & 0 deletions community/chat-llama-nemotron/.gitignore
@@ -0,0 +1,156 @@
# Dependencies
node_modules/
.pnp/
.pnp.js
package-lock.json
yarn.lock

# Testing
coverage/
.nyc_output/
test-results/
junit.xml

# Production
build/
dist/
out/
.next/
.nuxt/
.cache/
.output/

# Environment files
.env
.env.*
!.env.example
.env.local
.env.development.local
.env.test.local
.env.production.local
.env*.local
*.env

# Logs
npm-debug.log*
yarn-debug.log*
yarn-error.log*
logs/
*.log
debug.log
error.log

# IDE
.idea/
.vscode/
*.swp
*.swo
*.sublime-workspace
*.sublime-project
.project
.classpath
.settings/
*.code-workspace

# OS
.DS_Store
Thumbs.db
desktop.ini
$RECYCLE.BIN/
*.lnk

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/
.env/
.venv/
pip-log.txt
pip-delete-this-directory.txt
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
.python-version
*.egg-info/
.installed.cfg
*.egg
MANIFEST
dist/
build/
eggs/
parts/
bin/
var/
sdist/
develop-eggs/
.installed.cfg
lib/
lib64/

# RAG specific
data/
embeddings/
*.faiss
*.pkl
*.bin
*.vec
*.model
*.index
chunks/
documents/
vectors/
corpus/
indexes/

# Temporary files
*.tmp
*.temp
*.bak
*.swp
*~
*.swx
*.swo
*.swn
*.bak
*.orig
*.rej
*.patch
*.diff

# Build artifacts
*.min.js
*.min.css
*.map
*.gz
*.br
*.zip
*.tar
*.tar.gz
*.tgz
*.rar
*.7z

# Debug
.debug/
debug/
debug.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# Local development
.local/
local/
local.*
135 changes: 135 additions & 0 deletions community/chat-llama-nemotron/README.md
@@ -0,0 +1,135 @@
# Chat with Llama-3.1-Nemotron-Nano-4B-v1.1

A React-based chat interface for interacting with an LLM, featuring RAG (Retrieval-Augmented Generation) capabilities and an NVIDIA Dynamo backend serving NVIDIA Llama-3.1-Nemotron-Nano-4B-v1.1.

## Project Structure

```
.
├── frontend/          # React frontend application
├── backend-rag/       # RAG service backend
└── backend-dynamo/    # NVIDIA Dynamo backend service
    └── llm-proxy/     # Proxy server for NVIDIA Dynamo
```
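
For orientation, the sketch below shows the kind of retrieval step the `backend-rag/` service is responsible for. It is a minimal, hypothetical illustration; the embedding model, function names, and in-memory index are assumptions, not code taken from this repository.

```python
# Hypothetical illustration of a RAG retrieval step; not the implementation in backend-rag/.
from typing import List

import faiss                      # pip install faiss-cpu
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Assumed embedding model; the actual service may use a different one.
model = SentenceTransformer("all-MiniLM-L6-v2")


def build_index(chunks: List[str]) -> faiss.IndexFlatIP:
    """Embed document chunks and store them in an in-memory FAISS index."""
    vectors = model.encode(chunks, normalize_embeddings=True)
    index = faiss.IndexFlatIP(int(vectors.shape[1]))  # inner product equals cosine on normalized vectors
    index.add(np.asarray(vectors, dtype="float32"))
    return index


def retrieve(index: faiss.IndexFlatIP, chunks: List[str], question: str, k: int = 3) -> List[str]:
    """Return the k chunks most similar to the question."""
    query = model.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(query, dtype="float32"), k)
    return [chunks[i] for i in ids[0]]
```

A caller would build the index once from the uploaded documents and call `retrieve()` for each question, prepending the returned chunks to the prompt that is sent to the LLM.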

## Prerequisites

- Node.js 18 or higher
- Python 3.8 or higher
- NVIDIA GPU with CUDA support (for LLM serving with NVIDIA Dynamo)
- Docker (optional, for containerized deployment)
- Git

## Configuration

### Frontend

The frontend configuration is managed through YAML files in `frontend/public/config/`:

- `app_config.yaml`: Main application configuration:
- API endpoints
- UI settings
- File upload settings

See [frontend/README.md](frontend/README.md) for details.

### Backend

Each service has its own configuration files:

- RAG backend: see [backend-rag/README.md](backend-rag/README.md)
- LLM Proxy: see [backend-dynamo/llm-proxy/README.md](backend-dynamo/llm-proxy/README.md) (a minimal sketch of the proxy idea follows this list)
- NVIDIA Dynamo backend: see [backend-dynamo/README.md](backend-dynamo/README.md)
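
The LLM proxy sits between the browser and the remote Dynamo server. Purely as a sketch of that idea (assuming an OpenAI-compatible chat route on the Dynamo side, with placeholder host, port, and timeout values rather than the real `proxy.py`), it could be as small as:

```python
# Hypothetical sketch of the proxy idea; see backend-dynamo/llm-proxy/proxy.py for the real code.
import requests                               # pip install requests
from flask import Flask, Response, request    # pip install flask

app = Flask(__name__)

# Placeholder upstream: assumes the Dynamo server exposes an OpenAI-compatible endpoint.
DYNAMO_URL = "http://gpu-server:8000/v1/chat/completions"


@app.route("/v1/chat/completions", methods=["POST"])
def forward_chat() -> Response:
    """Relay the browser's chat request to the GPU server and return its reply unchanged."""
    upstream = requests.post(DYNAMO_URL, json=request.get_json(), timeout=300)
    return Response(
        upstream.content,
        status=upstream.status_code,
        content_type=upstream.headers.get("Content-Type", "application/json"),
    )


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5001)  # the port is an assumption, not the project's default
```

A production proxy would also need to handle streaming responses and CORS, which this sketch omits.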


## Setup

### Llama-3.1-Nemotron-Nano-4B-v1.1 running on a GPU Server

This step should be performed on a machine with a GPU.

Set up the NVIDIA Dynamo backend running Llama-3.1-Nemotron-Nano-4B-v1.1 by following the instructions in [backend-dynamo/README.md](backend-dynamo/README.md).

### Local client with a local RAG database

These steps can be performed locally and don't require a GPU.

1. Clone the repository:
```bash
git clone <this-repository-url>
cd <repository-directory>/community/chat-llama-nemotron
```

2. Install frontend dependencies:
```bash
cd frontend
npm install
```

3. Set up backend services:

For Unix/macOS:
```bash
# RAG Backend
cd backend-rag
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# LLM Proxy (run from the backend-rag directory created above)
deactivate
cd ../backend-dynamo/llm-proxy
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

For Windows:
```bat
# RAG Backend
cd backend-rag
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt

# LLM Proxy (run from the backend-rag directory created above)
deactivate
cd ..\backend-dynamo\llm-proxy
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
```

4. Start the services (each in a new terminal):

For Unix/macOS:
```bash
# Start frontend (in frontend directory)
cd frontend
npm start

# Start RAG backend (in backend-rag directory)
cd backend-rag
source venv/bin/activate
python src/app.py

# Start LLM proxy (in backend-dynamo/llm-proxy directory)
cd backend-dynamo/llm-proxy
source venv/bin/activate
python proxy.py
```

For Windows:
```bat
# Start frontend (in frontend directory)
cd frontend
npm start

# Start RAG backend (in backend-rag directory)
cd backend-rag
.\venv\Scripts\activate
python src\app.py

# Start LLM proxy (in backend-dynamo\llm-proxy directory)
cd backend-dynamo\llm-proxy
.\venv\Scripts\activate
python proxy.py
```
43 changes: 43 additions & 0 deletions community/chat-llama-nemotron/backend-dynamo/.gitignore
@@ -0,0 +1,43 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/
.env/
.venv/
pip-log.txt
pip-delete-this-directory.txt
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Logs
logs/
*.log

# IDE
.idea/
.vscode/
*.swp
*.swo

# Environment variables
.env
.env.local
.env.*.local

# AWS
.aws/
aws.json
credentials.json