Commits
42 commits
e29fac3
Add README for LF LLM demo
Deeksha-20-99 Sep 16, 2025
053b8d7
Adding work in progress code files for an llm example. Files: llm.py,…
Deeksha-20-99 Sep 16, 2025
473c81f
changed the file name of the file to be included in agent_llm.lf
Deeksha-20-99 Sep 16, 2025
46522a1
Added a quiz game. It is a game between two LLM models answering user…
Deeksha-20-99 Sep 19, 2025
9d9ee26
Updated the README.md for instructions to run the quiz game
Deeksha-20-99 Sep 19, 2025
fe1f605
Removing the older version of the file agent_llm.lf
Deeksha-20-99 Sep 19, 2025
b020664
Modified comments to the program
Deeksha-20-99 Sep 22, 2025
cc0a08a
created the files for quiz game between two llm models using main re…
Deeksha-20-99 Sep 23, 2025
632dc8e
Adding the git ignore file
Deeksha-20-99 Sep 23, 2025
6c8117d
Fixed the issue for the judge federate to receive the signal that mod…
Deeksha-20-99 Sep 25, 2025
2f1a884
Added the version of files for running on different devices
Deeksha-20-99 Sep 25, 2025
1958fbb
Adding a python script for llama 3.2 1B for jetson orin
Deeksha-20-99 Oct 9, 2025
60f642d
commented the code for testing
Deeksha-20-99 Oct 9, 2025
6a26cab
Testing Jetson
Deeksha-20-99 Oct 9, 2025
aef0ac9
Changed the file names in base class
Deeksha-20-99 Oct 9, 2025
c4c6353
Changed the RTI to jetson
Deeksha-20-99 Oct 9, 2025
9d503d5
corrected the ip for jetson orin
Deeksha-20-99 Oct 9, 2025
9a1730b
Add requirements.txt
hokeun Oct 14, 2025
ea20703
Move requirements.txt to top dir
hokeun Oct 14, 2025
e16438a
Adding the organized folders and README.md
Deeksha-20-99 Oct 15, 2025
cd83f0a
Updated the correct links for federated_execution and requirements in…
Deeksha-20-99 Oct 15, 2025
6b8c458
Updated the requirements.txt for README.md
Deeksha-20-99 Oct 15, 2025
abd32ed
changed the llm_b import statement
Deeksha-20-99 Oct 15, 2025
27d3561
Rename directories and remove unnecessary files
hokeun Oct 15, 2025
04f195a
Added more instruction on how to execute this demo README.md
Deeksha-20-99 Oct 16, 2025
15075fb
changed the path file names for the python files
Deeksha-20-99 Oct 16, 2025
105cecf
Added the images folder for README.md
Deeksha-20-99 Oct 16, 2025
35eefa9
Updated the image position on the README.md
Deeksha-20-99 Oct 16, 2025
5f3b61c
Revise README for LLM Demo overview and structure
hokeun Oct 16, 2025
66da8ce
corrected the spelling of environment README.md
Deeksha-20-99 Oct 16, 2025
67cf0bf
corrected the spelling README.md
Deeksha-20-99 Oct 16, 2025
18a8548
Changed the comments and removed the Hugging face token and it will b…
Deeksha-20-99 Oct 17, 2025
ec73fce
Updated the README.md for federated execution
Deeksha-20-99 Oct 17, 2025
03a1007
Corrected the path of the python files
Deeksha-20-99 Oct 17, 2025
050fe9f
Corrected the paths of the images in the README.md
Deeksha-20-99 Oct 17, 2025
8634b49
added the contributors name README.md
Deeksha-20-99 Oct 17, 2025
08f6ed6
Merge branch 'llm' of github.com:lf-lang/lf-demos into llm
Deeksha-20-99 Oct 17, 2025
2e73975
Removed torch and torchvision since they are dependent on the device
Deeksha-20-99 Oct 17, 2025
3ccb0f2
corrected few things on the README regarding the different reactors
Deeksha-20-99 Oct 17, 2025
ae28863
Updated the required python version in the README.md
Deeksha-20-99 Oct 17, 2025
b09a9c3
Added a command to check if requirements are installed README.md
Deeksha-20-99 Oct 17, 2025
042317f
added the common environment name README.md
Deeksha-20-99 Oct 17, 2025
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
llm/fed-gen/
llm/src-gen/
llm/include/
llm/bin
**__pycache__**
llm/=**
135 changes: 135 additions & 0 deletions llm/README.md
@@ -0,0 +1,135 @@

# LLM Demo Overview
This is a quiz-style game between two LLM agents. For each question the user types at the keyboard, both agents answer in parallel. The Judge announces whichever answer arrives first (or reports a timeout if neither responds within 60 seconds) and prints the elapsed logical and physical times for each question.
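The arbitration itself is done by the Judge reactor in Lingua Franca; the plain-Python sketch below (the names `run_quiz_round`, `agent_a`, and `agent_b` are illustrative and not part of the demo) captures the "first answer wins, otherwise time out" idea:

```
import concurrent.futures as cf

def run_quiz_round(question, agent_a, agent_b, timeout_s=60):
    """Return (winner, answer) for whichever agent answers first,
    or (None, None) if neither answers within timeout_s seconds."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    futures = {
        pool.submit(agent_a, question): "LLM-A",
        pool.submit(agent_b, question): "LLM-B",
    }
    # Block until the first answer arrives or the timeout expires.
    done, _ = cf.wait(futures, timeout=timeout_s, return_when=cf.FIRST_COMPLETED)
    pool.shutdown(wait=False)  # do not block on the slower agent
    if not done:
        return None, None  # timeout: neither agent responded in time
    first = next(iter(done))
    return futures[first], first.result()
```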

# Directory Structure
- [federated](src/federated/) - Directory for federated versions of LLM demos.
- [agents](src/agents/) - Directory for Python files for various LLM agents.

# Prerequisites

You need Python >= 3.10 installed.

## Library Dependencies
The dependencies required to run this project are listed in the [requirements.txt](requirements.txt) file. The models used in this repository are quantized to 4-bit precision (bnb_4bit) and rely on bitsandbytes for efficient matrix operations and memory optimization, so specific versions of bitsandbytes, torch, and torchvision are required for compatibility.
While newer versions of other dependencies may work, the specific versions listed below have been tested and are recommended for optimal performance.
It is highly recommended to create a Python virtual environment or a Conda environment to manage dependencies. \
To create a virtual environment, follow the steps below.

### Step 1: Creating environment
```
python3 -m venv llm
source llm/bin/activate
```
To activate the environment again later, run `source llm/bin/activate`.

Alternatively, create and activate a Conda environment:
```
conda create -n llm
conda activate llm
```
### Step 2: Installing the required packages
Check if pip is installed:
```
pip --version
```
If it is missing or outdated:
```
python -m pip install --upgrade pip
```
Run this command to install the packages from the [requirements.txt](requirements.txt) file: \
**Note**: Since this demo uses LLMs with 7B and 70B parameters, a device with GPU support is recommended.
```
pip install -r requirements.txt
```
To check if all the requirements are installed, run:
```
pip list | grep -E "transformers|accelerate|tokenizers|bitsandbytes"
```
To install torch:

1. For devices without GPU
```
pip install torch torchvision
```
2. For devices with GPU
To check the CUDA version, run this command:
```
nvidia-smi
```
Look for the line "CUDA Version" as shown in the image: \
<img src="img/cudaversion.png" width="400" height="300">

With the correct version identified, install PyTorch from [PyTorch](https://pytorch.org/get-started/locally/) by selecting the correct OS and compute platform, as shown in the image below for a Linux system with CUDA version 12.8 (a verification snippet follows the image): \
<img src="img/pytorch.png" width="400" height="300">
### Step 3: Model Dependencies
- **Pre-trained models used in agents/llm.py**: [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), [meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) \
**Note:** Follow the steps below to obtain access and an authentication token for the Hugging Face models.
1. Create a user access token by following the steps in the official documentation: [User access tokens](https://huggingface.co/docs/hub/en/security-tokens)
2. Log in using the Hugging Face CLI by running `huggingface-cli login`. Refer to the official documentation for step-by-step instructions: [HuggingFace CLI](https://huggingface.co/docs/huggingface_hub/en/guides/cli). A Python alternative is sketched after this list.
3. If you are using the Llama models for the first time, you must request access. Open these links and apply for access to the models: [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), [meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)
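If you prefer to authenticate from Python rather than the CLI, `huggingface_hub` provides an equivalent login helper (a minimal sketch; the `hf_...` token is a placeholder and should never be hard-coded in committed source):
```
from huggingface_hub import login

# Prompts for the access token interactively; alternatively, pass login(token="hf_...")
login()
```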

## System Requirements

For optimal performance, the following hardware and software setup was used. \
**Note:** To replicate this demo, you can use any equivalent hardware that meets the computational requirements.

### Hardware Requirements
The demo was tested with the following hardware setup.
- **GPU**: NVIDIA RTX A6000

### Software Requirements
- **OS**: Linux
- **Python**: >= 3.10
- **CUDA Version**: 12.8

Make sure the environment is properly configured to use CUDA for optimal GPU acceleration.

# Files and directories in this repository
- **`llm_base_class.lf`** - Contains the base reactors LlmA, LlmB, and Judge.
- **`llm_quiz_game.lf`** - Lingua Franca program that defines the quiz game reactors (LLM agent A, LLM agent B, and Judge); a quick sanity check of the underlying Python agents is sketched below.
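As a standalone sanity check, one of the Python agents in [agents](src/agents/) can be invoked directly (a hypothetical usage sketch; it assumes the Hugging Face access steps above are complete and that it is run from the `llm/src` directory on a CUDA-capable machine):

```
from agents.llm_a import agent1  # importing loads the 7B model, which may take a while

print(agent1("What is the capital of South Korea?"))
```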

# Execution Workflow

### Step 1: Compile the Lingua Franca program
Compile **`llm_quiz_game.lf`** using the Lingua Franca compiler (`lfc`).

**Note:**
- Ensure that you specify the correct file paths

Run the following command:

```
lfc src/llm_quiz_game.lf
```

### Step 2: Run the binary file and input the quiz question
Run the following command:

```
./bin/llm_quiz_game
```

The program will prompt you to enter a quiz question from the keyboard.

Example output printed on the terminal:

<pre>

--------------------------------------------------
---- System clock resolution: 1 nsec
---- Start execution on Fri Sep 19 10:46:31 2025 ---- plus 772215861 nanoseconds
Enter the quiz question
What is the capital of South Korea?
Query: What is the capital of South Korea?

waiting...

Winner: LLM-B | logical 1184 ms | physical 1184 ms
Answer: Seoul.
--------------------------------------------------

</pre>

# Contributors
- Deeksha Prahlad ([email protected]), Ph.D. student at Arizona State University
- Hokeun Kim ([email protected], https://hokeun.github.io/), Assistant Professor at Arizona State University
Binary file added llm/img/cudaversion.png
Binary file added llm/img/pytorch.png
5 changes: 5 additions & 0 deletions llm/requirements.txt
@@ -0,0 +1,5 @@
accelerate
transformers
tokenizers
bitsandbytes>=0.43.0

89 changes: 89 additions & 0 deletions llm/src/agents/llm.py
@@ -0,0 +1,89 @@
### Import Libraries
import transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from torch import cuda, bfloat16


### Models used as the two agents
model_id = "meta-llama/Llama-2-7b-chat-hf"
model_id_2 = "meta-llama/Llama-2-70b-chat-hf"

### Check whether a GPU is available and choose the compute dtype accordingly
has_cuda = torch.cuda.is_available()
dtype = torch.bfloat16 if has_cuda else torch.float32

### 4-bit quantization configuration
bnb_config = None
### If CUDA is available, load the models with 4-bit quantization
if has_cuda:
    try:
        import bitsandbytes as bnb
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=dtype,
        )
    except Exception:
        bnb_config = None

### Load the pre-trained tokenizers
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer_2 = AutoTokenizer.from_pretrained(model_id_2, use_fast=True)
for tok in (tokenizer, tokenizer_2):
    if tok.pad_token_id is None:
        tok.pad_token = tok.eos_token

### Both models share the same device map and (if available) 4-bit quantization settings
common = dict(
    device_map="auto" if has_cuda else None,
    torch_dtype=dtype,
    low_cpu_mem_usage=True,
)
if bnb_config is not None:
    common["quantization_config"] = bnb_config

### Load the pre-trained models
model = AutoModelForCausalLM.from_pretrained(model_id, **common)
model_2 = AutoModelForCausalLM.from_pretrained(model_id_2, **common)
model.eval(); model_2.eval()


### Generation arguments for both models
GEN_A = dict(max_new_tokens=24, do_sample=False, temperature=0.1,
             eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)
GEN_B = dict(max_new_tokens=24, do_sample=False, temperature=0.1,
             eos_token_id=tokenizer_2.eos_token_id, pad_token_id=tokenizer_2.pad_token_id)

### Trim the generated text down to a one-line answer
def postprocess(text: str) -> str:
    t = text.strip()
    for sep in ["\n", ". ", " "]:
        idx = t.find(sep)
        if idx > 0:
            t = t[:idx]
            break
    return t.strip().strip(":").strip()

### agent1 is called from the .lf code
def agent1(q: str) -> str:
    prompt = f"You are a concise Q&A assistant.\n\n{q}\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    if has_cuda:
        inputs = {k: v.to("cuda") for k, v in inputs.items()}
    with torch.no_grad():
        out = model.generate(**inputs, **GEN_A)
    prompt_len = inputs["input_ids"].shape[1]
    result = tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True)
    return postprocess(result)

### agent2 is called from the .lf code
def agent2(q: str) -> str:
    prompt = f"You are a concise Q&A assistant.\n\n{q}\n"
    inputs = tokenizer_2(prompt, return_tensors="pt")
    if has_cuda:
        inputs = {k: v.to("cuda") for k, v in inputs.items()}
    with torch.no_grad():
        out = model_2.generate(**inputs, **GEN_B)
    prompt_len = inputs["input_ids"].shape[1]
    result = tokenizer_2.decode(out[0][prompt_len:], skip_special_tokens=True)
    return postprocess(result)
78 changes: 78 additions & 0 deletions llm/src/agents/llm_a.py
@@ -0,0 +1,78 @@
# llm_a.py

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

#Model
model_id = "meta-llama/Llama-2-7b-chat-hf"


has_cuda = torch.cuda.is_available()
if not has_cuda:
    raise RuntimeError("CUDA GPU required for this configuration.")
dtype = torch.bfloat16 if has_cuda else torch.float32

# 4-bit quantization
bnb_config = None
if has_cuda:
    try:
        import bitsandbytes as bnb
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=dtype,
        )
    except Exception:
        bnb_config = None

# Tokenizer; the Hugging Face token is used automatically if logged in via the CLI
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token


common = dict(
    device_map="auto" if has_cuda else None,
    torch_dtype=dtype,
    low_cpu_mem_usage=True,
)

if bnb_config is not None:
    common["quantization_config"] = bnb_config

#model
model = AutoModelForCausalLM.from_pretrained(model_id, **common)
model.eval()

# Generation
GEN_A = dict(
    max_new_tokens=24,
    do_sample=False,
    temperature=0.1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id
)

# Post-processing: trim the generated text down to a one-line answer
def postprocess(text: str) -> str:
    t = text.strip()
    for sep in ["\n", ". ", " "]:
        idx = t.find(sep)
        if idx > 0:
            t = t[:idx]
            break
    return t.strip().strip(":").strip()

# Agent 1
def agent1(q: str) -> str:
    prompt = f"You are a concise Q&A assistant.\n\n{q}\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    if has_cuda:
        inputs = {k: v.to("cuda") for k, v in inputs.items()}
    with torch.no_grad():
        out = model.generate(**inputs, **GEN_A)
    prompt_len = inputs["input_ids"].shape[1]
    result = tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True)
    print(result)
    return postprocess(result)