Commit 082c0ac

Supporting Multi-LoRA inferencing via JetStream server (#221)
Supporting Multi-LoRA inferencing via the JetStream server, following the [LLM Inference gateway API protocols](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol#inference-api-protocol):

- Implemented an `adapter_tensorstore` to load, store, manage, and unload adapter weights.
- Added and exposed the [required metrics](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol#metrics-reporting) at the Prometheus endpoint.
- Added a `multi_lora_decoding` service with corresponding APIs, as per the [requirement](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol#inference-api-protocol).
- Implemented single-LoRA functionality support.
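The role of an adapter tensorstore can be sketched as a small LRU-style cache that keeps a bounded number of LoRA adapter weight sets resident and evicts the least-recently-used one. This is a minimal illustrative sketch only; the class and method names below are assumptions, not JetStream's actual `adapter_tensorstore` API.

```python
# Hypothetical sketch of a LoRA adapter store: loads, caches, and evicts
# adapter weights with LRU eviction. Names are illustrative assumptions.
from collections import OrderedDict


class AdapterTensorStore:
    def __init__(self, capacity: int = 2):
        self.capacity = capacity      # max adapters resident at once
        self._cache = OrderedDict()   # adapter_id -> weights, in LRU order

    def load(self, adapter_id: str, weights) -> None:
        # Insert (or refresh) an adapter; evict the least-recently-used
        # entry if we exceed capacity.
        if adapter_id in self._cache:
            self._cache.move_to_end(adapter_id)
        self._cache[adapter_id] = weights
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)

    def get(self, adapter_id: str):
        # Return cached weights (marking them recently used), or None.
        weights = self._cache.get(adapter_id)
        if weights is not None:
            self._cache.move_to_end(adapter_id)
        return weights

    def unload(self, adapter_id: str) -> None:
        # Explicitly drop an adapter's weights from the store.
        self._cache.pop(adapter_id, None)
```

A real implementation would additionally fetch weights from storage on a miss and report load/eviction metrics, but the cache discipline above captures the load/store/unload lifecycle the commit describes.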
1 parent 045c9a1 commit 082c0ac

21 files changed: +3585 −68 lines

README.md

Lines changed: 3 additions & 0 deletions

@@ -65,6 +65,9 @@
 python -m unittest -v jetstream.tests.core.test_orchestrator
 # Test JetStream core server library
 python -m unittest -v jetstream.tests.core.test_server
+
+# Test JetStream lora adapter tensorstore
+python -m unittest -v jetstream.tests.core.lora.test_adapter_tensorstore

 # Test mock JetStream engine implementation
 python -m unittest -v jetstream.tests.engine.test_mock_engine

0 commit comments
