Commit 082c0ac
Supporting Multi-LoRA inferencing via JetStream server (#221)
Supports Multi-LoRA inference via the JetStream server, following the [LLM Inference Gateway API protocol](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol#inference-api-protocol):
- Implemented an adapter_tensorstore to load, store, manage, and unload the adapter weights.
- Added and exposed the [required metrics](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol#metrics-reporting) at the Prometheus endpoint.
- Added a multi_lora_decoding service with the corresponding APIs, per the [requirement](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol#inference-api-protocol).
- Implemented single-LoRA functionality support.

1 parent 045c9a1 commit 082c0ac
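To illustrate the first bullet, here is a minimal sketch of what an adapter tensorstore can look like: a keyed cache that loads, serves, and evicts per-adapter LoRA weights. The class name, method names, and the LRU eviction policy are assumptions for illustration only, not JetStream's actual implementation.

```python
# Hypothetical adapter tensorstore sketch; names and eviction policy are
# illustrative assumptions, not JetStream's real API.
from collections import OrderedDict


class AdapterTensorStore:
    """Loads, caches, and evicts LoRA adapter weights keyed by adapter id."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity                      # max adapters in memory
        self._cache: "OrderedDict[str, dict]" = OrderedDict()

    def load(self, adapter_id: str, weights: dict) -> None:
        """Insert adapter weights, evicting the least recently used if full."""
        if adapter_id in self._cache:
            self._cache.move_to_end(adapter_id)       # refresh recency
            return
        if len(self._cache) >= self.capacity:
            self._cache.popitem(last=False)           # evict LRU adapter
        self._cache[adapter_id] = weights

    def get(self, adapter_id: str) -> dict:
        """Fetch weights for decoding; marks the adapter as recently used."""
        self._cache.move_to_end(adapter_id)
        return self._cache[adapter_id]

    def unload(self, adapter_id: str) -> None:
        """Explicitly drop an adapter's weights from memory."""
        self._cache.pop(adapter_id, None)


store = AdapterTensorStore(capacity=2)
store.load("adapter_a", {"lora_A": [0.1], "lora_B": [0.2]})
store.load("adapter_b", {"lora_A": [0.3], "lora_B": [0.4]})
store.load("adapter_c", {"lora_A": [0.5], "lora_B": [0.6]})  # evicts adapter_a
print(sorted(store._cache))  # ['adapter_b', 'adapter_c']
```

A real store would additionally move weights between host and device memory and track loading state asynchronously; the cache-with-eviction core shown here is the part that makes "load, store, manage and unload" concrete.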
File tree (21 files changed, +3585 −68 lines changed):

- jetstream
  - core
    - lora
    - metrics
    - proto
  - engine
- tests/core
  - lora
- tools
- maxtext