Here are 41 public repositories matching this topic...
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
Updated Sep 12, 2025 · Python
Turn any computer or edge device into a command center for your computer vision projects.
Updated Sep 13, 2025 · Python
The simplest way to serve AI/ML models in production
Updated Sep 12, 2025 · Python
An open-source computer vision framework to build and deploy apps in minutes
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.
Updated Sep 12, 2025 · Rust
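Since the server above advertises OpenAI-API compatibility, any OpenAI-style client should work against it. A minimal sketch in Python, assuming the server listens on localhost:8080; the port and model id are placeholders, not values from the project:

```python
import requests

# Minimal sketch of querying an OpenAI-API-compatible inference server.
# Host, port, and model id are assumptions; adjust to your deployment.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "llama-3-8b-instruct",  # hypothetical model id
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```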
A REST API for Caffe using Docker and Go
A no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.
Updated Jun 28, 2022 · Python
Work with LLMs in a local environment using containers.
Updated Sep 12, 2025 · TypeScript
A no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
Updated Jun 28, 2022 · Python
An object detection inference API using the TensorFlow framework (a generic request sketch follows below).
Updated Jun 28, 2022 · Python
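The three object detection APIs above follow the same pattern: POST an image over HTTP and get predicted boxes back. The sketch below is a generic illustration only; the URL, route, and response schema are hypothetical, not the actual routes of these repositories:

```python
import requests

# Hypothetical sketch of calling an object-detection REST API.
# The endpoint and response shape are illustrative assumptions.
with open("street.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:4343/detect",  # assumed endpoint
        files={"image": ("street.jpg", f, "image/jpeg")},
        timeout=60,
    )
resp.raise_for_status()
for det in resp.json().get("detections", []):  # assumed response schema
    print(det["class"], det["confidence"], det["bbox"])
```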
ONNX Runtime Server: provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
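For a sense of what such a server wraps, here is a minimal local inference sketch with the onnxruntime Python package; the model file name and input shape are assumptions for illustration:

```python
import numpy as np
import onnxruntime as ort

# Minimal local equivalent of what an ONNX inference server wraps:
# load a model, inspect its input, run a forward pass.
session = ort.InferenceSession("model.onnx")   # assumed model file
inp = session.get_inputs()[0]
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed input shape
outputs = session.run(None, {inp.name: x})
print([o.shape for o in outputs])
```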
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
Updated Oct 20, 2024 · Scala
Orkhon: ML Inference Framework and Server Runtime
K3ai is a lightweight, fully automated, AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai fits anything from edge devices to laptops.
Updated Nov 2, 2021 · PowerShell
Deploy DL/ML inference pipelines with minimal extra code.
Updated Nov 20, 2024 · Python
A standalone inference server for trained Rubix ML estimators.
[⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI
Updated Jun 25, 2025 · Python
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
Updated Jun 2, 2024 · TypeScript
Fullstack machine learning inference template
Updated Nov 24, 2023 · Jupyter Notebook
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
Updated Jun 28, 2023 · Python
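As a rough illustration of what a serving benchmark measures, here is a toy latency loop against a running endpoint; the URL and payload are placeholders, and real benchmarks also drive concurrent load to measure throughput:

```python
import statistics
import time

import requests

# Toy latency benchmark against a running inference endpoint.
URL = "http://localhost:8000/predict"  # assumed endpoint
payload = {"inputs": [1.0, 2.0, 3.0]}  # assumed request body

latencies = []
for _ in range(50):
    start = time.perf_counter()
    requests.post(URL, json=payload, timeout=30).raise_for_status()
    latencies.append(time.perf_counter() - start)

print(f"p50 {statistics.median(latencies) * 1000:.1f} ms, "
      f"p95 {statistics.quantiles(latencies, n=20)[18] * 1000:.1f} ms")
```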