Here are 41 public repositories matching this topic...
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
Updated Sep 12, 2025 · Python
Turn any computer or edge device into a command center for your computer vision projects.
Updated Sep 13, 2025 · Python
The simplest way to serve AI/ML models in production
Updated Sep 12, 2025 · Python
An open-source computer vision framework to build and deploy apps in minutes
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.
Updated Sep 12, 2025 · Rust
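Since the server above advertises OpenAI-API compatibility, any OpenAI-style client should work against it. A minimal sketch in Python, assuming the server listens on localhost:8080; the port and model id are placeholders, not values from the project:

```python
import requests

# Minimal sketch of querying an OpenAI-API-compatible inference server.
# Host, port, and model id are assumptions; adjust to your deployment.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "llama-3-8b-instruct",  # hypothetical model id
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```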
A REST API for Caffe using Docker and Go
A no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.
Updated Jun 28, 2022 · Python
Work with LLMs in a local environment using containers.
Updated Sep 12, 2025 · TypeScript
A no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
Updated Jun 28, 2022 · Python
An object detection inference API using the TensorFlow framework (a generic request sketch follows below).
Updated Jun 28, 2022 · Python
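The three object detection APIs above follow the same pattern: POST an image over HTTP and get predicted boxes back. The sketch below is a generic illustration only; the URL, route, and response schema are hypothetical, not the actual routes of these repositories:

```python
import requests

# Hypothetical sketch of calling an object-detection REST API.
# The endpoint and response shape are illustrative assumptions.
with open("street.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:4343/detect",  # assumed endpoint
        files={"image": ("street.jpg", f, "image/jpeg")},
        timeout=60,
    )
resp.raise_for_status()
for det in resp.json().get("detections", []):  # assumed response schema
    print(det["class"], det["confidence"], det["bbox"])
```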
ONNX Runtime Server: provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
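For a sense of what such a server wraps, here is a minimal local inference sketch with the onnxruntime Python package; the model file name and input shape are assumptions for illustration:

```python
import numpy as np
import onnxruntime as ort

# Minimal local equivalent of what an ONNX inference server wraps:
# load a model, inspect its input, run a forward pass.
session = ort.InferenceSession("model.onnx")   # assumed model file
inp = session.get_inputs()[0]
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed input shape
outputs = session.run(None, {inp.name: x})
print([o.shape for o in outputs])
```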
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
Updated Oct 20, 2024 · Scala
Orkhon: ML Inference Framework and Server Runtime
K3ai is a lightweight, fully automated, AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai fits anything from edge devices to laptops.
Updated Nov 2, 2021 · PowerShell
Deploy DL/ML inference pipelines with minimal extra code.
Updated Nov 20, 2024 · Python
A standalone inference server for trained Rubix ML estimators.
[⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI
Updated Jun 25, 2025 · Python
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
Updated Jun 2, 2024 · TypeScript
Fullstack machine learning inference template
Updated Nov 24, 2023 · Jupyter Notebook
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
Updated Jun 28, 2023 · Python
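As a rough illustration of what a serving benchmark measures, here is a toy latency loop against a running endpoint; the URL and payload are placeholders, and real benchmarks also drive concurrent load to measure throughput:

```python
import statistics
import time

import requests

# Toy latency benchmark against a running inference endpoint.
URL = "http://localhost:8000/predict"  # assumed endpoint
payload = {"inputs": [1.0, 2.0, 3.0]}  # assumed request body

latencies = []
for _ in range(50):
    start = time.perf_counter()
    requests.post(URL, json=payload, timeout=30).raise_for_status()
    latencies.append(time.perf_counter() - start)

print(f"p50 {statistics.median(latencies) * 1000:.1f} ms, "
      f"p95 {statistics.quantiles(latencies, n=20)[18] * 1000:.1f} ms")
```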