AI Infrastructure Platform with Managed API Service
A complete AI foundry platform for orchestrating data generation, model distillation, training, and evaluation on ephemeral GPU compute.
We use uv for dependency management. The repo is configured to work with the Homebrew install at /opt/homebrew/bin/uv.
```bash
# optional: create a project virtualenv
/opt/homebrew/bin/uv venv .venv
source .venv/bin/activate

# install all runtime + dev dependencies deterministically
/opt/homebrew/bin/uv pip sync requirements/requirements-dev.lock

# install git hooks
pre-commit install
```

Prerequisites: You must bring your own compute from your preferred vendor (AWS, GCP, Lambda, Prime Intellect, etc.).
- Spin up a GPU instance (Ubuntu 22.04 recommended).
- Copy `.env.example` to `.env` and configure your services (WandB, HuggingFace, S3).
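A minimal pre-flight check can catch a half-configured `.env` before a run starts. The variable names below (`WANDB_API_KEY`, `HF_TOKEN`, `S3_BUCKET`) are illustrative assumptions; check `.env.example` for the repo's actual schema:

```python
import os

# Hypothetical variable names -- see .env.example for the real ones.
REQUIRED_VARS = ["WANDB_API_KEY", "HF_TOKEN", "S3_BUCKET"]

def check_env(env: dict) -> list:
    """Return the names of required variables that are missing or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = check_env(dict(os.environ))
if missing:
    print(f"Missing configuration: {', '.join(missing)}")
```

Running such a check at the top of a training entrypoint fails fast on the login node instead of minutes into a GPU job.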
Deploy a Training Node: SSH into your node and run the turn-key deployment script. This will sync your code, install dependencies, and set up a persistent workspace.
```bash
./nexa_infra/scripts/provision/deploy.sh ubuntu@gpu-node-ip
```

Start Training (Remote or Local): Once attached to the remote session (tmux), you can start training immediately:
```bash
# Run V1 Stability Plan
python nexa_train/train.py --config-mode v1 --run-name my_stability_run

# Run V2 Performance Plan (Distributed)
torchrun --nproc_per_node=8 nexa_train/train.py --config-mode v2 --dry-run true
```

Start Backend & Dashboard:
```bash
# Using the orchestrator script
./nexa_infra/scripts/orchestration/start_forge.sh
```

```
Nexa_compute/
├── nexa_data/          # Data Engineering (MS/MS, Tool Use, Distillation)
├── nexa_train/         # Training Engine (Axolotl, HF Trainer)
├── nexa_distill/       # Knowledge Distillation Pipeline
├── nexa_eval/          # Evaluation & LLM-as-a-Judge
├── nexa_inference/     # vLLM Serving & Tool Controller
├── nexa_infra/         # IaC (Terraform), Monitoring, Provisioning
├── nexa_ui/            # Dashboards (Streamlit/Next.js)
├── src/
│   └── nexa_compute/
│       ├── api/            # FastAPI backend
│       ├── cli/            # CLI Entrypoint
│       ├── core/           # Core Primitives (DAG, Registry, Artifacts)
│       ├── data/           # DataOps (Versioning, Lineage)
│       ├── models/         # ModelOps (Registry, Versioning)
│       ├── monitoring/     # Observability (Alerts, Metrics, Drift)
│       └── orchestration/  # Workflow Engine (Scheduler, Templates)
├── docs/
│   ├── compute_plans/  # Training Configuration Templates (V1/V2/V3)
│   ├── pipelines/      # Detailed Architecture Docs
│   ├── platform/       # Platform Guide & Best Practices
│   ├── api/            # API Reference
│   └── projects/       # Active Research Projects
├── sdk/                # Python Client SDK
└── pyproject.toml      # Dependencies & Config
```
- Unified Training CLI: `nexa_train/train.py` supports flexible overrides and configuration modes (V1 Stability, V2 Performance, V3 Full).
- Infrastructure as Code: Terraform modules for AWS GPU clusters.
- Observability: Distributed tracing (OpenTelemetry), Prometheus metrics, and real-time cost tracking.
- Automated Provisioning: One-command deployment to bare metal or cloud instances with Spot instance support.
- Workflows: Declarative pipeline orchestration (DAGs) with resume capability.
- 6 Job Types: Generate, Audit, Distill, Train, Evaluate, Deploy.
- Worker Orchestration: Pull-based job queue for ephemeral workers.
- Security: SHA256 API keys and metered billing.
- Model Registry: Full lineage tracking from dataset to deployed model.
- Data Versioning: Content-addressable storage for datasets.
- Monitoring: Automated drift detection and A/B testing framework.
For detailed instructions on how the platform works and what each component does, please refer to the documentation:
- Documentation Map: Central index for all documentation.
- Platform Guide: Overview of platform capabilities.
- API Reference: API endpoints and usage.
- Infrastructure Guide: Docker, Provisioning, and Hardware.
- Training Pipeline: Configuration and Execution.
- Data Refinery: MS/MS and Synthetic Data.
- Compute Plans: Run Configurations.
We welcome contributions! Please review our guidelines before submitting pull requests.
See docs/conventions/ for:
- Coding Standards
- Data Organization
- Naming Conventions
- Linting: `ruff check .`
- Testing: `pytest tests/`
- Infrastructure: validate Terraform with `terraform validate`.
Tags: machine-learning, distributed-training, infrastructure-as-code, mlops, knowledge-distillation, fastapi, pytorch, spectral-analysis