hengtaoguo
Collaborator

Description

Start with a short description of what the PR does and how this is a change from
the past.

The rest of the description includes relevant details and context, examples:

  • why is this change being made,
  • the problem being solved and any relevant context,
  • why this is a good solution,
  • some information about the specific implementation,
  • shortcomings of the solution and possible future improvements.

If the change fixes a bug or a GitHub issue, please include a link, e.g.:
FIXES: b/123456
FIXES: #123456

Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned.
This label is used for administrative purposes. Please do not add it manually.

Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

NicoGrande and others added 30 commits September 25, 2025 18:50
PiperOrigin-RevId: 810134577
update

refactor

update

Add optional config

Explicitly shard input tensors across mesh devices

Run on 0.7.2 candidate image

Fix typo in image tag

Revert to use latest tag

update test for new jax version

Remove sharding rules for q_lora and kv_lora from base.yml

update with configs

clean up

update
Removed tunix from requirements files

Install tunix if device=tpu

Mark tunix-based tests as tpu-only

Ignore sft hooks test for gpu tests
This commit introduces a fully-featured, OpenAI-compatible RESTful API server for serving MaxText models. The server is built with FastAPI, supports multi-host inference on TPUs, and is designed for both interactive use and large-scale benchmarking.

Key features and additions:

1.  **Core Server Implementation:**
    - Adds `maxtext_server.py`, a FastAPI application that serves `/v1/completions` and `/v1/chat/completions` endpoints.
    - Implements dynamic request batching to efficiently utilize underlying hardware.
    - Uses `maxtext_generator.py` to encapsulate the MaxText inference engine, handling model loading, tokenization, and the generation loop.
    - Includes Pydantic models in `server_models.py` for robust, OpenAI-compliant request and response validation.

2.  **Deployment and Utilities:**
    - Provides `start_server.sh` to simplify launching the server from the project root.
    - Adds `port_forward_xpk.sh`, a utility script to automatically find and connect to a server running on a GKE cluster via `xpk`, supporting custom namespaces.
    - Isolates server-specific dependencies in `benchmarks/api_server/requirements.txt` (`uvicorn`, `fastapi`, `openai-harmony`).

3.  **Comprehensive Documentation:**
    - A new `README.md` in the `api_server` directory offers a complete guide covering:
      - Installation and environment setup.
      - Launching the server in both single-pod and multi-pod GKE environments.
      - Detailed examples for interacting with the API using `curl` and the `openai` Python client.
      - Step-by-step instructions for running benchmarks with `lm-evaluation-harness` and `evalchemy` for both log-likelihood and generative tasks.
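
The commit message above mentions dynamic request batching but does not show how `maxtext_server.py` implements it. The following is only a minimal sketch of the general technique, using the standard-library `asyncio` module: concurrent requests are queued and grouped until either a size cap or a short wait deadline is reached, and the whole group is then passed to the generator in one call. The names `DynamicBatcher`, `PendingRequest`, `fake_generate`, `max_batch_size`, and `max_wait_s` are hypothetical and are not taken from the MaxText code.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class PendingRequest:
    """One queued prompt plus the future its caller is waiting on."""
    prompt: str
    future: asyncio.Future = field(repr=False)


class DynamicBatcher:
    """Groups requests until the batch is full or a wait deadline passes."""

    def __init__(self, generate_fn, max_batch_size=4, max_wait_s=0.01):
        # generate_fn is a hypothetical stand-in: list[str] -> list[str].
        self._generate_fn = generate_fn
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue = asyncio.Queue()
        self.batch_sizes = []  # recorded only so the demo can inspect it

    async def submit(self, prompt):
        """Enqueue a prompt and wait for its individual result."""
        future = asyncio.get_running_loop().create_future()
        await self._queue.put(PendingRequest(prompt, future))
        return await future

    async def run(self):
        """Worker loop: build a batch, run it, resolve each caller's future."""
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives.
            batch = [await self._queue.get()]
            deadline = loop.time() + self._max_wait_s
            # Keep adding requests until the batch is full or time runs out.
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    item = await asyncio.wait_for(self._queue.get(), remaining)
                except asyncio.TimeoutError:
                    break
                batch.append(item)
            outputs = self._generate_fn([req.prompt for req in batch])
            self.batch_sizes.append(len(batch))
            for req, out in zip(batch, outputs):
                req.future.set_result(out)


def fake_generate(prompts):
    """Placeholder for a real model call; upper-cases each prompt."""
    return [p.upper() for p in prompts]


async def demo():
    batcher = DynamicBatcher(fake_generate, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(p) for p in ["a", "b", "c"]))
    worker.cancel()
    return results, batcher.batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch `max_wait_s` trades latency for throughput: a longer wait gives more requests a chance to share one generation call, while a shorter wait returns single requests sooner. A real server would likely run the model call off the event loop; here it is called inline to keep the example short.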
NicoGrande and others added 7 commits October 3, 2025 02:18
Revert "fix setup.sh for MODE=nightly"

This reverts commit 91dcc69.

Fix bugs in uploading safetensors to GCS

fix setup.sh for MODE=nightly

fix readme

Migrate GemmaDecoderLayer and Gemma2DecoderLayer to NNX

Disable setting specific profiler options on Pathways backend.

PiperOrigin-RevId: 815770690

fix preprocessor

Llama4Vision NNX

clean up

remove unused import
hengtaoguo changed the title from "[exp] Hengtaoguo nnx copy" to "[exp] nnx copy" on Oct 6, 2025