feat(backend): add SpiritLM backend for text, TTS, and ASR#8589
MkDev11 wants to merge 11 commits into mudler:master
Conversation
Implements a LocalAI backend for Meta Spirit LM (interleaved text and speech).
- backend/python/spiritlm: gRPC servicer with LoadModel, Predict, PredictStream, TTS, TTSStream, AudioTranscription, Health
- Supports spirit-lm-base-7b and spirit-lm-expressive-7b
- Options: sample_rate (default 16000)
- backend/index.yaml: add spiritlm meta and capabilities

Ref: mudler#3966
Signed-off-by: mkdev11 <MkDev11@users.noreply.github.com>
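The `sample_rate` option mentioned above could be parsed along these lines. This is a hedged sketch only: it assumes LocalAI hands the backend a list of `"key:value"` strings (the exact request field name is an assumption; see `backend.proto` in the repo for the real shape).

```python
def parse_options(options, defaults=None):
    """Parse backend options given as "key:value" strings.

    Sketch only: assumes options arrive as e.g. ["sample_rate:16000"];
    adjust to the actual proto field used by the gRPC LoadModel call.
    """
    opts = dict(defaults or {})
    for raw in options:
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # skip malformed entries with no ":" separator
        try:
            opts[key] = int(value)  # coerce numeric options like sample_rate
        except ValueError:
            opts[key] = value  # keep non-numeric values as strings
    return opts

opts = parse_options(["sample_rate:22050"], defaults={"sample_rate": 16000})
print(opts["sample_rate"])  # 22050
```

With no matching option supplied, the default of 16000 survives, which matches the commit's stated behaviour.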
@mudler could you please review the PR and let me know your feedback?
Can you add an entry in gallery/index.yaml similar to Qwen-ASR? |
Add spirit-lm-base-7b and spirit-lm-expressive-7b to the model gallery, following the same pattern as Qwen-ASR (per PR mudler#8589 review).

Signed-off-by: mkdev11 <MkDev11@users.noreply.github.com>
@richiejp Thanks for your feedback. I added SpiritLM to gallery/index.yaml.
Thanks, the bottleneck on our end is testing. If you provide e2e tests then we can verify these and get it merged faster.
@richiejp I added e2e tests, please review the update again and let me know your feedback.
you still need to add e2e tests |
- Add "SpiritLM backend e2e" context in core/http/app_test.go (label: spiritlm) with specs: chat completion, TTS, transcription; skip when the backend/model is not ready
- Add make test-spiritlm target; pass SPIRITLM_CHECKPOINTS_DIR when set
- Add backend/python/spiritlm/E2E.md with run instructions and full-pass steps
- Fix protogen.sh to use the repo backend proto path; add backend/python/backend.proto symlink for runProtogen; make run.sh executable

Ref: mudler#8589, mudler#3966
Signed-off-by: mkdev11 <MkDev11@users.noreply.github.com>
sorry, just pushed e2e tests
- Add tests/e2e/spiritlm_e2e_test.go with chat, TTS, and transcription specs
- Register the spiritlm mock backend and spirit-lm-base-7b model in e2e_suite_test.go
- Add make test-e2e-spiritlm target; fix protogen-go PATH for protoc plugins
- Update backend/python/spiritlm/E2E.md with tests/e2e coverage and run instructions

Signed-off-by: mkdev11 <MkDev11@users.noreply.github.com>
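The "skip when backend/model not ready" behaviour described in the commit above amounts to a readiness poll before the specs run. The actual suite is written in Go/Ginkgo; this is only a language-agnostic sketch of the pattern in Python, with a stubbed probe standing in for a real health check.

```python
import time

def wait_until_ready(probe, timeout=60.0, interval=2.0):
    """Poll `probe` (a zero-arg callable returning True when the backend
    is up) until it succeeds or `timeout` seconds elapse.

    Illustrative only: mirrors the e2e suite's "skip if never ready"
    pattern, where a False result leads to skipping the spec rather
    than failing it.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if probe():
                return True
        except Exception:
            pass  # treat probe errors as "not ready yet"
        time.sleep(interval)
    return False

# Demo: a probe that becomes ready on its third call.
calls = {"n": 0}
def fake_probe():
    calls["n"] += 1
    return calls["n"] >= 3

ready = wait_until_ready(fake_probe, timeout=5.0, interval=0.01)
```

In the real tests the probe would hit the backend's Health RPC (or the API's readiness endpoint) instead of a stub.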
Hi @richiejp, can you please let me know what I need to update or change?
@mudler could you please review the PR and let me know your feedback? Thanks.
Hello! Thank you for your contribution. Before we can proceed with this PR, we need to have a discussion about the proposed changes. LocalAI doesn't support cloud-based backends, and there was no issue opened for this feature prior to the PR. We encourage contributors to first open an issue to discuss their proposed changes with the maintainers. Could you please open an issue describing your use case and proposed implementation? Once we've had a chance to discuss and align on the direction, we can then continue reviewing this PR. Thank you for your understanding!
@localai-bot This PR fixes the issue - #3966 |
@mudler Thanks for your reply on the issue, could you let me know the next steps? To be honest, I don't want to close this PR. If this is no longer needed, can I implement another issue on the same PR after removing the implementation?
@mudler please review the PR and let me know your feedback. thanks. |
Description
Fixes #3966
Adds a new LocalAI backend for Meta Spirit LM: an interleaved text and speech model that supports text generation, text-to-speech (TTS), and automatic speech recognition (ASR) in a single 7B model.
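For context, the TTS path described above (the model's float32 waveform encoded as 16 kHz WAV) can be sketched with the standard library alone. This is a minimal illustration of the conversion step, not the backend's actual code; the function name and shape are assumptions.

```python
import io
import struct
import wave

def float32_to_wav(samples, sample_rate=16000):
    """Encode float32 samples in [-1.0, 1.0] as a 16-bit mono WAV blob.

    Sketch of the described float32 -> 16 kHz WAV conversion; the
    backend's real helper may differ in name and details.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit PCM
        wav.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples  # clip, then scale to int16 range
        )
        wav.writeframes(frames)
    return buf.getvalue()

wav_bytes = float32_to_wav([0.0, 0.5, -0.5])
```

The clipping step matters because model output can occasionally exceed [-1.0, 1.0], which would otherwise overflow the int16 packing.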
Changes:
- LoadModel: loads spirit-lm-base-7b or spirit-lm-expressive-7b
- Predict/PredictStream: text generation via OutputModality.TEXT
- TTS/TTSStream: text → speech via OutputModality.SPEECH (float32 → 16 kHz WAV)
- AudioTranscription: speech → text via OutputModality.TEXT from an audio path (request.dst)
- Health, options parsing (sample_rate, etc.)
- backend/index.yaml: add spiritlm meta with description, tags (text-to-text, TTS, ASR, LLM, multimodal), and capabilities (cpu-spiritlm, cuda12-spiritlm)

Notes for Reviewers
- requirements-install.txt installs from git+https://github.com/facebookresearch/spiritlm.git. Checkpoints must be set up separately per the SpiritLM repo.
- backend.proto is copied into the backend dir per the existing Dockerfile.python pattern.
- License: fair-noncommercial (Meta FAIR Noncommercial Research License).

Signed commits