Description
🚀 The feature, motivation and pitch
Motivation
extension/asr/runner/ currently provides AsrRunner, which only supports Seq2Seq (encoder-decoder) models like Whisper. The decode loop assumes a standard autoregressive pattern: encoder → text_decoder(input_ids, encoder_output, cache_position) → logits → sample → next_token.
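For readers less familiar with that loop, here is a minimal sketch of greedy autoregressive decoding. It is not the actual AsrRunner code: `DecoderFn`, `greedy_seq2seq_decode`, the start/EOS token parameters, and the use of argmax as the "sample" step are all assumptions made for illustration, with the exported `text_decoder` method replaced by a plain callable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the exported text_decoder method:
// (previous token, encoder output, cache position) -> logits over the vocab.
using DecoderFn = std::function<std::vector<float>(
    int64_t token,
    const std::vector<float>& encoder_output,
    int64_t cache_position)>;

// Greedy autoregressive decode: each step feeds the previously chosen token
// back into the decoder until EOS is produced or the token budget runs out.
std::vector<int64_t> greedy_seq2seq_decode(
    const DecoderFn& text_decoder,
    const std::vector<float>& encoder_output,
    int64_t start_token,
    int64_t eos_token,
    int64_t max_new_tokens) {
  assert(max_new_tokens >= 0);
  std::vector<int64_t> tokens;
  int64_t current = start_token;
  for (int64_t pos = 0; pos < max_new_tokens; ++pos) {
    const std::vector<float> logits =
        text_decoder(current, encoder_output, pos);
    assert(!logits.empty());
    // "Sampling" is plain argmax in this sketch.
    const auto best = std::max_element(logits.begin(), logits.end());
    current = static_cast<int64_t>(best - logits.begin());
    if (current == eos_token) {
      break;
    }
    tokens.push_back(current);
  }
  return tokens;
}
```

The point to notice is that the loop is driven by output positions: one decoder call yields exactly one token, and the encoder output is consumed as a whole on every step.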
Transducer-based ASR models (RNN-T, TDT, HAT) use a fundamentally different decode paradigm (frame-by-frame scanning with a joint network) and cannot reuse AsrRunner. As a result, the Parakeet TDT runner (examples/models/parakeet/main.cpp) implements the entire decode algorithm inline (~200 lines of greedy decode + LSTM state management), making it hard to reuse for other transducer models.
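To make the contrast concrete, below is a rough sketch of greedy transducer decoding. It is not the code from Parakeet's main.cpp: `JointFn`, `JointResult`, `EmittedToken`, and `greedy_transducer_decode` are hypothetical names, the joint network and predictor are collapsed into one callable, and the predictor's LSTM state is represented only by the last emitted token.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// A decoded token together with the encoder frame it was emitted on.
struct EmittedToken {
  int64_t id;
  int64_t frame;
};

// Hypothetical result of one joint-network evaluation: the argmax token and
// how many frames to skip afterwards (0 means "stay on this frame").
struct JointResult {
  int64_t token;
  int64_t duration;
};

// Hypothetical stand-in for decoder_step + joint. The last emitted token
// stands in for the predictor (LSTM) state of a real model.
using JointFn = std::function<JointResult(int64_t frame, int64_t last_token)>;

std::vector<EmittedToken> greedy_transducer_decode(
    const JointFn& joint,
    int64_t num_frames,
    int64_t blank_id,
    int64_t max_symbols_per_step) {
  assert(max_symbols_per_step > 0);
  std::vector<EmittedToken> out;
  int64_t t = 0;
  int64_t last_token = blank_id;  // assumption: predictor starts from blank
  int64_t symbols_on_frame = 0;
  while (t < num_frames) {
    const JointResult r = joint(t, last_token);
    assert(r.duration >= 0);
    if (r.token == blank_id) {
      // Blank never updates the predictor and always moves forward in time.
      t += (r.duration > 0 ? r.duration : 1);
      symbols_on_frame = 0;
      continue;
    }
    out.push_back({r.token, t});
    last_token = r.token;  // predictor state changes only on non-blank
    if (r.duration > 0) {
      t += r.duration;
      symbols_on_frame = 0;
    } else if (++symbols_on_frame >= max_symbols_per_step) {
      // Guard against emitting forever on a single frame.
      ++t;
      symbols_on_frame = 0;
    }
  }
  return out;
}
```

Here the loop is driven by encoder frames rather than output positions: one frame may yield zero, one, or several tokens, which is why the Seq2Seq loop cannot be reused as is.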
Proposal
Restructure extension/asr/runner/ to support both architectures:
- Rename AsrRunner → Seq2SeqRunner to clarify that it's Seq2Seq-specific
- Add TransducerRunner for RNN-T/TDT models, extracting the core decode logic from Parakeet's main.cpp
- Keep both in the same flat directory (no subdirectories)
Proposed file layout
extension/asr/runner/
├── CMakeLists.txt
├── seq2seq_runner.h        # renamed from runner.h
├── seq2seq_runner.cpp      # renamed from runner.cpp
├── transducer_runner.h     # new
└── transducer_runner.cpp   # new
TransducerRunner sketch
namespace executorch::extension::asr {

struct TransducerConfig {
  int64_t blank_id = 0;
  int64_t num_rnn_layers = 2;
  int64_t pred_hidden = 640;
  int64_t max_symbols_per_step = 10;
  // TDT duration values; empty = standard RNN-T (duration always 1)
  std::vector<int> durations = {};
};

class TransducerRunner {
 public:
  TransducerRunner(
      const std::string& module_path,
      const std::string& tokenizer_path,
      TransducerConfig config);

  Error load();

  // Returns decoded token IDs with frame offsets
  Result<std::vector<Token>> transcribe(
      TensorPtr preprocessed_features,
      std::function<void(const std::string&)> token_callback = {});
};

} // namespace executorch::extension::asr

Expected module methods: encoder, decoder_step, joint (+ optional preprocessor).
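As an illustration of how the `durations` field could be interpreted, the helper below maps a predicted duration index to a frame advance. `frames_to_advance` is a hypothetical name that is not part of the proposed API, and the table `{0, 1, 2, 3, 4}` mentioned afterwards is only an example value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Maps the duration index predicted by the joint network to a frame advance.
// An empty table is treated as standard RNN-T, where the advance is always 1,
// following the comment on TransducerConfig::durations.
int64_t frames_to_advance(
    const std::vector<int>& durations,
    std::size_t duration_index) {
  if (durations.empty()) {
    return 1;
  }
  assert(duration_index < durations.size());
  return durations[duration_index];
}
```

With an example TDT table such as `{0, 1, 2, 3, 4}`, index 0 keeps the decoder on the current frame, which is where `max_symbols_per_step` becomes relevant as a bound on repeated emissions.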
What stays in examples/models/parakeet/
Model-specific post-processing (timestamp computation at token/word/segment level) remains in the example β it's not general enough for a shared runner.
Migration
- Whisper main.cpp: AsrRunner → Seq2SeqRunner (one-line rename)
- Parakeet main.cpp: replace inline decode with TransducerRunner::transcribe()
- Downstream consumers of AsrRunner: update include path and class name
Alternatives
No response
Additional context
No response
RFC (Optional)
No response
cc @larryliu0820 @mergennachin @cccclai @helunwencser @jackzhxng