
Commit 5c1722e

Add "Coming soon" marker to Step MPS paper
1 parent 6980792 commit 5c1722e


4 files changed: 5 additions, 5 deletions


content/authors/admin/_index.md

Lines changed: 1 addition & 1 deletion
@@ -146,4 +146,4 @@ languages:
 awards: []
 ---
 
-Fei Tian is an Audio LLM Researcher at StepFun, specializing in speech AI. He has been a key contributor to pioneering projects like Step-Audio, Step-Audio 2, Step-MPS, and Step-Audio R1, focusing on speech understanding, interactive systems, and reinforcement learning. Previously, he developed end-to-end conversational models at ByteDance's SEED speech team. Fei is an avid cyclist, swimmer, and rock climber, driven by the desire to contribute his strength to the journey toward Artificial General Intelligence.
+Fei Tian is an Audio LLM Researcher at StepFun, pioneering the next generation of speech AI. He was instrumental in developing groundbreaking projects including Step-Audio, Step-Audio 2, Step-Audio R1, and Step-MPS. His work introduced China's leading speech reasoning model (benchmarking Gemini 2.5 Pro), the revolutionary "thinking-while-speaking" framework, and the integration of Chain-of-Thought (CoT) reasoning into the world's first industrial-grade audio LLM. Previously at ByteDance, he spearheaded the architectural evolution of speech models for core products like TikTok and CapCut. Fei is passionately committed to contributing his expertise to the journey toward Artificial General Intelligence.

content/publications/step-audio-r1/index.md

Lines changed: 1 addition & 1 deletion
@@ -21,4 +21,4 @@ pager: false
 
 ## Abstract
 
-Step-Audio R1 represents the Deepseek R1 moment for speech large models, creating China's first leading speech reasoning model with perception and reasoning capabilities that fully match Gemini 2.5 Pro. By integrating our proprietary Step MPS framework, we have achieved a world-first innovation: endowing the model with sophisticated reasoning capabilities and highly human-like interactive intelligence without adding any additional latency, truly realizing zero time gap between thinking and responding.
+[Coming soon!]Step-Audio R1 represents the Deepseek R1 moment for speech large models, creating China's first leading speech reasoning model with perception and reasoning capabilities that fully match Gemini 2.5 Pro. By integrating our proprietary Step MPS framework, we have achieved a world-first innovation: endowing the model with sophisticated reasoning capabilities and highly human-like interactive intelligence without adding any additional latency, truly realizing zero time gap between thinking and responding.

content/publications/step-editx/index.md

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ title: 'Step EditX: Next-Generation Conversational Speech Editing Model'
 authors:
 - admin
 
-date: '2025-10-01T00:00:00Z'
+date: '2025-07-15T00:00:00Z'
 
 featured: false
 
@@ -21,4 +21,4 @@ pager: false
 
 ## Abstract
 
-Step EditX is a groundbreaking next-generation speech editing model that completely transforms traditional tool-based audio post-processing into natural language instruction-based "conversational creation." Users can perform comprehensive intelligent editing of audio—from content to style, from emotion to coloring—simply through text prompts. Step EditX not only possesses powerful zero-shot TTS capabilities, over 14 types of emotion enhancement, and more than 30 style transfer options, but also features precise "one-click audio enhancement" functionality that intelligently repairs various audio imperfections and extracts target voices. Its most significant breakthrough lies in the model's deep understanding of text-level addition, deletion, and modification instructions, enabling context-aware speech regeneration that corrects content while perfectly preserving the speaker's timbre and prosody. This marks the first entry of speech editing into the era of true "semantic-level" intelligent operations.
+[Coming soon!]Step EditX is a groundbreaking next-generation speech editing model that completely transforms traditional tool-based audio post-processing into natural language instruction-based "conversational creation." Users can perform comprehensive intelligent editing of audio—from content to style, from emotion to coloring—simply through text prompts. Step EditX not only possesses powerful zero-shot TTS capabilities, over 14 types of emotion enhancement, and more than 30 style transfer options, but also features precise "one-click audio enhancement" functionality that intelligently repairs various audio imperfections and extracts target voices. Its most significant breakthrough lies in the model's deep understanding of text-level addition, deletion, and modification instructions, enabling context-aware speech regeneration that corrects content while perfectly preserving the speaker's timbre and prosody. This marks the first entry of speech editing into the era of true "semantic-level" intelligent operations.
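
To make the "conversational creation" idea in the abstract above concrete: instead of chaining DSP tools, each edit is a single natural-language instruction attached to the audio. The Step EditX API is not shown in this commit, so the sketch below is purely hypothetical; `EditRequest`, its fields, and `describe` are invented names for illustration.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical sketch only: models what an instruction-based ("conversational")
# editing request might look like, as opposed to a tool-based effect chain.
# All names are invented; this is not the Step EditX API.

@dataclass
class EditRequest:
    audio_path: str    # source recording to edit
    instruction: str   # natural-language edit, e.g. "make it sound happier"
    kind: Literal["content", "emotion", "style", "enhance"]

def describe(req: EditRequest) -> str:
    """Render the request as the single prompt-like payload a conversational
    editor would consume, instead of a sequence of DSP tool calls."""
    return f"[{req.kind}] {req.instruction} -> {req.audio_path}"

if __name__ == "__main__":
    # A text-level correction: per the abstract, the model would regenerate
    # only the changed words while preserving the speaker's timbre and prosody.
    fix = EditRequest("demo.wav", 'replace "Tuesday" with "Thursday"', "content")
    restyle = EditRequest("demo.wav", "read it like a news anchor", "style")
    for req in (fix, restyle):
        print(describe(req))
```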

content/publications/step-mps/index.md

Lines changed: 1 addition & 1 deletion
@@ -21,4 +21,4 @@ pager: false
 
 ## Abstract
 
-Step MPS (Mind-Paced Speaking) is a revolutionary brain-inspired proprietary framework designed to endow speech large models with truly human-like abilities to "think while speaking." Its core innovation lies in the "dual-brain" architecture: a "planning brain" responsible for high-level logical reasoning that guides a separate "expression brain" in real-time for fluent speech generation. This collaborative division of labor represents the world's first solution to the fundamental contradiction between complex "chain-of-thought" reasoning and real-time interaction. It achieves zero latency increase while maintaining virtually full reasoning accuracy, thereby granting the model genuine advanced logical intelligence and empathetic interactive capabilities, ultimately achieving seamless synchronization between thinking and expression.
+[Coming soon!]Step MPS (Mind-Paced Speaking) is a revolutionary brain-inspired proprietary framework designed to endow speech large models with truly human-like abilities to "think while speaking." Its core innovation lies in the "dual-brain" architecture: a "planning brain" responsible for high-level logical reasoning that guides a separate "expression brain" in real-time for fluent speech generation. This collaborative division of labor represents the world's first solution to the fundamental contradiction between complex "chain-of-thought" reasoning and real-time interaction. It achieves zero latency increase while maintaining virtually full reasoning accuracy, thereby granting the model genuine advanced logical intelligence and empathetic interactive capabilities, ultimately achieving seamless synchronization between thinking and expression.
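
The Mind-Paced Speaking abstract above describes a pipelined producer/consumer shape: a "planning brain" streams reasoning ahead while an "expression brain" speaks from whatever plan has already arrived. The Step MPS implementation is not public in this commit, so the following is only a rough Python sketch of that shape; `planning_brain`, `expression_brain`, and all timings are invented for the example.

```python
import queue
import threading
import time

# Hypothetical sketch of "thinking while speaking": the planner streams plan
# chunks into a bounded queue while the speaker consumes them concurrently,
# so speech starts before the full chain of thought is finished.

def planning_brain(prompt: str, plan_queue: "queue.Queue[str | None]") -> None:
    """Produce reasoning steps incrementally instead of all at once."""
    steps = [
        f"outline an answer to: {prompt}",
        "pick the key fact to state first",
        "decide on a closing remark",
    ]
    for step in steps:
        time.sleep(0.05)  # stand-in for slow chain-of-thought decoding
        plan_queue.put(step)
    plan_queue.put(None)  # sentinel: planning is done

def expression_brain(plan_queue: "queue.Queue[str | None]") -> None:
    """Start speaking as soon as the first plan chunk arrives."""
    while True:
        step = plan_queue.get()
        if step is None:
            break
        print(f"[speaking] ...guided by plan: {step}")

if __name__ == "__main__":
    q: "queue.Queue[str | None]" = queue.Queue(maxsize=2)
    planner = threading.Thread(target=planning_brain, args=("why is the sky blue?", q))
    speaker = threading.Thread(target=expression_brain, args=(q,))
    planner.start()
    speaker.start()
    planner.join()
    speaker.join()
```

Because the speaker starts as soon as the first plan chunk lands in the queue, time-to-first-word is governed by a single planning step rather than the full chain of thought, which is the property the abstracts call "zero time gap" between thinking and responding.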
