Project to develop an AI system that helps users improve their spoken English.
We will use three kinds of AI models:
- Speech-to-text (STT) models for converting audio to text
- Large language models (LLMs) for generating responses
- Text-to-speech (TTS) models for converting text to audio

Tasks:
- Research and test STT models -> I've found DeepSpeech to be the best option for real-time STT.
- Research and test TTS models
- Implement an agent to generate responses from text using LangChain and the ChatGPT API
- Small Streamlit demo to test the STT model
- Interconnect the agent with the STT and TTS models
- Implement a web interface to interact with the agent -> Streamlit
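
A minimal sketch of how the three stages could be connected, assuming each stage is wrapped as a plain Python function. The names `SpeakingTutor`, `fake_stt`, `fake_agent` and `fake_tts` are hypothetical placeholders, not part of DeepSpeech, LangChain or Streamlit. The real STT, agent and TTS calls would replace the placeholder functions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures: each stage is just a function.
SpeechToText = Callable[[bytes], str]   # audio -> transcript
Agent = Callable[[str], str]            # transcript -> reply text
TextToSpeech = Callable[[str], bytes]   # reply text -> audio


@dataclass
class SpeakingTutor:
    """Chains STT -> agent -> TTS for a single conversational turn."""
    stt: SpeechToText
    agent: Agent
    tts: TextToSpeech

    def respond(self, audio: bytes) -> bytes:
        transcript = self.stt(audio)     # 1. audio to text
        reply = self.agent(transcript)   # 2. text to response text
        return self.tts(reply)           # 3. response text to audio


# Placeholder stages so the sketch runs without any model installed.
# They treat the "audio" as UTF-8 bytes purely for illustration.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


tutor = SpeakingTutor(stt=fake_stt, agent=fake_agent, tts=fake_tts)
reply_audio = tutor.respond(b"hello")
```

Keeping each stage behind a simple function signature means an STT or TTS model can be swapped during the research tasks without touching the agent or the web interface.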