add modality-aware Instructions with audio/text variants #4987
Merged
7 commits
- 377f9bd: add Instructions class with audio/text modality-aware variants (longcw)
- 656dcfb: rename (longcw)
- 31ff50b: add example (longcw)
- 503b739: fix shadow copy issue (longcw)
- a1f63ac: skip apply_instructions_modality if no Instructions (longcw)
- c7345d2: use multiline strings (longcw)
- c242bbd: rename to as_modality (longcw)
@@ -0,0 +1,105 @@
+import logging
+from datetime import datetime
+
+from dotenv import load_dotenv
+
+from livekit.agents import (
+    Agent,
+    AgentServer,
+    AgentSession,
+    JobContext,
+    JobProcess,
+    cli,
+    function_tool,
+    inference,
+)
+from livekit.agents.llm import Instructions
+from livekit.plugins import silero
+
+logger = logging.getLogger("instructions-per-modality")
+
+load_dotenv()
+
+BASE_INSTRUCTIONS = """\
+You are a scheduling assistant named Alex that helps users book appointments.
+{modality_specific}
+Call `book_appointment` to finalise the booking.
+Never invent or assume details the user did not provide — ask for them instead.
+The current date is {current_date}.
+"""
+
+# Voice users speak in approximate, self-correcting natural language.
+# The LLM needs guidance on how to parse what was said, not how to say things back.
+AUDIO_SPECIFIC = """
+The user is speaking — their input arrives as voice transcription and may be imperfect.
+When interpreting what the user said:
+- Resolve relative spoken expressions to a concrete date/time: 'next Tuesday', 'tomorrow afternoon', 'the week after next around 3'.
+- Spoken numbers may be ambiguous: 'three thirty' could mean 3:30 PM or the 30th of March — ask for clarification when context does not make it obvious.
+- Honor verbal self-corrections: if the user says 'wait, I meant Thursday not Tuesday', update your understanding to Thursday and discard Tuesday.
+- Ignore filler words and hesitations ('um', 'uh', 'like', 'I guess').
+- Always confirm the resolved date and time out loud before booking, since spoken input is inherently ambiguous.
+"""
+
+# Text users type precise values — no need to normalise spoken patterns.
+TEXT_SPECIFIC = """
+The user is typing — take their input literally.
+When interpreting what the user wrote:
+- Accept exact dates and times in any common format (ISO, natural language, 12-hour or 24-hour clock).
+- If the user provides a complete and unambiguous date and time, you may book immediately without asking for confirmation.
+- Only ask follow-up questions for genuinely missing information.
+"""
+
+
+class SchedulingAgent(Agent):
+    def __init__(self) -> None:
+        current_date = datetime.now().strftime("%Y-%m-%d %A")
+        super().__init__(
+            instructions=Instructions(
+                audio=BASE_INSTRUCTIONS.format(
+                    modality_specific=AUDIO_SPECIFIC, current_date=current_date
+                ),
+                text=BASE_INSTRUCTIONS.format(
+                    modality_specific=TEXT_SPECIFIC, current_date=current_date
+                ),
+            )
+        )
+
+    async def on_enter(self) -> None:
+        self.session.generate_reply()
+
+    @function_tool
+    async def book_appointment(self, date: str, time: str) -> None:
+        """Book an appointment.
+
+        Args:
+            date: The date of the appointment in the format YYYY-MM-DD
+            time: The time of the appointment in the format HH:MM
+        """
+        logger.info(f"booking appointment for {date} at {time}")
+        return f"Appointment booked for {date} at {time}"
+
+
+server = AgentServer()
+
+
+def prewarm(proc: JobProcess) -> None:
+    proc.userdata["vad"] = silero.VAD.load()
+
+
+server.setup_fnc = prewarm
+
+
+@server.rtc_session()
+async def entrypoint(ctx: JobContext) -> None:
+    session = AgentSession(
+        stt=inference.STT("deepgram/nova-3"),
+        llm=inference.LLM("openai/gpt-4.1-mini"),
+        tts=inference.TTS("cartesia/sonic-3"),
+        vad=ctx.proc.userdata["vad"],
+    )
+
+    await session.start(agent=SchedulingAgent(), room=ctx.room)
+
+
+if __name__ == "__main__":
+    cli.run_app(server)
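For readers without the framework installed, the templating pattern in the example can be sketched with the standard library alone. `PromptVariants`, `pick`, and `build_variants` are hypothetical stand-ins introduced for illustration; they are not the `Instructions` class from this PR, and how the framework actually selects a variant is not shown in this diff. The prompt strings are shortened.

```python
from dataclasses import dataclass
from datetime import datetime

BASE = """\
You are a scheduling assistant named Alex that helps users book appointments.
{modality_specific}
The current date is {current_date}.
"""

# Shortened stand-ins for AUDIO_SPECIFIC and TEXT_SPECIFIC in the example file.
AUDIO = "The user is speaking; confirm the resolved date and time before booking."
TEXT = "The user is typing; take their input literally."


@dataclass(frozen=True)
class PromptVariants:
    """Hypothetical stand-in for illustration, NOT the livekit Instructions class."""

    audio: str
    text: str

    def pick(self, modality: str) -> str:
        # Hypothetical selector: return the prompt that matches the input modality.
        if modality not in ("audio", "text"):
            raise ValueError(f"unknown modality: {modality!r}")
        return self.audio if modality == "audio" else self.text


def build_variants(now: datetime) -> PromptVariants:
    # Same shared template, rendered once per modality, as in SchedulingAgent.__init__.
    current_date = now.strftime("%Y-%m-%d %A")
    return PromptVariants(
        audio=BASE.format(modality_specific=AUDIO, current_date=current_date),
        text=BASE.format(modality_specific=TEXT, current_date=current_date),
    )


variants = build_variants(datetime(2025, 1, 14))
audio_prompt = variants.pick("audio")
text_prompt = variants.pick("text")
```

Both prompts share the persona and the date line and differ only in the modality block, which is the point of rendering one base template twice.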
🔴 `book_appointment` return type annotation is `None` but function returns a string

The `book_appointment` method at line 71 declares `-> None` but actually returns `f"Appointment booked for {date} at {time}"` at line 79. Because the `function_tool` decorator inspects the return type annotation to decide how to handle tool output, a `-> None` annotation may cause the framework to discard the return value, meaning the LLM never receives the booking confirmation string. This would make the agent unable to confirm to the user that the booking succeeded.
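One possible fix, sketched as a standalone function so it runs without the framework: annotate the return type as `str` so the declared type matches the returned value. The `@function_tool` decorator and the `self` parameter are omitted here as a simplification, and the date and time arguments are illustrative. Whether the decorator actually depends on the annotation is the reviewer's claim and is not verified by this sketch.

```python
import asyncio
import inspect
import logging

logger = logging.getLogger("instructions-per-modality")


async def book_appointment(date: str, time: str) -> str:
    """Book an appointment.

    Args:
        date: The date of the appointment in the format YYYY-MM-DD
        time: The time of the appointment in the format HH:MM
    """
    logger.info(f"booking appointment for {date} at {time}")
    return f"Appointment booked for {date} at {time}"


# Any code that reads the signature now sees `str`, matching the value returned.
return_annotation = inspect.signature(book_appointment).return_annotation
result = asyncio.run(book_appointment("2025-01-14", "15:30"))
```

In the PR's example the same one-word change would apply to the method on `SchedulingAgent`, leaving the body untouched.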