
Python: Handling Rate Limits and Potential Code Interpreter Limitations in Azure Assistant Agent #10287

Open

anu43 opened this issue Jan 24, 2025 · 2 comments
Labels: agents, python (Pull requests for the Python Semantic Kernel)

anu43 commented Jan 24, 2025

We're encountering challenges when attempting to run more complex ML/DL algorithms on the Titanic dataset using an Azure Assistant Agent. It's unclear whether this is due to code interpreter limitations or our implementation.

Current Behavior:

  • Basic analyses and initial ML model training work successfully.
  • We encounter a rate limit error when attempting to improve model accuracy beyond 85%.

Error Message:

semantic_kernel.exceptions.agent_exceptions.AgentInvokeException: Run failed with status: `failed` for agent `data-scientist` and thread `thread_xxxxxxxxxx` with error: Rate limit is exceeded. Try again in 22 seconds.

Relevant Code Snippet:

from semantic_kernel import Kernel
from semantic_kernel.agents.open_ai import AzureAssistantAgent
from semantic_kernel.contents import (
    AuthorRole,
    ChatMessageContent,
    StreamingFileReferenceContent,
)

# DS_SYS_PROMPT, DATA_PATH, download_response_image, and _clean_up_resources
# are defined elsewhere in our script.
agent = await AzureAssistantAgent.create(
    kernel=Kernel(),
    service_id="agent",
    name="data-scientist",
    instructions=DS_SYS_PROMPT,
    enable_code_interpreter=True,
    code_interpreter_filenames=[DATA_PATH],
)

print("Creating thread... ", end="")
thread_id = await agent.create_thread()
print(thread_id)

try:
    is_complete: bool = False
    file_ids: list[str] = []
    while not is_complete:
        user_input = input("\nUser:> ")
        if not user_input:
            continue

        if user_input.lower() == "exit":
            is_complete = True
            break  # avoid sending "exit" to the agent as a message

        await agent.add_chat_message(
            thread_id=thread_id,
            message=ChatMessageContent(role=AuthorRole.USER, content=user_input),
        )
        is_code: bool = False
        async for response in agent.invoke(thread_id=thread_id):
            if is_code != response.metadata.get("code"):
                print()
                is_code = not is_code

            print(f"{response.content}", end="")

            file_ids.extend(
                [
                    item.file_id
                    for item in response.items
                    if isinstance(item, StreamingFileReferenceContent)
                ]
            )

        print()

        await download_response_image(agent, file_ids)
        file_ids.clear()

finally:
    # Clean up agents
    print("Cleaning up resources...")
    if agent is not None:
        await _clean_up_resources(agent=agent, thread_id=thread_id)

Questions:

  1. Is this a limitation of the code interpreter, or could it be related to our implementation?
  2. Are there best practices for optimizing code execution within the Azure Assistant Agent to avoid rate limits?
  3. How can we implement a wait mechanism to respect the rate limit (e.g., waiting 22 seconds before retrying)? A rough sketch of what we have in mind is included after this list.
  4. Are there any built-in retry mechanisms or rate limit handling features in the Azure Assistant Agent that we should be using?
  5. Should more complex ML tasks be broken down into smaller, sequential requests to the agent?
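
For context, a minimal sketch of the wait-and-retry wrapper we are imagining (a hypothetical helper we wrote, assuming the exception message keeps the "Try again in N seconds" wording shown above):

import asyncio
import re

from semantic_kernel.exceptions.agent_exceptions import AgentInvokeException


async def invoke_with_retry(agent, thread_id, max_retries: int = 3):
    """Re-run agent.invoke when a run fails with a rate-limit error,
    sleeping for the number of seconds suggested in the error message."""
    for attempt in range(max_retries + 1):
        try:
            async for response in agent.invoke(thread_id=thread_id):
                yield response
            return
        except AgentInvokeException as exc:
            match = re.search(r"Try again in (\d+) seconds", str(exc))
            if match is None or attempt == max_retries:
                raise
            # A failed run may have already streamed partial output; this
            # naive retry simply starts a fresh run on the same thread.
            await asyncio.sleep(int(match.group(1)) + 1)

Is something along these lines reasonable, or is there a built-in alternative we should be using instead?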

Desired Outcome:
We aim to understand the source of this limitation and find ways to handle rate limits effectively, allowing us to perform more complex ML tasks without errors. Additionally, we seek guidance on best practices for working with the Azure Assistant Agent for computationally intensive tasks.

Any insights, suggestions, or examples of addressing these issues would be greatly appreciated.

markwallace-microsoft added the python (Pull requests for the Python Semantic Kernel) and triage labels on Jan 24, 2025
moonbox3 self-assigned this on Jan 25, 2025
moonbox3 added the agents label and removed the triage label on Jan 25, 2025
moonbox3 (Contributor) commented

Hi @anu43, you can provide overrides for the RunPollingOptions used by the AzureAssistantAgent. The run polling options consist of:

@experimental_class
class RunPollingOptions(KernelBaseModel):
    """Configuration and defaults associated with polling behavior for Assistant API requests."""

    default_polling_interval: timedelta = Field(default=timedelta(milliseconds=250))
    default_polling_backoff: timedelta = Field(default=timedelta(seconds=1))
    default_polling_backoff_threshold: int = Field(default=2)
    default_message_synchronization_delay: timedelta = Field(default=timedelta(milliseconds=250))
    run_polling_interval: timedelta = Field(default=timedelta(milliseconds=250))
    run_polling_backoff: timedelta = Field(default=timedelta(seconds=1))
    run_polling_backoff_threshold: int = Field(default=2)
    message_synchronization_delay: timedelta = Field(default=timedelta(milliseconds=250))
    run_polling_timeout: timedelta = Field(default=timedelta(minutes=1))  # New timeout attribute

See the class definition in semantic_kernel/agents/open_ai/run_polling_options.py.

You could do something like:

from semantic_kernel.agents.open_ai.run_polling_options import RunPollingOptions
from datetime import timedelta

polling_options = RunPollingOptions(run_polling_interval=timedelta(seconds=5)) # or something based on your RPM

# Create the agent configuration
agent = await AzureAssistantAgent.create(
    kernel=kernel,
    service_id=service_id,
    name=AGENT_NAME,
    instructions=AGENT_INSTRUCTIONS,
    ...,
    polling_options=polling_options,
)
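
As a rough rule of thumb (back-of-the-envelope math on my part, not official guidance): a deployment capped at 60 RPM can absorb at most one request per second across polling and completion calls combined, so a polling interval of a second or more leaves headroom for the actual run requests.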

The attributes you'll want to pay attention to are:

run_polling_interval, run_polling_backoff, and run_polling_backoff_threshold

We apply these as follows:

def get_polling_interval(self, iteration_count: int) -> timedelta:
    """Get the polling interval for the given iteration count."""
    return (
        self.run_polling_backoff
        if iteration_count > self.run_polling_backoff_threshold
        else self.run_polling_interval
    )
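
To make the schedule concrete, here is a quick illustration (not library code) of the wait before each status check under the default values above (250 ms interval, 1 s backoff, threshold of 2):

from datetime import timedelta

# Mirrors get_polling_interval with the default RunPollingOptions values.
interval = timedelta(milliseconds=250)
backoff = timedelta(seconds=1)
threshold = 2

for iteration_count in range(1, 6):
    wait = backoff if iteration_count > threshold else interval
    print(f"iteration {iteration_count}: wait {wait.total_seconds()}s")

# iteration 1: wait 0.25s
# iteration 2: wait 0.25s
# iteration 3: wait 1.0s
# iteration 4: wait 1.0s
# iteration 5: wait 1.0s

So raising run_polling_interval and run_polling_backoff directly reduces how many status requests count against your RPM while a run is executing.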

Additionally, in your AI Foundry Portal, you can adjust the RPM/TPM for your model deployment. Could you check whether you can increase your RPM?

moonbox3 (Contributor) commented

I should add: yes, we can do better at handling rate limits for the caller -- a feature we should explore in the future. But hopefully my suggestion above can help mitigate your current 429s.
