Python: .Net: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures #9902

malandis · 2024-12-07T16:49:41Z

Describe the bug
When using the Python dynamic sessions plugin as a tool for LLMs, the execution response from the API includes an explicit "status" field, but this is not included by the plugin when building the string response to the LLM.

Specifically, even when the HTTP POST request returns an OK status code, the response body of the code execution contains an explicit status property that can be "Failure" (vs. "Success").

Because this is missing from the plugin result, the LLM must infer success or failure based on the presence of stderr content. This can lead the LLM to misinterpret certain exceptional scenarios as ongoing or partial success instead of failure. As a result, the LLM may hallucinate follow-up steps, enter unnecessary retry loops, or otherwise produce incorrect responses.
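To illustrate the ambiguity, here is a minimal Python sketch. It is not the plugin's actual code: the function name `format_without_status` and the dictionary keys `status`, `result`, `stdout`, and `stderr` are assumptions chosen to mirror the output shown in this issue. It shows that a successful run that only logs a warning to stderr and a failed run with the same stderr render to the same text.

```python
import json


def format_without_status(properties: dict) -> str:
    """Hypothetical stand-in for the pre-fix template: no status line."""
    return (
        f"Result:\n{json.dumps(properties.get('result'))}\n"
        f"Stdout:\n{json.dumps(properties.get('stdout'))}\n\n"
        f"Stderr:\n{json.dumps(properties.get('stderr'))}"
    )


warning = "Matplotlib is building the font cache; this may take a moment"

# Assumed response bodies: same stderr, different status.
succeeded = {"status": "Success", "result": None, "stdout": "", "stderr": warning}
failed = {"status": "Failure", "result": None, "stdout": "", "stderr": warning}

# Both render identically, so the model has nothing to tell them apart.
print(format_without_status(succeeded) == format_without_status(failed))  # True
```

With the status dropped, the only signal left is the stderr text, and a benign warning looks the same as a fatal error.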

To Reproduce
This is a bit involved, since we must set up a tool-calling agent and specifically trigger a code execution failure. That said, this scenario reliably reproduces it for me.

Set up a Python dynamic session and upload a CSV file to the session’s container. For example, create a sample-data.csv file with random data:

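import numpy as np
import pandas as pd

N = 100  # number of sample rows
data = {
    # randint's upper bound is exclusive, so 202413 allows period 202412
    "fiscal_period": np.random.randint(202401, 202413, N),
    "shipping_zipcode": np.random.randint(10000, 100000, N),
    "gross_profit": np.random.uniform(200, 10000, N),
}

# Write the sample data without the index column
df = pd.DataFrame(data)
df.to_csv("sample-data.csv", index=False)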

Prompt the LLM through the tool to execute Python code that will intentionally fail, for example by including a deprecated library or code snippet that won’t run:

Create a heat map of gross profit on a map of the United States. Use the file at /mnt/data/sample-data.csv.

Include this snippet of code to load the map details at the beginning:
import matplotlib.pyplot as plt
import geopandas as gpd

# Load US states shapefile
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))

Observe the tool’s execution result. The response includes:

Result:
null
Stdout:
""

Stderr:
"Matplotlib is building the font cache; this may take a moment"

Note that the LLM misinterprets this scenario as ongoing success, repeatedly attempting to continue execution or retry.

Expected behavior
When the interpreter fails, the returned response should include the status field from the response body. With this explicit status, the LLM recognizes the error immediately and will not enter a loop of retries or produce hallucinated follow-ups. Instead, it can gracefully handle the failure scenario.
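As a rough sketch of the expected behavior (in Python, under assumptions): the function name `format_with_status` and the keys `status`, `result`, `stdout`, and `stderr` are hypothetical and only mirror the fields discussed in this issue. The merged fixes may format the string differently.

```python
import json


def format_with_status(properties: dict) -> str:
    """Hypothetical template that leads with the explicit status field."""
    return (
        f"Status:\n{json.dumps(properties.get('status'))}\n"
        f"Result:\n{json.dumps(properties.get('result'))}\n"
        f"Stdout:\n{json.dumps(properties.get('stdout'))}\n"
        f"Stderr:\n{json.dumps(properties.get('stderr'))}"
    )


# Assumed response body for the failing run described above.
failed_run = {
    "status": "Failure",
    "result": None,
    "stdout": "",
    "stderr": "Matplotlib is building the font cache; this may take a moment",
}

text = format_with_status(failed_run)
print(text.splitlines()[:2])  # ['Status:', '"Failure"']
```

Putting the status first means the model reads the outcome before it reads stderr, so it does not have to guess from a warning message.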

Screenshots
N/A

Platform

OS: Any
IDE: Any
Language: Any
Source: latest as of filing (1.17.0)

Additional context

I will submit a PR to the dotnet implementation that adds the status property to the returned string. If accepted, I am also willing to submit similar PRs for other languages and integrations.


malandis · 2024-12-07T16:58:58Z

@markwallace-microsoft / others: this isn't a bug in the Python codebase specifically, but rather a bug in the Python dynamic sessions plugin, which is present in various languages and integrations (e.g. LangChain). I have referenced the .NET implementation in my description.

In all the dynamic sessions plugin implementations, the template to interpolate the code execution API response back to the LLM is identical. Hence they're all affected.

moonbox3 · 2024-12-08T23:39:48Z

Thanks for filing the issue and for working on the fix, @malandis. We'll have the .Net team review your PR soon!

malandis · 2024-12-08T23:42:53Z

@moonbox3 sounds good. Once we align on the problem/fix, I can port to other languages/integrations 👍

moonbox3 · 2024-12-09T00:34:57Z

> @moonbox3 sounds good. Once we align on the problem/fix, I can port to other languages/integrations 👍

Looks like you've provided the fix. When I first created the Python plugin, I don't remember there being a "status" property in the result -- I wonder if it was added recently. I've made the similar fix in #9904. Thanks for your help!

### Motivation and Context

The Python sessions plugin (ACA) includes a `status` now in their result once code is executed. This looks to have been added recently as we weren't including it in the original return string. This PR adds that.

### Description

Add the `status` key value to the return string.

- Fixes Python bug for #9902

### Contribution Checklist

- [X] The code builds clean without any errors or warnings
- [X] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [X] All unit tests pass, and I have added new tests where possible
- [X] I didn't break anyone 😄

### Motivation and Context

Addresses issue #9902 for .NET. Include the "status" field in the response string returned by Python dynamic sessions. Fixes a usability issue where the LLM assumes success despite execution failures.

### Description

As per #9902, by including the `status` property from the response in the plugin result, we ensure that the LLM has explicit information about whether the execution succeeded or failed, preventing misinterpretation of stderr or other response elements. This helps avoid hallucinated follow-ups or unnecessary retries by providing clear success/failure indicators.

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

---------

Co-authored-by: Evan Mattson <[email protected]>

moonbox3 · 2025-01-24T02:40:40Z

Closing this issue as both .Net and Python PRs are in.

malandis added the bug Something isn't working label Dec 7, 2024

markwallace-microsoft added .NET Issue or Pull requests regarding .NET code python Pull requests for the Python Semantic Kernel triage labels Dec 7, 2024

malandis mentioned this issue Dec 7, 2024

.Net: fix: add "status" field to Python dynamic session response #9903

Merged

4 tasks

moonbox3 removed python Pull requests for the Python Semantic Kernel triage labels Dec 8, 2024

malandis changed the title ~~Python: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures~~ .Net: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures Dec 8, 2024

moonbox3 changed the title ~~.Net: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures~~ Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures Dec 9, 2024

moonbox3 added the python Pull requests for the Python Semantic Kernel label Dec 9, 2024

moonbox3 added this to Semantic Kernel Dec 9, 2024

moonbox3 mentioned this issue Dec 9, 2024

Python: Include the sessions plugin status key in return value #9904

Merged

4 tasks

alliscode assigned moonbox3 Dec 9, 2024

moonbox3 assigned malandis and unassigned moonbox3 Dec 10, 2024

moonbox3 closed this as completed Jan 24, 2025
