Python: .Net: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures #9902

Closed
malandis opened this issue Dec 7, 2024 · 5 comments
Labels: bug, .NET, python

Comments

@malandis
Contributor

malandis commented Dec 7, 2024

Describe the bug
When using the Python dynamic sessions plugin as a tool for LLMs, the execution response from the API includes an explicit "status" field, but this is not included by the plugin when building the string response to the LLM.

Specifically: although the HTTP POST request returns an OK status code, the code execution response body contains an explicit status property that can be "Failure" (as opposed to "Success").

Because this is missing from the plugin result, the LLM must infer success or failure based on the presence of stderr content. This can lead the LLM to misinterpret certain exceptional scenarios as ongoing or partial success instead of failure. As a result, the LLM may hallucinate follow-up steps, enter unnecessary retry loops, or otherwise produce incorrect responses.
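
A minimal sketch of the kind of change being described, assuming the execution response body carries `status`, `result`, `stdout`, and `stderr` under a `properties` object. The function name `format_execution_result` and the payload layout are illustrative assumptions, not the plugin's actual code; only the output layout (a label line followed by a JSON-encoded value) mirrors the tool output shown later in this issue.

```python
import json


def format_execution_result(response_body: dict) -> str:
    """Build the string handed back to the LLM, including the explicit status.

    Hypothetical helper: the "properties" wrapper and key names are assumptions
    about the response body, not a confirmed schema.
    """
    props = response_body.get("properties", {})
    sections = [
        ("Status", props.get("status")),
        ("Result", props.get("result")),
        ("Stdout", props.get("stdout")),
        ("Stderr", props.get("stderr")),
    ]
    # JSON-encode each value so None renders as null and strings stay quoted.
    return "\n".join(f"{label}:\n{json.dumps(value)}" for label, value in sections)


# Illustrative payload modelled on the failure reported in this issue.
failed = {
    "properties": {
        "status": "Failure",
        "result": None,
        "stdout": "",
        "stderr": "Matplotlib is building the font cache; this may take a moment",
    }
}
formatted = format_execution_result(failed)
print(formatted)
```

Putting the status first is a design choice for the sketch: the model reads the outcome before it reads any stderr text that might otherwise look like normal progress output.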

To Reproduce
This is a bit involved, since we must set up a tool-calling agent and deliberately trigger a code execution failure. That said, this scenario reliably reproduces it for me.

Set up a Python dynamic session and upload a CSV file to the session’s container. For example, create a sample-data.csv file with random data:

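import pandas as pd
import numpy as np

N = 100  # number of sample rows
data = {
    # np.random.randint excludes the upper bound, so periods span 202401..202411
    "fiscal_period": np.random.randint(202401, 202412, N),
    "shipping_zipcode": np.random.randint(10000, 99999, N),
    "gross_profit": np.random.uniform(200, 10000, N),
}

df = pd.DataFrame(data)
# Write without the index column so the CSV contains only the three fields
df.to_csv("sample-data.csv", index=False)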

Prompt the LLM through the tool to execute Python code that will intentionally fail, for example by including a deprecated library or code snippet that won’t run:

Create a heat map of gross profit on a map of the United States. Use the file at /mnt/data/sample-data.csv.

Include this snippet of code to load the map details at the beginning:
import matplotlib.pyplot as plt
import geopandas as gpd

# Load US states shapefile
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))

Observe the tool’s execution result. The response includes:

Result:
null
Stdout:
""

Stderr:
"Matplotlib is building the font cache; this may take a moment"

Note that the LLM misinterprets this scenario as ongoing success, repeatedly attempting to continue execution or retry.

Expected behavior
When the interpreter fails, the returned response should include the status field from the response body. With this explicit status, the LLM recognizes the error immediately and will not enter a loop of retries or produce hallucinated follow-ups. Instead, it can gracefully handle the failure scenario.
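
To illustrate why the explicit field matters, here is a hedged sketch that contrasts guessing from stderr content with reading the status. Both function names and the `ERROR_MARKERS` list are hypothetical; the keyword check is only a stand-in for the kind of inference an LLM has to make when no status is present. The `"Success"` and `"Failure"` values come from the description above, and the payloads are made up for the example.

```python
# Hypothetical keywords a reader might look for when only stderr is available.
ERROR_MARKERS = ("Traceback", "Error", "Exception")


def guess_failed_from_stderr(props: dict) -> bool:
    """Stand-in heuristic: call it a failure only if stderr looks like an error."""
    stderr = props.get("stderr") or ""
    return any(marker in stderr for marker in ERROR_MARKERS)


def failed_from_status(props: dict) -> bool:
    """Use the explicit status; treat anything other than "Success" as a failure."""
    return props.get("status") != "Success"


# A failed run whose stderr reads like ordinary progress output, as in this issue.
silent_failure = {
    "status": "Failure",
    "stdout": "",
    "stderr": "Matplotlib is building the font cache; this may take a moment",
}
# A clean successful run (illustrative data).
clean_success = {"status": "Success", "stdout": "done\n", "stderr": ""}

for name, props in [("silent_failure", silent_failure), ("clean_success", clean_success)]:
    print(name, guess_failed_from_stderr(props), failed_from_status(props))
```

In this sketch the stderr heuristic misses the failed run because the message contains no error keywords, while the status check flags it directly.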

Screenshots
N/A

Platform

  • OS: Any
  • IDE: Any
  • Language: Any
  • Source: latest as of filing (1.17.0)

Additional context

  • I will submit a PR to the dotnet implementation that adds the status property to the returned string. If accepted, I am also willing to submit similar PRs for other languages and integrations.
@malandis malandis added the bug Something isn't working label Dec 7, 2024
@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code python Pull requests for the Python Semantic Kernel triage labels Dec 7, 2024
@github-actions github-actions bot changed the title Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures .Net: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures Dec 7, 2024
@github-actions github-actions bot changed the title .Net: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures Python: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures Dec 7, 2024
@malandis
Contributor Author

malandis commented Dec 7, 2024

@markwallace-microsoft / others: this isn't a bug in the Python codebase specifically, but rather a bug in the Python dynamic sessions plugin, which is present in various languages and integrations (e.g. LangChain). I have referenced the .NET implementation in my description.

In all the dynamic sessions plugin implementations, the template to interpolate the code execution API response back to the LLM is identical. Hence they're all affected.

@moonbox3 moonbox3 removed python Pull requests for the Python Semantic Kernel triage labels Dec 8, 2024
@moonbox3
Contributor

moonbox3 commented Dec 8, 2024

Thanks for filing the issue and for working on the fix, @malandis. We'll have the .Net team review your PR soon!

@malandis malandis changed the title Python: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures .Net: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures Dec 8, 2024
@malandis
Contributor Author

malandis commented Dec 8, 2024

@moonbox3 sounds good. Once we align on the problem/fix, I can port to other languages/integrations 👍

@moonbox3 moonbox3 changed the title .Net: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures Dec 9, 2024
@moonbox3 moonbox3 added the python Pull requests for the Python Semantic Kernel label Dec 9, 2024
@github-actions github-actions bot changed the title Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures Python: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures Dec 9, 2024
@moonbox3
Contributor

moonbox3 commented Dec 9, 2024

> @moonbox3 sounds good. Once we align on the problem/fix, I can port to other languages/integrations 👍

Looks like you've provided the fix. When I first created the Python plugin, I don't remember there being a "status" property in the result, so I wonder if it was added recently. I've made a similar fix in #9904. Thanks for your help!

@moonbox3 moonbox3 changed the title Python: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures Python: .Net: Bug: Missing "status" field in Python dynamic session plugin responses causes LLM misinterpretation of execution failures Dec 9, 2024
github-merge-queue bot pushed a commit that referenced this issue Dec 9, 2024
### Motivation and Context

The Python sessions plugin (ACA) result now includes a `status` once
code is executed. This looks to have been added recently, as we
weren't including it in the original return string. This PR adds it.

### Description

Add the `status` key value to the return string.
- Fixes Python bug for #9902 

### Contribution Checklist

- [X] The code builds clean without any errors or warnings
- [X] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [X] All unit tests pass, and I have added new tests where possible
- [X] I didn't break anyone 😄
@moonbox3 moonbox3 assigned malandis and unassigned moonbox3 Dec 10, 2024
github-merge-queue bot pushed a commit that referenced this issue Dec 12, 2024
### Motivation and Context

Addresses issue #9902 for .NET.

Include the "status" field in the response string returned by Python
dynamic sessions. Fixes a usability issue where the LLM assumes success
despite execution failures.


### Description

As per #9902, by including the `status` property from the response in the
plugin result, we ensure that the LLM has explicit information about
whether the execution succeeded or failed, preventing misinterpretation
of stderr or other response elements.

This helps avoid hallucinated follow-ups or unnecessary retries by
providing clear success/failure indicators.


### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

---------

Co-authored-by: Evan Mattson <[email protected]>
@moonbox3
Contributor

Closing this issue as both .Net and Python PRs are in.


3 participants