task_desc can mislead agent. #35

Open
HacksonClark opened this issue Mar 20, 2025 · 4 comments
@HacksonClark
Collaborator

While running mitigation evaluations, I saw the agent get confused when solving the AssignNonExistentNodeSocialNetMitigation problem. It was able to get the correct answer, but the task_desc is inaccurate, which confused the agent:

"RemediationJSONReportCustomTool NL prompt received: The final answer to the original input question is that the compose-post-service is experiencing connection refused errors when trying to connect to the user-service. After investigating the logs, metrics, and traces of the user-service pod, I was unable to find any issues. Therefore, I recommend scaling up the user-service deployment to resolve the issue. The remediation plan involves either scaling up using existing resources, adding new resources, or using automated scaling.
RemediationJSONReportCustomTool function arguments identified are: {
"remediation":[
[
{ "action" : "Scale up user-service deployment using existing resources" },
{ "action" : "Add new resources to user-service deployment" },
{ "action" : "Implement automated scaling for user-service deployment" }
]
]"

Here is the task description in the code:

################## Mitigation Problem ##################
class AssignNonExistentNodeSocialNetMitigation(
    AssignNonExistentNodeSocialNetBaseTask, MitigationTask
):
    def __init__(self):
        AssignNonExistentNodeSocialNetBaseTask.__init__(self)
        MitigationTask.__init__(self, self.app)
        self.task_desc += "Start by investigating the `compost-post-service` pod"

I personally think we should remove these hints, as they can either mislead the agent or make the task too easy. In practice, the agent would never know where the fault is or receive a "hint", so I don't see the purpose of including them.
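
For illustration, here is a minimal sketch of what the class could look like with the hint dropped. The two base classes are stand-ins so the snippet runs on its own; their bodies, the `app` value, and the placeholder `task_desc` text are assumptions, not the repository's actual code.

```python
# Stand-in base classes (assumed): the real ones live in the benchmark repo.
class AssignNonExistentNodeSocialNetBaseTask:
    def __init__(self):
        self.app = "social-network"  # placeholder value, assumed


class MitigationTask:
    def __init__(self, app):
        self.app = app
        # Placeholder generic description, assumed for this sketch.
        self.task_desc = "Mitigate the fault in the deployed application.\n"


class AssignNonExistentNodeSocialNetMitigation(
    AssignNonExistentNodeSocialNetBaseTask, MitigationTask
):
    def __init__(self):
        AssignNonExistentNodeSocialNetBaseTask.__init__(self)
        MitigationTask.__init__(self, self.app)
        # No "Start by investigating ..." suffix is appended here,
        # so the agent has to choose its own starting point.


problem = AssignNonExistentNodeSocialNetMitigation()
print(problem.task_desc)
```

With this shape, every mitigation problem would share the same generic description, and no per-problem hint could point the agent at the wrong service.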

@HacksonClark HacksonClark self-assigned this Mar 20, 2025
@HacksonClark
Collaborator Author

@yinfangchen Let me know your thoughts.

@yinfangchen
Member

Agree. The initial idea for this prompt was to give the agent a starting point, but I agree that it can confuse the agent, and the agent can find a starting point by itself. Feel free to modify it and test.

@HacksonClark
Collaborator Author

@yinfangchen I think it's bad to include this in the benchmark; in some cases we're essentially helping the agent cheat. We should have it removed for the next round of evaluations.

@yinfangchen
Member

Yes, agree.
