While running mitigation evaluations, I saw the agent get confused while solving the AssignNonExistentNodeSocialNetMitigation problem. It was able to get the correct answer, but the task_desc is inaccurate, which confused the agent:
"RemediationJSONReportCustomTool NL prompt received: The final answer to the original input question is that the compose-post-service is experiencing connection refused errors when trying to connect to the user-service. After investigating the logs, metrics, and traces of the user-service pod, I was unable to find any issues. Therefore, I recommend scaling up the user-service deployment to resolve the issue. The remediation plan involves either scaling up using existing resources, adding new resources, or using automated scaling.
RemediationJSONReportCustomTool function arguments identified are: {
"remediation":[
[
{ "action" : "Scale up user-service deployment using existing resources" },
{ "action" : "Add new resources to user-service deployment" },
{ "action" : "Implement automated scaling for user-service deployment" }
]
]"
Here is the task description in the code:
################## Mitigation Problem ##################
class AssignNonExistentNodeSocialNetMitigation(
    AssignNonExistentNodeSocialNetBaseTask, MitigationTask
):
    def __init__(self):
        AssignNonExistentNodeSocialNetBaseTask.__init__(self)
        MitigationTask.__init__(self, self.app)
        self.task_desc += "Start by investigating the `compost-post-service` pod"
I personally think we should remove these hints, as they can either mislead the agent or make the task too easy. In practice, the agent would never know where the fault is or receive a "hint", so I don't see the purpose of including them.
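As a rough sketch of the proposed change, the mitigation class could simply stop appending the hint to `task_desc`. The two base classes below are hypothetical stand-ins so the snippet runs on its own; their bodies, the `app` placeholder, and the base `task_desc` text are assumptions, not the benchmark's actual code.

```python
# Stand-in stubs: the real base classes are not shown in this issue,
# so everything inside them is assumed for illustration only.
class AssignNonExistentNodeSocialNetBaseTask:
    def __init__(self):
        self.app = "social-network"  # hypothetical placeholder for the app object


class MitigationTask:
    def __init__(self, app):
        self.app = app
        # Hypothetical generic description with no fault-location hint.
        self.task_desc = "Mitigate the fault in the deployed application.\n"


################## Mitigation Problem ##################
class AssignNonExistentNodeSocialNetMitigation(
    AssignNonExistentNodeSocialNetBaseTask, MitigationTask
):
    def __init__(self):
        AssignNonExistentNodeSocialNetBaseTask.__init__(self)
        MitigationTask.__init__(self, self.app)
        # The "Start by investigating ..." line is intentionally not appended,
        # so the agent gets no pointer to a specific pod.


problem = AssignNonExistentNodeSocialNetMitigation()
print(problem.task_desc)
```

With the hint gone, the agent only sees the generic task description and has to locate the faulty service itself.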
Agree. The initial idea for this prompt was to give the agent a starting point, but I agree that it can confuse the agent, and the agent can find a starting point by itself. Feel free to modify it and test.
@yinfangchen I think it's bad to include this in the benchmark; we're essentially helping the agent cheat in some cases. For the next round of evaluations we should have it removed.