Support interactive commands (All-Hands-AI#3653)

* hacky solution for interactive commands
* add more behavior
* debug
* fix continue functionality
* remove prints
* refactor a bit
* reduce test sleep
* fix python version
* fix pre-commit issue
* Regenerate integration tests
* Update openhands/runtime/client/client.py
* revert some prompt stuff
* several integration mock files regenerated
* execute_action: remove duplicate exception logging

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: tobitege <[email protected]>
jaki300 · Sep 8, 2024 · ab38515
1 parent 5100d12
commit ab38515

Showing 69 changed files with 3,124 additions and 315 deletions.
diff --git a/.github/workflows/regenerate_integration_tests.yml b/.github/workflows/regenerate_integration_tests.yml
@@ -59,5 +59,6 @@ jobs:
         git config --global user.name 'github-actions[bot]'
         git config --global user.email 'github-actions[bot]@users.noreply.github.com'
         git add .
-        git commit -m "Regenerate integration tests"
+        # run it twice in case pre-commit makes changes
+        git commit -am "Regenerate integration tests" || git commit -am "Regenerate integration tests"
         git push
diff --git a/agenthub/codeact_agent/system_prompt.j2 b/agenthub/codeact_agent/system_prompt.j2
@@ -5,8 +5,13 @@ The assistant can use a Python environment with <execute_ipython>, e.g.:
 print("Hello World!")
 </execute_ipython>
 The assistant can execute bash commands wrapped with <execute_bash>, e.g. <execute_bash> ls </execute_bash>.
-The assistant is not allowed to run interactive commands. For commands that may run indefinitely,
-the output should be redirected to a file and the command run in the background, e.g. <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
+If a bash command returns exit code `-1`, this means the process is not yet finished.
+The assistant must then send a second <execute_bash>. The second <execute_bash> can be empty
+(which will retrieve any additional logs), or it can contain text to be sent to STDIN of the running process,
+or it can contain the text `ctrl+c` to interrupt the process.
+
+For commands that may run indefinitely, the output should be redirected to a file and the command run
+in the background, e.g. <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
 If a command execution result says "Command timed out. Sending SIGINT to the process",
 the assistant should retry running the command in the background.
 {% endset %}
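
Reviewer note (illustrative, not part of the commit): the prompt text in this hunk describes a small follow-up protocol for exit code `-1`. Below is a minimal sketch of how a caller might choose the body of the second <execute_bash>. The helper name `follow_up_command` and its parameters are hypothetical and do not exist in the codebase.

```python
from typing import Optional


def follow_up_command(
    exit_code: int,
    stdin_text: Optional[str] = None,
    interrupt: bool = False,
) -> Optional[str]:
    """Return the body of the second <execute_bash>, or None if none is needed.

    Hypothetical helper: it only encodes the three options the prompt lists.
    """
    if exit_code != -1:
        return None  # the process already finished; nothing more to send
    if interrupt:
        return 'ctrl+c'  # ask the runtime to interrupt the process
    if stdin_text is not None:
        return stdin_text  # text forwarded to STDIN of the running process
    return ''  # empty command: retrieve any additional logs
```

In this sketch an interrupt request takes priority over pending STDIN text; that ordering is an assumption, since the prompt does not rank the options.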

diff --git a/openhands/runtime/client/client.py b/openhands/runtime/client/client.py
@@ -60,6 +60,7 @@ class ActionRequest(BaseModel):
 INIT_COMMANDS = [
     'git config --global user.name "openhands" && git config --global user.email "[email protected]" && alias git="git --no-pager"',
 ]
+SOFT_TIMEOUT_SECONDS = 5
 
 
 class RuntimeClient:
@@ -212,6 +213,9 @@ def _get_bash_prompt_and_update_pwd(self):
         if ps1 == pexpect.EOF:
             logger.error(f'Bash shell EOF! {self.shell.after=}, {self.shell.before=}')
             raise RuntimeError('Bash shell EOF')
+        if ps1 == pexpect.TIMEOUT:
+            logger.warning('Bash shell timeout')
+            return ''
 
         # begin at the last occurrence of '[PEXPECT_BEGIN]'.
         # In multi-line bash commands, the prompt will be repeated
@@ -243,39 +247,56 @@ def _execute_bash(
         command: str,
         timeout: int | None,
         keep_prompt: bool = True,
+        kill_on_timeout: bool = True,
     ) -> tuple[str, int]:
         logger.debug(f'Executing command: {command}')
+        self.shell.sendline(command)
+        return self._continue_bash(
+            timeout=timeout, keep_prompt=keep_prompt, kill_on_timeout=kill_on_timeout
+        )
+
+    def _interrupt_bash(self, timeout: int | None = None) -> tuple[str, int]:
+        self.shell.sendintr()  # send SIGINT to the shell
+        self.shell.expect(self.__bash_expect_regex, timeout=timeout)
+        output = self.shell.before
+        exit_code = 130  # SIGINT
+        return output, exit_code
+
+    def _continue_bash(
+        self,
+        timeout: int | None,
+        keep_prompt: bool = True,
+        kill_on_timeout: bool = True,
+    ) -> tuple[str, int]:
         try:
-            self.shell.sendline(command)
             self.shell.expect(self.__bash_expect_regex, timeout=timeout)
 
             output = self.shell.before
 
             # Get exit code
             self.shell.sendline('echo $?')
-            logger.debug(f'Executing command for exit code: {command}')
+            logger.debug('Requesting exit code...')
             self.shell.expect(self.__bash_expect_regex, timeout=timeout)
             _exit_code_output = self.shell.before
-            logger.debug(f'Exit code Output: {_exit_code_output}')
             exit_code = int(_exit_code_output.strip().split()[0])
 
         except pexpect.TIMEOUT as e:
-            self.shell.sendintr()  # send SIGINT to the shell
-            self.shell.expect(self.__bash_expect_regex, timeout=timeout)
-            output = self.shell.before
-            output += (
-                '\r\n\r\n'
-                + f'[Command timed out after {timeout} seconds. SIGINT was sent to interrupt it.]'
-            )
-            exit_code = 130  # SIGINT
-            logger.error(f'Failed to execute command: {command}. Error: {e}')
+            if kill_on_timeout:
+                output, exit_code = self._interrupt_bash()
+                output += (
+                    '\r\n\r\n'
+                    + f'[Command timed out after {timeout} seconds. SIGINT was sent to interrupt it.]'
+                )
+                logger.error(f'Failed to execute command. Error: {e}')
+            else:
+                output = self.shell.before or ''
+                exit_code = -1
 
         finally:
             bash_prompt = self._get_bash_prompt_and_update_pwd()
             if keep_prompt:
                 output += '\r\n' + bash_prompt
             logger.debug(f'Command output: {output}')
-
         return output, exit_code
 
     async def run_action(self, action) -> Observation:
@@ -293,11 +314,23 @@ async def run(self, action: CmdRunAction) -> CmdOutputObservation:
             commands = split_bash_commands(action.command)
             all_output = ''
             for command in commands:
-                output, exit_code = self._execute_bash(
-                    command,
-                    timeout=action.timeout,
-                    keep_prompt=action.keep_prompt,
-                )
+                if command == '':
+                    output, exit_code = self._continue_bash(
+                        timeout=SOFT_TIMEOUT_SECONDS,
+                        keep_prompt=action.keep_prompt,
+                        kill_on_timeout=False,
+                    )
+                elif command.lower() == 'ctrl+c':
+                    output, exit_code = self._interrupt_bash(
+                        timeout=SOFT_TIMEOUT_SECONDS
+                    )
+                else:
+                    output, exit_code = self._execute_bash(
+                        command,
+                        timeout=SOFT_TIMEOUT_SECONDS,
+                        keep_prompt=action.keep_prompt,
+                        kill_on_timeout=False,
+                    )
                 if all_output:
                     # previous output already exists with prompt "user@hostname:working_dir #""
                     # we need to add the command to the previous output,
@@ -690,5 +723,4 @@ async def list_files(request: Request):
             return []
 
     logger.info(f'Starting action execution API on port {args.port}')
-    print(f'Starting action execution API on port {args.port}')
     run(app, host='0.0.0.0', port=args.port)
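
Reviewer note (illustrative, not part of the commit): the client.py changes boil down to a three-way dispatch in `run()` and two possible outcomes when the soft timeout expires. The sketch below restates that logic without any shell I/O. The function names `classify_command` and `timeout_result`, and the labels they return, are hypothetical. In the real code the interrupted output is read from the shell after SIGINT rather than passed in as an argument.

```python
from typing import Optional, Tuple

SOFT_TIMEOUT_SECONDS = 5  # same value as the constant added in this commit


def classify_command(command: str) -> str:
    """Mirror the three-way branch in run(); the returned labels are hypothetical."""
    if command == '':
        return 'continue'  # wait again on the running process for more output
    if command.lower() == 'ctrl+c':
        return 'interrupt'  # send SIGINT and report exit code 130
    return 'execute'  # send the text to the shell as a new line


def timeout_result(
    partial_output: Optional[str],
    timeout: Optional[int],
    kill_on_timeout: bool,
) -> Tuple[str, int]:
    """Mirror the pexpect.TIMEOUT branch of _continue_bash, minus the shell I/O."""
    output = partial_output or ''
    if kill_on_timeout:
        output += (
            '\r\n\r\n'
            + f'[Command timed out after {timeout} seconds. SIGINT was sent to interrupt it.]'
        )
        return output, 130  # 130 is the exit code the commit uses for SIGINT
    return output, -1  # -1 signals "process not yet finished"
```

Because `run()` now always passes `kill_on_timeout=False`, a command that outlives the soft timeout returns exit code `-1` instead of being interrupted.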
diff --git a/openhands/runtime/utils/bash.py b/openhands/runtime/utils/bash.py
@@ -4,6 +4,8 @@
 
 
 def split_bash_commands(commands):
+    if not commands.strip():
+        return ['']
     try:
         parsed = bashlex.parse(commands)
     except bashlex.errors.ParsingError as e:
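
Reviewer note (illustrative, not part of the commit): the new guard makes an empty or whitespace-only input yield a single empty command, so the loop in `run()` still executes once and reaches the `command == ''` branch. The sketch below shows the effect using `naive_parse`, a hypothetical stand-in for the bashlex-based splitting, which this sketch does not reproduce.

```python
from typing import Callable, List


def naive_parse(commands: str) -> List[str]:
    """Hypothetical stand-in for the bashlex parser: one command per non-blank line."""
    return [line for line in commands.splitlines() if line.strip()]


def split_with_empty_guard(
    commands: str,
    parse: Callable[[str], List[str]] = naive_parse,
) -> List[str]:
    """Return [''] for blank input, otherwise defer to the parser."""
    if not commands.strip():
        return ['']  # keep one empty command so the caller's loop runs once
    return parse(commands)
```

Without the guard, a parser that returns no commands for blank input would leave the caller's loop with nothing to iterate over.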

diff --git a/tests/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/prompt_001.log b/tests/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/prompt_001.log
@@ -8,8 +8,13 @@ The assistant can use a Python environment with <execute_ipython>, e.g.:
 print("Hello World!")
 </execute_ipython>
 The assistant can execute bash commands wrapped with <execute_bash>, e.g. <execute_bash> ls </execute_bash>.
-The assistant is not allowed to run interactive commands. For commands that may run indefinitely,
-the output should be redirected to a file and the command run in the background, e.g. <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
+If a bash command returns exit code `-1`, this means the process is not yet finished.
+The assistant must then send a second <execute_bash>. The second <execute_bash> can be empty
+(which will retrieve any additional logs), or it can contain text to be sent to STDIN of the running process,
+or it can contain the text `ctrl+c` to interrupt the process.
+
+For commands that may run indefinitely, the output should be redirected to a file and the command run
+in the background, e.g. <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
 If a command execution result says "Command timed out. Sending SIGINT to the process",
 the assistant should retry running the command in the background.
 
@@ -175,7 +180,6 @@ IMPORTANT: Execute code using <execute_ipython>, <execute_bash>, or <execute_bro
 The assistant should utilize full file paths and the `pwd` command to prevent path-related errors.
 The assistant must avoid apologies and thanks in its responses.
 
-
 ----------
 
 Here is an example of how you can interact with the environment for task solving:

diff --git a/tests/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/prompt_002.log b/tests/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/prompt_002.log
@@ -111,7 +111,6 @@ Don't execute multiple actions at once if you need feedback from the page.
 
 
 
-
 ----------
 
 # Current Accessibility Tree:

diff --git a/tests/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/prompt_003.log b/tests/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/prompt_003.log
@@ -111,7 +111,6 @@ Don't execute multiple actions at once if you need feedback from the page.
 
 
 
-
 ----------
 
 # Current Accessibility Tree:

diff --git a/tests/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/prompt_004.log b/tests/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/prompt_004.log
@@ -111,7 +111,6 @@ Don't execute multiple actions at once if you need feedback from the page.
 
 
 
-
 ----------
 
 # Current Accessibility Tree:

diff --git a/tests/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/prompt_005.log b/tests/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/prompt_005.log
@@ -8,8 +8,13 @@ The assistant can use a Python environment with <execute_ipython>, e.g.:
 print("Hello World!")
 </execute_ipython>
 The assistant can execute bash commands wrapped with <execute_bash>, e.g. <execute_bash> ls </execute_bash>.
-The assistant is not allowed to run interactive commands. For commands that may run indefinitely,
-the output should be redirected to a file and the command run in the background, e.g. <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
+If a bash command returns exit code `-1`, this means the process is not yet finished.
+The assistant must then send a second <execute_bash>. The second <execute_bash> can be empty
+(which will retrieve any additional logs), or it can contain text to be sent to STDIN of the running process,
+or it can contain the text `ctrl+c` to interrupt the process.
+
+For commands that may run indefinitely, the output should be redirected to a file and the command run
+in the background, e.g. <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
 If a command execution result says "Command timed out. Sending SIGINT to the process",
 the assistant should retry running the command in the background.
 
@@ -175,7 +180,6 @@ IMPORTANT: Execute code using <execute_ipython>, <execute_bash>, or <execute_bro
 The assistant should utilize full file paths and the `pwd` command to prevent path-related errors.
 The assistant must avoid apologies and thanks in its responses.
 
-
 ----------
 
 Here is an example of how you can interact with the environment for task solving:
@@ -404,15 +408,13 @@ The server is running on port 5000 with PID 126. You can access the list of numb
 NOW, LET'S START!
 Browse localhost:8000, and tell me the ultimate answer to life. Do not ask me for confirmation at any point.
 
-
 ----------
 
 
 <execute_browse>
 Certainly! I'll browse localhost:8000 and retrieve the ultimate answer to life for you.. I should start with: Get the content on "http://localhost:8000"
 </execute_browse>
 
-
 ----------
 
 OBSERVATION:

diff --git a/...s/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/response_003.log b/...s/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/response_003.log
@@ -2,5 +2,5 @@ To accomplish our goal of retrieving the ultimate answer to life, we need to cli
 
 Here's the action I'll take:
 
-In order to accomplish my goal, I need to click on the button with bid 10
+In order to accomplish our goal, I need to click on the button with bid 10
 ```click("10"
diff --git a/...s/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/response_004.log b/...s/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/response_004.log
@@ -1,6 +1,6 @@
-Based on the current Accessibility Tree and the previous actions taken, it appears that we have already successfully navigated to the webpage and clicked the button to reveal the answer. The answer is now visible on the page.
+Based on the current Accessibility Tree and the previous actions taken, it appears that we have already navigated to the correct page and clicked the button to reveal the answer. The answer is now visible on the page.
 
-To accomplish our goal of retrieving the ultimate answer to life, I will now send this information to the user:
+To accomplish our goal of retrieving the ultimate answer to life, I will now send a message to the user with the revealed answer:
 
 ```
 send_msg_to_user("The ultimate answer to life, the universe, and everything is: OpenHands is all you need!"
diff --git a/...s/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/response_005.log b/...s/integration/mock/eventstream_runtime/CodeActAgent/test_browse_internet/response_005.log
@@ -2,6 +2,6 @@ Based on the content retrieved from localhost:8000, I can provide you with the u
 
 The ultimate answer to life, the universe, and everything is: OpenHands is all you need!
 
-This appears to be the answer provided by the server at localhost:8000. It's a playful reference to Douglas Adams' "The Hitchhiker's Guide to the Galaxy," where the answer to life, the universe, and everything is famously 42. In this case, it's been customized to highlight "OpenHands."
+This appears to be the answer provided by the server at localhost:8000. It's a playful reference to Douglas Adams' "The Hitchhiker's Guide to the Galaxy," where the answer to life, the universe, and everything is famously 42. In this case, it's been customized to highlight OpenHands.
 
 <finish></finish>
diff --git a/tests/integration/mock/eventstream_runtime/CodeActAgent/test_edits/prompt_001.log b/tests/integration/mock/eventstream_runtime/CodeActAgent/test_edits/prompt_001.log
@@ -8,8 +8,13 @@ The assistant can use a Python environment with <execute_ipython>, e.g.:
 print("Hello World!")
 </execute_ipython>
 The assistant can execute bash commands wrapped with <execute_bash>, e.g. <execute_bash> ls </execute_bash>.
-The assistant is not allowed to run interactive commands. For commands that may run indefinitely,
-the output should be redirected to a file and the command run in the background, e.g. <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
+If a bash command returns exit code `-1`, this means the process is not yet finished.
+The assistant must then send a second <execute_bash>. The second <execute_bash> can be empty
+(which will retrieve any additional logs), or it can contain text to be sent to STDIN of the running process,
+or it can contain the text `ctrl+c` to interrupt the process.
+
+For commands that may run indefinitely, the output should be redirected to a file and the command run
+in the background, e.g. <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
 If a command execution result says "Command timed out. Sending SIGINT to the process",
 the assistant should retry running the command in the background.
 
@@ -175,7 +180,6 @@ IMPORTANT: Execute code using <execute_ipython>, <execute_bash>, or <execute_bro
 The assistant should utilize full file paths and the `pwd` command to prevent path-related errors.
 The assistant must avoid apologies and thanks in its responses.
 
-
 ----------
 
 Here is an example of how you can interact with the environment for task solving:

diff --git a/tests/integration/mock/eventstream_runtime/CodeActAgent/test_edits/prompt_002.log b/tests/integration/mock/eventstream_runtime/CodeActAgent/test_edits/prompt_002.log
@@ -8,8 +8,13 @@ The assistant can use a Python environment with <execute_ipython>, e.g.:
 print("Hello World!")
 </execute_ipython>
 The assistant can execute bash commands wrapped with <execute_bash>, e.g. <execute_bash> ls </execute_bash>.
-The assistant is not allowed to run interactive commands. For commands that may run indefinitely,
-the output should be redirected to a file and the command run in the background, e.g. <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
+If a bash command returns exit code `-1`, this means the process is not yet finished.
+The assistant must then send a second <execute_bash>. The second <execute_bash> can be empty
+(which will retrieve any additional logs), or it can contain text to be sent to STDIN of the running process,
+or it can contain the text `ctrl+c` to interrupt the process.
+
+For commands that may run indefinitely, the output should be redirected to a file and the command run
+in the background, e.g. <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
 If a command execution result says "Command timed out. Sending SIGINT to the process",
 the assistant should retry running the command in the background.
 
@@ -175,7 +180,6 @@ IMPORTANT: Execute code using <execute_ipython>, <execute_bash>, or <execute_bro
 The assistant should utilize full file paths and the `pwd` command to prevent path-related errors.
 The assistant must avoid apologies and thanks in its responses.
 
-
 ----------
 
 Here is an example of how you can interact with the environment for task solving:
@@ -404,15 +408,13 @@ The server is running on port 5000 with PID 126. You can access the list of numb
 NOW, LET'S START!
 Fix typos in bad.txt. Do not ask me for confirmation at any point.
 
-
 ----------
 
 Certainly! I'll fix the typos in the bad.txt file. Let's start by opening the file and examining its contents.
 <execute_ipython>
 open_file('bad.txt')
 </execute_ipython>
 
-
 ----------
 
 OBSERVATION: