Skip to content

Commit 240c9e3

Browse files
committed
Initial commit
0 parents  commit 240c9e3

25 files changed

+1478
-0
lines changed

.env.example

+9
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
# rename this to .env and fill in your own keys
2+
3+
OPENAI_API_KEY="sk-proj-123"
4+
OPENAI_ORG = "org-123"
5+
6+
BROWSERBASE_API_KEY="00000000-0000-0000-0000-000000000000"
7+
BROWSERBASE_PROJECT_ID="bb_live_00000000-00000"
8+
9+
SCRAPYBARA_API_KEY="scrapy-123"

.github/workflows/build-and-push.yml

+27
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
name: Build and Push
2+
on:
3+
push:
4+
branches: ["main"]
5+
6+
jobs:
7+
build-and-push:
8+
runs-on: ubuntu-latest
9+
permissions:
10+
contents: read
11+
packages: write
12+
steps:
13+
- name: Check out
14+
uses: actions/checkout@v3
15+
16+
- name: Log in to GHCR
17+
uses: docker/login-action@v2
18+
with:
19+
registry: ghcr.io
20+
username: ${{ github.actor }}
21+
password: ${{ secrets.GITHUB_TOKEN }}
22+
23+
- name: Build
24+
run: docker build -t ghcr.io/openai/openai-cua-sample-app:latest .
25+
26+
- name: Push
27+
run: docker push ghcr.io/openai/openai-cua-sample-app:latest

.gitignore

+3
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
__pycache__/
2+
.env
3+
.venv/

Dockerfile

+45
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
FROM ubuntu:22.04
2+
ENV DEBIAN_FRONTEND=noninteractive
3+
4+
# 1) Install Xfce, x11vnc, Xvfb, xdotool, etc., but remove any screen lockers or power managers
5+
RUN apt-get update && apt-get install -y \
6+
xfce4 \
7+
xfce4-goodies \
8+
x11vnc \
9+
xvfb \
10+
xdotool \
11+
imagemagick \
12+
x11-apps \
13+
sudo \
14+
software-properties-common \
15+
imagemagick \
16+
&& apt-get remove -y light-locker xfce4-screensaver xfce4-power-manager || true \
17+
&& apt-get clean && rm -rf /var/lib/apt/lists/*
18+
19+
# 2) Add the mozillateam PPA and install Firefox ESR
20+
RUN add-apt-repository ppa:mozillateam/ppa \
21+
&& apt-get update \
22+
&& apt-get install -y --no-install-recommends firefox-esr \
23+
&& update-alternatives --set x-www-browser /usr/bin/firefox-esr \
24+
&& apt-get clean && rm -rf /var/lib/apt/lists/*
25+
26+
# 3) Create non-root user
27+
RUN useradd -ms /bin/bash myuser \
28+
&& echo "myuser ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
29+
USER myuser
30+
WORKDIR /home/myuser
31+
32+
# 4) Set x11vnc password ("secret")
33+
RUN x11vnc -storepasswd secret /home/myuser/.vncpass
34+
35+
# 5) Expose port 5900 and run Xvfb, x11vnc, Xfce (no login manager)
36+
EXPOSE 5900
37+
CMD ["/bin/sh", "-c", "\
38+
Xvfb :99 -screen 0 1280x800x24 >/dev/null 2>&1 & \
39+
x11vnc -display :99 -forever -rfbauth /home/myuser/.vncpass -listen 0.0.0.0 -rfbport 5900 >/dev/null 2>&1 & \
40+
export DISPLAY=:99 && \
41+
startxfce4 >/dev/null 2>&1 & \
42+
sleep 2 && echo 'Container running!' && \
43+
tail -f /dev/null \
44+
"]
45+

LICENSE

+21
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) 2025 OpenAI
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.md

+151
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,151 @@
1+
# Computer Using Agent Sample App
2+
3+
Get started building a [Computer Using Agent (CUA)](https://platform.openai.com/docs/guides/tools-computer-use) with the OpenAI API.
4+
5+
> [!CAUTION]
6+
> Computer use is in beta. Because the model is still in preview and may be susceptible to exploits and inadvertent mistakes, we discourage trusting it in authenticated environments or for high-stakes tasks.
7+
8+
## Set Up & Run
9+
10+
Set up python env and install dependencies.
11+
12+
```shell
13+
python3 -m venv env
14+
source env/bin/activate
15+
pip install -r requirements.txt
16+
```
17+
18+
Run CLI to let CUA use a local browser window, using [playwright](https://playwright.dev/). (Stop with CTRL+C)
19+
20+
```shell
21+
python cli.py --computer local-playwright
22+
```
23+
24+
Other included sample [computer environments](#computer-environments):
25+
26+
- [Docker](https://docker.com/) (containerized desktop)
27+
- [Browserbase](https://www.browserbase.com/) (remote browser, requires account)
28+
- [Scrapybara](https://scrapybara.com) (remote browser or computer, requires account)
29+
- ...or implement your own `Computer`!
30+
31+
## Overview
32+
33+
The computer use tool and model are available via the [Responses API](https://platform.openai.com/docs/api-reference/responses). At a high level, CUA will look at a screenshot of the computer interface and recommend actions. Specifically, it sends `computer_call`(s) with `actions` like `click(x,y)` or `type(text)` that you have to execute on your environment, and then expects screenshots of the outcomes.
34+
35+
You can learn more about this tool in the [Computer use guide](https://platform.openai.com/docs/guides/tools-computer-use).
36+
37+
## Abstractions
38+
39+
This repository defines two lightweight abstractions to make interacting with CUA agents more ergonomic. Everything works without them, but they provide a convenient separation of concerns.
40+
41+
| Abstraction | File | Description |
42+
| ----------- | ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
43+
| `Computer` | `computers/computer.py` | Defines a `Computer` interface for various environments (local desktop, remote browser, etc.). An implementation of `Computer` is responsible for executing any `computer_action` sent by CUA (clicks, etc). |
44+
| `Agent` | `agent/agent.py` | Simple, familiar agent loop – implements `run_full_turn()`, which just keeps calling the model until all computer actions and function calls are handled. |
45+
46+
## CLI Usage
47+
48+
The CLI (`cli.py`) is the easiest way to get started with CUA. It accepts the following arguments:
49+
50+
- `--computer`: The computer environment to use. See the [Computer Environments](#computer-environments) section below for options. By default, the CLI will use the `local-playwright` environment.
51+
- `--input`: The initial input to the agent (optional: the CLI will prompt you for input if not provided)
52+
- `--debug`: Enable debug mode.
53+
- `--show`: Show images (screenshots) during the execution.
54+
- `--start-url`: Start the browsing session with a specific URL (only for browser environments). By default, the CLI will start the browsing session with `https://bing.com`.
55+
56+
### Run examples (optional)
57+
58+
The `examples` folder contains more examples of how to use CUA.
59+
60+
```shell
61+
python -m examples.weather_example
62+
```
63+
64+
For reference, the file `simple_cua_loop.py` implements the basics of the CUA loop.
65+
66+
You can run it with:
67+
68+
```shell
69+
python simple_cua_loop.py
70+
```
71+
72+
## Computer Environments
73+
74+
CUA can work with any `Computer` environment that can handle the [CUA actions](https://platform.openai.com/docs/api-reference/responses/object#responses/object-output):
75+
76+
| Action | Example |
77+
| ---------------------------------- | ------------------------------- |
78+
| `click(x, y, button="left")` | `click(24, 150)` |
79+
| `double_click(x, y)` | `double_click(24, 150)` |
80+
| `scroll(x, y, scroll_x, scroll_y)` | `scroll(24, 150, 0, -100)` |
81+
| `type(text)` | `type("Hello, World!")` |
82+
| `wait(ms=1000)` | `wait(2000)` |
83+
| `move(x, y)` | `move(24, 150)` |
84+
| `keypress(keys)` | `keypress(["CTRL", "C"])` |
85+
| `drag(path)` | `drag([[24, 150], [100, 200]])` |
86+
87+
This sample app provides a set of implemented `Computer` examples, but feel free to add your own!
88+
89+
| Computer | Option | Type | Description | Requirements |
90+
| ------------------- | ------------------ | --------- | --------------------------------- | ---------------------------------------------------------------- |
91+
| `LocalPlaywright` | local-playwright | `browser` | Local browser window | [Playwright SDK](https://playwright.dev/) |
92+
| `Docker` | docker | `linux` | Docker container environment | [Docker](https://docs.docker.com/engine/install/) running |
93+
| `Browserbase` | browserbase | `browser` | Remote browser environment | [Browserbase](https://www.browserbase.com/) API key in `.env` |
94+
| `ScrapybaraBrowser` | scrapybara-browser | `browser` | Remote browser environment | [Scrapybara](https://scrapybara.com/dashboard) API key in `.env` |
95+
| `ScrapybaraUbuntu` | scrapybara-ubuntu | `linux` | Remote Ubuntu desktop environment | [Scrapybara](https://scrapybara.com/dashboard) API key in `.env` |
96+
97+
Using the CLI, you can run the sample app with different computer environments using the options listed above:
98+
99+
```shell
100+
python cli.py --show --computer <computer-option>
101+
```
102+
103+
For example, to run the sample app with the `Docker` computer environment, you can run:
104+
105+
```shell
106+
python cli.py --show --computer docker
107+
```
108+
109+
### Docker Setup
110+
111+
If you want to run the sample app with the `Docker` computer environment, you need to build and run a local Docker container.
112+
113+
Open a new shell to build and run the Docker image. The first time you do this, it may take a few minutes, but subsequent runs should be much faster. Once the logs stop, proceed to the next setup step. To stop the container, press CTRL+C on the terminal where you ran the command below.
114+
115+
```shell
116+
docker build -t cua-sample-app .
117+
docker run --rm -it --name cua-sample-app -p 5900:5900 --dns=1.1.1.3 -e DISPLAY=:99 cua-sample-app
118+
```
119+
120+
> [!NOTE]
121+
> We use `--dns=1.1.1.3` to restrict accessible websites to a smaller, safer set. We highly recommend you take similar safety precautions.
122+
123+
> [!WARNING]
124+
> If you get the below error, then you need to kill that container.
125+
>
126+
> ```
127+
> docker: Error response from daemon: Conflict. The container name "/cua-sample-app" is already in use by container "e72fcb962b548e06a9dcdf6a99bc4b49642df2265440da7544330eb420b51d87"
128+
> ```
129+
>
130+
> Kill that container and try again.
131+
>
132+
> ```shell
133+
> docker rm -f cua-sample-app
134+
> ```
135+
136+
### Hosted environment setup
137+
138+
This repository contains example implementations of third-party hosted environments.
139+
To use these, you will need to set up an account with the service by following the links aboveand add your API key to the `.env` file.
140+
141+
## Function Calling
142+
143+
The `Agent` class accepts regular function schemas in `tools` – it will return a hard-coded value for any invocations.
144+
145+
However, if you pass in any `tools` that are also defined in your `Computer` methods, in addition to the required `Computer` methods, they will be routed to your `Computer` to be handled when called. **This is useful for cases where screenshots often don't capture the search bar or back arrow, so CUA may get stuck. So instead, you can provide a `back()` or `goto(url)` functions.** See `examples/playwright_with_custom_functions.py` for an example.
146+
147+
## Risks & Safety considerations
148+
149+
This repository provides example implementations with basic safety measures in place.
150+
151+
We recommend reviewing the best practices outlined in our [guide](https://platform.openai.com/docs/guides/tools-computer-use#risks-and-safety), and making sure you understand the risks involved with using this tool.

__init__.py

Whitespace-only changes.

agent/__init__.py

+1
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
from .agent import Agent

0 commit comments

Comments
 (0)