Use OmniParser to create a UI Surfer Agent #5713

@ekzhu

Description

Confirmation

  • I confirm that I am a maintainer and so can use this template. If I am not, I understand this issue will be closed and I will be asked to use a different template.

Issue body

  1. Provide UI tools for controlling the keyboard and mouse and for capturing screenshots. Consider an MCP server as a possible implementation.
  2. Use OmniParser to convert screenshots into structured messages that are stored in the model context.
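
A minimal sketch of how the two items could fit together, under stated assumptions. Every name here (`UITools`, `UIElement`, `to_context_message`, `click_element`, `RecordingTools`) is hypothetical and not part of OmniParser or any existing agent API. The element fields (a normalized bounding box, a kind, text content, an interactable flag) are an assumed shape for parsed screenshot output, not OmniParser's actual schema. Real keyboard and mouse control, and any MCP server wrapping, are left out; an in-memory stand-in records the calls instead.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class UITools(Protocol):
    """Hypothetical tool surface for item 1; a real one might sit behind an MCP server."""

    def screenshot(self) -> bytes: ...
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


@dataclass(frozen=True)
class UIElement:
    """One parsed screen element (assumed shape, not OmniParser's real schema)."""

    element_id: int
    kind: str  # e.g. "text" or "icon"
    content: str
    bbox: tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized to 0..1
    interactable: bool

    def center_px(self, width: int, height: int) -> tuple[int, int]:
        """Map the normalized box center to pixel coordinates."""
        x1, y1, x2, y2 = self.bbox
        return round((x1 + x2) / 2 * width), round((y1 + y2) / 2 * height)


def to_context_message(elements: list[UIElement]) -> str:
    """Render parsed elements as a text message for the model context (item 2)."""
    lines = ["Screen elements:"]
    for e in elements:
        flag = "interactable" if e.interactable else "static"
        lines.append(f"[{e.element_id}] {e.kind} ({flag}): {e.content}")
    return "\n".join(lines)


def click_element(tools: UITools, element: UIElement, width: int, height: int) -> None:
    """Click the center of an element the model selected by id."""
    x, y = element.center_px(width, height)
    tools.click(x, y)


class RecordingTools:
    """In-memory stand-in that records calls instead of moving the real mouse."""

    def __init__(self) -> None:
        self.events: list[tuple] = []

    def screenshot(self) -> bytes:
        return b""

    def click(self, x: int, y: int) -> None:
        self.events.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.events.append(("type", text))


if __name__ == "__main__":
    parsed = [
        UIElement(0, "text", "Search", (0.1, 0.2, 0.3, 0.4), False),
        UIElement(1, "icon", "Submit button", (0.5, 0.5, 0.7, 0.6), True),
    ]
    print(to_context_message(parsed))
    tools = RecordingTools()
    click_element(tools, parsed[1], width=1000, height=800)
    print(tools.events)
```

Referring to elements by id keeps the model's output small (it only names an element), while the pixel arithmetic stays in deterministic code.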
