AI-powered mobile automation agent for Android and iOS. Tell it what to do in plain English — it figures out what to tap, type, and swipe.
**Prerequisites**

- Node.js 18+
- Device connected — USB, emulator, or simulator
- Gemini API key from Google AI Studio
```bash
npm install -g appclaw
```

Create a `.env` file in your working directory:
```bash
cp .env.example .env
```

To run from source instead:

```bash
git clone https://github.com/AppiumTestDistribution/appclaw.git
cd appclaw
npm install
cp .env.example .env
```

Edit `.env` based on your preferred mode:
**Vision + Stark (recommended)**

Screenshot-first mode using Stark (df-vision + Gemini) for element location. Requires a Gemini API key.

```
LLM_PROVIDER=gemini
LLM_API_KEY=your-gemini-api-key
LLM_MODEL=gemini-3.1-flash-lite-preview
AGENT_MODE=vision
VISION_LOCATE_PROVIDER=stark
```

**Vision + Appium MCP**
Screenshot-first mode using appium-mcp's server-side AI vision for element location. See appium-mcp AI Vision setup for details.
```
LLM_PROVIDER=gemini
LLM_API_KEY=your-gemini-api-key
LLM_MODEL=gemini-3.1-flash-lite-preview
AGENT_MODE=vision
VISION_LOCATE_PROVIDER=appium_mcp
AI_VISION_ENABLED=true
AI_VISION_API_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai
AI_VISION_API_KEY=your-vision-api-key
AI_VISION_MODEL=gemini-2.0-flash
```

**DOM mode**
Uses XML page source to find elements by accessibility ID, xpath, etc. No vision needed — works with any LLM provider.
```
LLM_PROVIDER=gemini # or anthropic, openai, groq, ollama
LLM_API_KEY=your-api-key
AGENT_MODE=dom
```

```bash
# Interactive mode
appclaw

# Pass goal directly
appclaw "Open Settings"
appclaw "Search for cats on YouTube"
appclaw "Turn on WiFi"
appclaw "Send hello on WhatsApp to Mom"

# Or with npx (no global install)
npx appclaw "Open Settings"
```

When running from a local clone, use `npm start` instead:
```bash
npm start
npm start "Open Settings"
```

Run declarative steps from a YAML file:

```bash
appclaw --flow examples/flows/google-search.yaml
```

All configuration is via `.env`:
| Variable | Default | Description |
|---|---|---|
| `LLM_PROVIDER` | `gemini` | LLM provider (currently only `gemini` is supported for vision) |
| `LLM_API_KEY` | — | Gemini API key |
| `LLM_MODEL` | (auto) | Model override (e.g. `gemini-2.0-flash`) |
| `AGENT_MODE` | `vision` | `dom` (XML locators) or `vision` (screenshot-first) |
| `VISION_LOCATE_PROVIDER` | `stark` | Vision backend for locating elements |
| `MAX_STEPS` | `30` | Max steps per goal |
| `STEP_DELAY` | `500` | Milliseconds between steps |
| `SHOW_TOKEN_USAGE` | `false` | Print token usage and cost per step |
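Resolving these variables into a typed config with defaults might look like the sketch below. The `loadConfig` helper and field names are illustrative, not AppClaw's actual code; only the variable names and defaults come from the table above.

```typescript
// Sketch: apply the documented defaults when a variable is unset.
function loadConfig(env: Record<string, string | undefined>) {
  return {
    llmProvider: env.LLM_PROVIDER ?? "gemini",
    llmApiKey: env.LLM_API_KEY,                 // no default: required
    llmModel: env.LLM_MODEL,                    // undefined = auto-selected
    agentMode: env.AGENT_MODE ?? "vision",
    visionLocateProvider: env.VISION_LOCATE_PROVIDER ?? "stark",
    maxSteps: Number(env.MAX_STEPS ?? 30),
    stepDelayMs: Number(env.STEP_DELAY ?? 500),
    showTokenUsage: (env.SHOW_TOKEN_USAGE ?? "false") === "true",
  };
}

const config = loadConfig(process.env);
```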
On each step, AppClaw:
- Perceives — reads the device screen (UI elements or screenshot)
- Reasons — sends the goal + screen state to an LLM, which decides the next action
- Acts — executes the action (tap, type, swipe, launch app, etc.)
- Repeats until the goal is complete or max steps reached
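The loop above can be sketched in TypeScript. The `Agent` interface, `run` function, and action shape are illustrative assumptions, not AppClaw's real API:

```typescript
// Minimal perceive → reason → act loop, bounded by a max step count.
type Action = { kind: string; target?: string; text?: string };

interface Agent {
  perceive(): Promise<string>;                            // current screen state
  reason(goal: string, screen: string): Promise<Action>;  // LLM picks next action
  act(action: Action): Promise<void>;                     // execute on the device
}

async function run(agent: Agent, goal: string, maxSteps = 30): Promise<boolean> {
  for (let step = 0; step < maxSteps; step++) {
    const screen = await agent.perceive();
    const action = await agent.reason(goal, screen);
    if (action.kind === "done") return true;              // goal complete
    await agent.act(action);
  }
  return false;                                           // max steps reached
}
```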
| Action | Description |
|---|---|
| `tap` | Tap an element |
| `type` | Type text into an input |
| `scroll` / `swipe` | Scroll or swipe gesture |
| `launch` | Open an app |
| `back` / `home` | Navigation buttons |
| `long_press` / `double_tap` | Touch gestures |
| `find_and_tap` | Scroll to find, then tap |
| `ask_user` | Pause for user input (OTP, CAPTCHA) |
| `done` | Goal complete |
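Dispatching these actions onto a device driver could look like the following sketch. The `Driver` interface and `dispatch` function are hypothetical; AppClaw's real driver layer differs, and the subset of actions shown is abbreviated:

```typescript
// Sketch: route a decoded action to the matching driver call.
interface Driver {
  tap(target: string): void;
  type(target: string, text: string): void;
  swipe(direction: string): void;
  launch(app: string): void;
  pressKey(key: "back" | "home"): void;
}

type Action =
  | { kind: "tap"; target: string }
  | { kind: "type"; target: string; text: string }
  | { kind: "swipe"; direction: string }
  | { kind: "launch"; app: string }
  | { kind: "back" }
  | { kind: "home" }
  | { kind: "done" };

// Returns false when the agent should stop looping.
function dispatch(driver: Driver, a: Action): boolean {
  switch (a.kind) {
    case "tap": driver.tap(a.target); break;
    case "type": driver.type(a.target, a.text); break;
    case "swipe": driver.swipe(a.direction); break;
    case "launch": driver.launch(a.app); break;
    case "back":
    case "home": driver.pressKey(a.kind); break;
    case "done": return false;
  }
  return true;
}
```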
| Mechanism | What it does |
|---|---|
| Stuck detection | Detects repeated screens/actions, injects recovery hints |
| Checkpointing | Saves known-good states for rollback |
| Human-in-the-loop | Pauses for OTP, CAPTCHA, or ambiguous choices |
| Action retry | Feeds failures back to the LLM for re-planning |
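Stuck detection, for instance, can be as simple as hashing consecutive screen states and flagging repeats. This is a hypothetical sketch of the idea, not AppClaw's implementation; the class name and threshold are assumptions:

```typescript
import { createHash } from "node:crypto";

// Flags a likely loop when the screen hash repeats `threshold` times in a row.
class StuckDetector {
  private last = "";
  private repeats = 0;
  constructor(private threshold = 3) {}

  observe(screenState: string): boolean {
    const h = createHash("sha256").update(screenState).digest("hex");
    this.repeats = h === this.last ? this.repeats + 1 : 0;
    this.last = h;
    return this.repeats >= this.threshold; // true => inject a recovery hint
  }
}
```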
Licensed under the Apache License, Version 2.0. See LICENSE for the full text.
