AppClaw

AI-powered mobile automation agent for Android and iOS. Tell it what to do in plain English — it figures out what to tap, type, and swipe.

You: "Send a WhatsApp message to Mom
      saying good morning"

AppClaw:
  Step 1: Open WhatsApp
  Step 2: Search for Mom
  Step 3: Open chat with Mom
  Step 4: Type "good morning"
  Step 5: Tap Send
  Step 6: Done

  ✅ Goal completed in 6 steps.

Prerequisites

  1. Node.js 18+
  2. Device connected — USB, emulator, or simulator
  3. Gemini API key from Google AI Studio

Installation

From npm

npm install -g appclaw

Create a .env file in your working directory:

cp .env.example .env

Local development

git clone https://github.com/AppiumTestDistribution/appclaw.git
cd appclaw
npm install
cp .env.example .env

Edit .env based on your preferred mode:

Vision + Stark (recommended)

Screenshot-first mode using Stark (df-vision + Gemini) for element location. Requires a Gemini API key.

LLM_PROVIDER=gemini
LLM_API_KEY=your-gemini-api-key
LLM_MODEL=gemini-3.1-flash-lite-preview
AGENT_MODE=vision
VISION_LOCATE_PROVIDER=stark

Vision + Appium MCP

Screenshot-first mode using appium-mcp's server-side AI vision for element location. See appium-mcp AI Vision setup for details.

LLM_PROVIDER=gemini
LLM_API_KEY=your-gemini-api-key
LLM_MODEL=gemini-3.1-flash-lite-preview
AGENT_MODE=vision
VISION_LOCATE_PROVIDER=appium_mcp
AI_VISION_ENABLED=true
AI_VISION_API_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai
AI_VISION_API_KEY=your-vision-api-key
AI_VISION_MODEL=gemini-2.0-flash

DOM mode

Uses the XML page source to find elements by accessibility ID, XPath, and other locators. No vision model needed; works with any LLM provider.

LLM_PROVIDER=gemini            # or anthropic, openai, groq, ollama
LLM_API_KEY=your-api-key
AGENT_MODE=dom

Usage

# Interactive mode
appclaw

# Pass goal directly
appclaw "Open Settings"
appclaw "Search for cats on YouTube"
appclaw "Turn on WiFi"
appclaw "Send hello on WhatsApp to Mom"

# Or with npx (no global install)
npx appclaw "Open Settings"

When running from a local clone, use npm start instead:

npm start
npm start "Open Settings"

YAML flows (no LLM needed)

Run declarative steps from a YAML file:

appclaw --flow examples/flows/google-search.yaml
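A flow file is a list of declarative steps. The authoritative schema is whatever the files in examples/flows/ use; as an illustrative sketch only (the field names below are hypothetical, reusing the agent action names from the table further down), a flow might look like:

```yaml
# Hypothetical flow sketch - consult examples/flows/ for the real schema.
name: google-search
steps:
  - launch: com.android.chrome
  - tap: "Search or type web address"
  - type: "cats"
  - tap: "Search"
```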

Configuration

All configuration is via .env:

| Variable | Default | Description |
| --- | --- | --- |
| LLM_PROVIDER | gemini | LLM provider (currently only gemini is supported for vision) |
| LLM_API_KEY | | Gemini API key |
| LLM_MODEL | (auto) | Model override (e.g. gemini-2.0-flash) |
| AGENT_MODE | vision | dom (XML locators) or vision (screenshot-first) |
| VISION_LOCATE_PROVIDER | stark | Vision backend for locating elements |
| MAX_STEPS | 30 | Max steps per goal |
| STEP_DELAY | 500 | Milliseconds between steps |
| SHOW_TOKEN_USAGE | false | Print token usage and cost per step |
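As a concrete example (the values are illustrative, not recommendations), a .env that gives the agent a bigger step budget, slows the loop down, and prints per-step cost:

```
LLM_PROVIDER=gemini
LLM_API_KEY=your-gemini-api-key
AGENT_MODE=vision
MAX_STEPS=40
STEP_DELAY=1000
SHOW_TOKEN_USAGE=true
```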

How It Works

Each step, AppClaw:

  1. Perceives — reads the device screen (UI elements or screenshot)
  2. Reasons — sends the goal + screen state to an LLM, which decides the next action
  3. Acts — executes the action (tap, type, swipe, launch app, etc.)
  4. Repeats until the goal is complete or max steps reached
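The loop above can be sketched in a few lines. This is not AppClaw's actual code; the device and LLM are stubbed out and all names are illustrative, so only the perceive-reason-act control flow is shown:

```typescript
// Minimal sketch of the perceive-reason-act loop (names illustrative).
type Action = { kind: "tap" | "type" | "done"; target?: string };

interface Agent {
  perceive(): string;                            // screen state (UI tree or screenshot)
  reason(goal: string, screen: string): Action;  // LLM decides the next action
  act(action: Action): void;                     // execute on the device
}

function runGoal(agent: Agent, goal: string, maxSteps = 30): number {
  for (let step = 1; step <= maxSteps; step++) {
    const screen = agent.perceive();
    const action = agent.reason(goal, screen);
    if (action.kind === "done") return step;     // goal complete
    agent.act(action);
  }
  return maxSteps;                               // step budget exhausted
}

// Stub agent that declares the goal complete on its third step.
let calls = 0;
const stub: Agent = {
  perceive: () => `screen-${calls}`,
  reason: () => (++calls < 3 ? { kind: "tap", target: "Next" } : { kind: "done" }),
  act: () => {},
};
console.log(runGoal(stub, "Open Settings")); // 3
```

The maxSteps guard corresponds to the MAX_STEPS setting: if the LLM never emits done, the loop still terminates.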

Agent Actions

| Action | Description |
| --- | --- |
| tap | Tap an element |
| type | Type text into an input |
| scroll / swipe | Scroll or swipe gesture |
| launch | Open an app |
| back / home | Navigation buttons |
| long_press / double_tap | Touch gestures |
| find_and_tap | Scroll to find, then tap |
| ask_user | Pause for user input (OTP, CAPTCHA) |
| done | Goal complete |

Failure Recovery

| Mechanism | What it does |
| --- | --- |
| Stuck detection | Detects repeated screens/actions, injects recovery hints |
| Checkpointing | Saves known-good states for rollback |
| Human-in-the-loop | Pauses for OTP, CAPTCHA, or ambiguous choices |
| Action retry | Feeds failures back to the LLM for re-planning |
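Stuck detection can be as simple as remembering recent screen states and flagging a loop when the same state keeps coming back. A minimal sketch under that assumption (not AppClaw's actual implementation; class and parameter names are hypothetical):

```typescript
// Illustrative stuck detector: keeps a sliding window of recent screen
// states and reports "stuck" when one state repeats too often, which is
// the cue to inject a recovery hint into the next LLM prompt.
class StuckDetector {
  private recent: string[] = [];
  constructor(private window = 6, private threshold = 3) {}

  observe(screenState: string): boolean {
    this.recent.push(screenState);
    if (this.recent.length > this.window) this.recent.shift();
    const repeats = this.recent.filter((s) => s === screenState).length;
    return repeats >= this.threshold; // true => the agent looks stuck
  }
}

const detector = new StuckDetector();
console.log(detector.observe("home")); // false
console.log(detector.observe("home")); // false
console.log(detector.observe("home")); // true - same screen three times
```

In practice the screen state would be a hash of the UI tree or screenshot rather than a raw string, so cosmetic pixel noise does not defeat the comparison.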

License

Licensed under the Apache License, Version 2.0. See LICENSE for the full text.
