agent-device

Introduction: CLI to control iOS and Android devices for AI agents
More: Author   ReportBugs   
Tags:

CLI to control iOS and Android devices for AI agents influenced by Vercel’s agent-browser.

The project is in early development, considered experimental. Pull requests are welcome!

Current scope (v1)

  • Platforms: iOS (simulator + limited device support) and Android (emulator + device).
  • Core commands: open, back, home, app-switcher, press, long-press, focus, type, fill, scroll, scrollintoview, wait, alert, screenshot, close.
  • Inspection commands: snapshot (accessibility tree).
  • Device tooling: adb (Android), simctl/devicectl (iOS via Xcode).
  • Minimal dependencies; TypeScript executed directly on Node 22+ (no build step).

Install

npm install -g agent-device

Or use it without installing:

npx agent-device open SampleApp

Usage

agent-device <command> [args] [--json]

Examples:

agent-device open SampleApp
agent-device snapshot
agent-device snapshot -s @e7
agent-device click @e7
agent-device wait text "Camera"
agent-device alert wait 10000
agent-device back
agent-device type "hello"
agent-device screenshot --out ./screenshot.png
agent-device close SampleApp

Best practice: run snapshot immediately before interactions to avoid stale coordinates if the Simulator window moves or UI changes. When interacting with UI elements from a snapshot, prefer refs (e.g. click @e7) over raw coordinates. Refs are stable across runs and avoid coordinate drift.

Coordinates:

  • All coordinate-based commands (press, long-press, focus, fill) use device coordinates with origin at top-left.
  • X increases to the right, Y increases downward.

iOS snapshots:

  • Default backend is hybrid because it provides the best speed vs correctness trade-off: AX is fast but can miss UI details, while XCTest is slower but more complete. Hybrid uses the fast AX snapshot first, then fills empty containers (tab bars/toolbars/groups) with scoped XCTest snapshots.
  • ax is the fast AX-only backend and requires enabling Accessibility for the terminal app in System Settings.
  • xctest is the slower XCTest-only backend that avoids Accessibility permissions.
  • You can scope snapshots to a label or identifier with -s "<label>" or to a previous ref with -s @ref. In practice, if AX returns a Tab Bar group with no children, hybrid will run a scoped XCTest snapshot for Tab Bar and insert those nodes under the group.

Flags:

  • --platform ios|android
  • --device <name>
  • --udid <udid> (iOS)
  • --serial <serial> (Android)
  • --out <path> (screenshot)
  • --session <name>
  • --verbose for daemon and runner logs
  • --json for structured output
  • --backend ax|xctest|hybrid (snapshot only; defaults to hybrid on iOS)

Tracing:

  • trace start [path] to begin capturing AX/XCTest logs for the session.
  • trace stop [path] to stop capture and optionally move the trace log.

Sessions:

  • open starts a session. Without args boots/activates the target device/simulator without launching an app.
  • All interaction commands require an open session.
  • close stops the session and releases device resources. Pass an app to close it explicitly, or omit to just close the session.
  • Use --session <name> to manage multiple sessions.
  • Session logs are written to ~/.agent-device/sessions/<session>-<timestamp>.ad.

Snapshot defaults to the hybrid backend on iOS simulators. Use --backend ax for AX-only or --backend xctest for XCTest-only.

Find (semantic):

  • find <text> <action> [value] finds by any text (label/value/identifier) using a scoped snapshot.
  • find text|label|value|role|id <value> <action> [value] for specific locators.
  • Actions: click (default), fill, type, focus, get text, get attrs, wait [timeout], exists.

Settings helpers (simulators):

  • settings wifi on|off
  • settings airplane on|off
  • settings location on|off (iOS uses per‑app permission for the current session app)
    • Note: iOS wifi/airplane toggles status bar indicators, not actual network state. Airplane off clears status bar overrides.

App state:

  • appstate shows the foreground app/activity (Android). On iOS it uses the current session app when available, otherwise it falls back to a snapshot-based guess (AX first, XCTest if AX can’t identify).
  • apps --metadata returns app list with minimal metadata.

Debug

  • Start trace capture before a flaky sequence:
    • agent-device trace start
    • agent-device trace stop ./trace.log
  • The trace log includes AX snapshot stderr and XCTest runner logs for the session.
  • Built-in retries cover transient runner connection failures, AX snapshot hiccups, and Android UI dumps.
  • For snapshot issues, compare --backend ax vs --backend xctest and scope with -s "<label>".

App resolution

  • Bundle/package identifiers are accepted directly (e.g., com.apple.Preferences).
  • Human-readable names are resolved when possible (e.g., Settings).
  • Built-in aliases include Settings for both platforms.

iOS notes

  • Input commands (press, type, scroll, etc.) are supported only on simulators in v1 and use the XCTest runner.
  • alert and scrollintoview use the XCTest runner and are simulator-only in v1.
  • Real device support (including snapshots) is on the roadmap for iOS.

Testing

pnpm test

Build

pnpm build

Environment selectors:

  • ANDROID_DEVICE=Pixel_9_Pro_XL or ANDROID_SERIAL=emulator-5554
  • IOS_DEVICE="iPhone 17 Pro" or IOS_UDID=<udid>

Test screenshots are written to:

  • test/screenshots/android-settings.png
  • test/screenshots/ios-settings.png

Contributing

See CONTRIBUTING.md.

Made at Callstack

agent-device is an open source project and will always remain free to use. Callstack is a group of React and React Native geeks. Contact us at hello@callstack.com if you need any help with these technologies or just want to say hi.

Apps
About Me
GitHub: Trinea
Facebook: Dev Tools