
Connecting Claude to a Real Phone via MCP

7 min read

The Setup: 90 Seconds

I plugged an Android phone into my MacBook. Opened Claude Desktop. Added one block to the MCP config:

{
  "mcpServers": {
    "drengr": {
      "command": "drengr",
      "args": ["mcp"]
    }
  }
}

That's the entire setup. No Appium. No Selenium grid. No environment variables pointing to Java homes and Android SDK paths. Just npm install -g drengr, plug in the phone, and tell Claude what to do.
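The `drengr setup` command (shown at the end of this post) writes that block for you, but the edit itself is trivial to script. Here's a sketch in Python — the config path is the macOS default for Claude Desktop, and `add_drengr` is my own helper name, not part of any tool:

```python
import json
from pathlib import Path

# macOS default location of Claude Desktop's MCP config; other OSes differ.
CONFIG = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"

def add_drengr(config_path: Path) -> dict:
    """Merge the drengr server entry into an existing (or empty) config."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["drengr"] = {
        "command": "drengr",
        "args": ["mcp"],
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

The merge is additive: any MCP servers already in the file are left alone.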

"Open YouTube and Find a Video About MCP Servers"

I typed that into Claude. Here's what happened over the next 40 seconds:

  1. Claude called drengr_look — got a text description of the home screen
  2. It saw YouTube in the app list and called drengr_do to launch it
  3. YouTube opened. Claude called drengr_look again — got the YouTube home feed as a list of labeled elements
  4. It tapped the search bar, typed "MCP servers," and hit search
  5. Results appeared. Claude read the titles and tapped the most relevant video
  6. The video started playing. Claude confirmed: "Found and playing 'MCP Server Explained' by IBM Technology"

Six actions. No scripts. No selectors. Claude read the screen, made decisions, and executed actions — exactly like a human would, except it took 40 seconds instead of 2 minutes.
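The whole session is one loop repeated: look, decide, act. A minimal sketch of that loop — `call_tool` stands in for the MCP client transport and `decide` for the model, so neither is Drengr's actual API:

```python
def run_task(goal: str, call_tool, decide, max_steps: int = 20) -> str:
    """Generic observe-act loop: the model reads the screen description,
    picks the next action, and the device tool executes it."""
    for _ in range(max_steps):
        screen = call_tool("drengr_look", {})   # text description of the UI
        action = decide(goal, screen)           # the LLM chooses the next step
        if action["type"] == "done":
            return action["summary"]
        call_tool("drengr_do", action)          # tap / type / swipe
    return "gave up: step limit reached"
```

Everything interesting happens inside `decide` — that's the model reasoning over the screen text. The loop itself is deliberately dumb.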

The Moment It Got Interesting

I then asked: "Now go to Shorts and swipe through a few."

Claude navigated to the Shorts tab, swiped up three times, read the titles of each short, and told me what it saw. It handled the vertical scroll, the full-screen video player, the overlay buttons — all without any special configuration.

This is the kind of interaction that breaks traditional test frameworks. Shorts uses a custom renderer, the UI tree is minimal, the scroll behavior is non-standard. A selector-based test would need a custom handler for every quirk. Claude just... used the app.

What's Actually Happening Under the Hood

Claude → MCP tool calls → Drengr → device commands → Phone

Drengr handles the messy parts:

  • Capturing the screen and parsing the UI tree into a format the AI can read
  • Translating "tap element 3" into the right platform command
  • Reporting back what changed after every action (the situation report)
  • Detecting if the app crashed or the UI got stuck

Claude handles the smart parts:

  • Looking at the screen description and deciding what to do
  • Adapting when something unexpected happens
  • Knowing when the task is complete

This separation is deliberate. The AI is the brain, Drengr is the hands. When better AI models come out, Drengr doesn't need to change — the hands stay the same, the brain gets smarter.

The Things That Surprised Me

It recovers from mistakes. At one point Claude tapped the wrong video. It noticed the title didn't match what it expected, pressed back, and picked the right one. No retry logic, no error handling code — the AI just adapted.

It works across apps. I asked Claude to "check my notifications" after the YouTube test. It pressed home, pulled down the notification shade, read the notifications, and summarized them. No app-specific setup needed.

Text mode is almost always enough. Out of ~30 actions across the session, Claude only needed the annotated screenshot twice — both times on screens with custom-rendered content. The rest worked with the ~300-token text description. That's dramatically cheaper than sending images every step.
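You can put a rough number on that gap using the estimate from Anthropic's vision docs: images are scaled so the long edge is at most 1568 px, then cost about one token per 750 square pixels. The screen resolution below is a typical phone, not a measured value:

```python
def image_tokens(width: int, height: int) -> int:
    """Rough Claude image-token estimate: scale the long edge down to
    1568 px, then charge about (width * height) / 750 tokens."""
    long_edge = max(width, height)
    if long_edge > 1568:
        scale = 1568 / long_edge
        width, height = round(width * scale), round(height * scale)
    return round(width * height / 750)

screenshot = image_tokens(1080, 2400)  # typical phone screen: ~1,500 tokens
text_desc = 300                        # approximate size of the text description
```

Roughly a 5× difference per step — and it compounds over a 30-action session.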

What It Can't Do (Yet)

I'm not going to pretend this replaces manual testing today. Some limits are real:

  • Speed — Each step takes 2-3 seconds (LLM round-trip). A human tester can tap faster. But the human can't run 50 test flows in parallel on a device farm.
  • Visual verification — Claude can tell if an element exists, but not if it "looks right." Color, alignment, spacing — these need human eyes or a visual regression tool.
  • Complex gestures — Standard taps, swipes, long presses, and pinch zooms work. But game-specific multi-touch patterns aren't there yet.
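The speed tradeoff is easier to see with a back-of-envelope sketch. Every number here is an assumption for illustration — not a benchmark:

```python
# Assumed numbers: ~2.5 s per agent step (LLM round-trip), ~1 s per
# human tap, a 20-step flow, 50 flows to run, a 50-device farm.
STEPS, AGENT_STEP_S, HUMAN_STEP_S = 20, 2.5, 1.0
FLOWS, DEVICES = 50, 50

agent_flow_s = STEPS * AGENT_STEP_S   # 50 s per flow
human_flow_s = STEPS * HUMAN_STEP_S   # 20 s per flow

# The human is faster per flow but runs them one at a time;
# the agents run all 50 flows at once across the farm.
human_total_s = FLOWS * human_flow_s              # 1000 s
agent_total_s = agent_flow_s * (FLOWS / DEVICES)  # 50 s
```

Per flow the human wins; per test suite the farm wins by an order of magnitude.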

The sweet spot today is regression testing: "does the checkout flow still work after this deploy?" That's the 80% of QA time that's spent running the same flows every sprint. Let the AI handle that, and let humans focus on exploratory testing.

Try It Yourself

npm install -g drengr
drengr doctor          # check your setup
drengr setup --client claude-desktop  # generate MCP config

Connect a device, open your AI client, and tell it what to test. The first time an AI agent navigates your app without a single line of test code, you'll understand why I built this.