# Appium

> How Momentic's managed step cache, AI primitives, and YAML authoring compare against Appium for mobile testing.

Momentic is a managed testing platform for iOS and Android. Tests are written in
YAML and run on managed remote emulators and simulators. A multi-modal step
cache stores locator metadata per step and auto-heals in place when the UI
changes. AI primitives cover action, assertion, visual diff, and typed
extraction. AI requests are routed with cross-provider failover. A dashboard
captures run videos, view hierarchies, heal events, and AI reasoning.

[Appium](https://appium.io) is an open-source mobile automation framework that
exposes the WebDriver / W3C protocol across iOS, Android, and other platforms
via swappable drivers (UiAutomator2, XCUITest, Espresso, Flutter, Mac, Windows).
Tests are written in TypeScript / Python / Java / Ruby / .NET against the Appium
client of choice. It's well-suited to teams that want OSS, multi-language
flexibility, physical-hardware support, and deep customization at the driver
level, plus the bandwidth to maintain a verbose, locator-heavy codebase.

## Speed and caching

|                     | Momentic                                                                                                                   | Appium                                                                                                            |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| What's cached       | Multi-modal locator data per step ([docs](/reliability/step-cache)).                                                       | Nothing. Every locator strategy re-queries the device.                                                            |
| Heal on miss        | Re-resolves and **updates the entry in place**. Heal event on the run.                                                     | Not supported. A miss is a test failure after the WebDriverWait timeout.                                          |
| Waiting             | Built-in: navigation, `load`, screenshots, DOM / view-hierarchy mutations, same-origin requests. 3s default, configurable. | `implicitlyWait` (per-driver) or explicit `WebDriverWait` + `ExpectedConditions` per query. Boilerplate per step. |
| Storage             | Managed, git-aware.                                                                                                        | N/A.                                                                                                              |
| Cost of a UI change | Auto-heal absorbs renamed IDs, localized strings, reordered hierarchies.                                                   | One client-code edit per broken selector. XPath edits often require full re-derivation.                           |

## How the multi-modal cache works

A cached step stores more than one way to find the target: where it sits on
screen, what it looks like, what text it contains, and the accessibility and
structural attributes around it. Which of those signals matters for a given step
is inferred from the natural-language description. "The red Cancel button below
the Order Summary header" leans on visual and positional signals; "the Sign in
button" leans on accessibility and text. When a step replays, the runner checks
the stored signals against the live UI and runs the action without invoking the
LLM when there's a match.
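
As a rough illustration of that replay check, the sketch below models a cache entry as a bundle of stored signals with per-step weights and scores a live element against it. The type names, the integer weights, and the threshold of 80 are assumptions made for this example; they are not Momentic's actual schema or matching algorithm.

```ts
// Hypothetical shapes for illustration only; not Momentic's schema.
type SignalKind = "position" | "appearance" | "text" | "accessibility";

interface StepCacheEntry {
  description: string;
  // Stored value per signal, e.g. text: "Sign in".
  signals: Record<SignalKind, string>;
  // How much each signal matters for this step, as integer percentages.
  weights: Record<SignalKind, number>;
}

type LiveElement = Record<SignalKind, string>;

// Sum the weights of the stored signals that still match the live element.
function matchScore(entry: StepCacheEntry, live: LiveElement): number {
  let score = 0;
  for (const kind of Object.keys(entry.weights) as SignalKind[]) {
    if (entry.signals[kind] === live[kind]) score += entry.weights[kind];
  }
  return score;
}

// Replay from cache only when enough weighted signals agree (assumed threshold).
function isCacheHit(entry: StepCacheEntry, live: LiveElement, threshold = 80): boolean {
  return matchScore(entry, live) >= threshold;
}

// "the Sign in button" leans on text and accessibility, so those weigh most.
const signInStep: StepCacheEntry = {
  description: "the Sign in button",
  signals: {
    position: "bottom-center",
    appearance: "blue-pill",
    text: "Sign in",
    accessibility: "sign_in_button",
  },
  weights: { position: 10, appearance: 10, text: 40, accessibility: 40 },
};
```

With these weights, a button that only moved on screen still counts as a hit, while a button whose accessibility identifier changed falls below the threshold and would need re-resolution.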

## What happens on a UI change

The following sequence makes the difference concrete. Start from a sign-in
screen whose Email field has `accessibilityIdentifier = "email_input"`, with a
passing baseline run already recorded so the cache is warm.

**Refactor:** the app team renames `email_input` to `email_field`. The XPath
position of the field shifts because a container was added above it.

**Appium replay:**

1. `driver.$("~email_input")` issues a `findElement` request against the device.
   The driver waits up to `implicitlyWait` (or the configured `WebDriverWait`
   timeout) for the element to appear.
2. The timeout elapses with no match. The client reports a `no such element`
   error; the exact exception class depends on the client library.
3. The test stops; the CI job fails. Someone edits the client code to use the
   new accessibility ID (or rewrites the XPath against the new hierarchy), opens
   a PR, gets it merged, and re-runs CI. If the XPath path was deep, the edit
   can cascade across multiple steps.

**Momentic replay:**

1. The cached locator for the `Email` step misses on the live device.
2. The locator agent re-resolves the original natural-language description
   `Email`.
3. The new locator binds, the step runs, the test passes.
4. The cache entry is updated in place. A heal event is attached to the run for
   review. Subsequent runs hit the cache normally.

Across a test suite this is the difference between a renamed-ID incident and a
no-op.
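
The Momentic replay steps above can be sketched as a small cache-with-heal function. This is an illustration under stated assumptions, not Momentic's implementation: `replayStep`, `HealEvent`, and the `locatorMatches` / `resolveFromDescription` callbacks are hypothetical stand-ins for the device query and the locator agent.

```ts
// Hypothetical types and callbacks; not Momentic's API.
interface HealEvent {
  step: string;
  oldLocator: string;
  newLocator: string;
}

interface ReplayDeps {
  // Returns true if the locator resolves on the live device.
  locatorMatches: (locator: string) => boolean;
  // Stand-in for the locator agent: resolves a description to a fresh locator.
  resolveFromDescription: (description: string) => string | undefined;
}

function replayStep(
  cache: Map<string, string>,
  description: string,
  deps: ReplayDeps,
  healEvents: HealEvent[],
): string {
  const cached = cache.get(description);
  // Cache hit: use the stored locator without re-resolving.
  if (cached !== undefined && deps.locatorMatches(cached)) return cached;

  // Cache miss: re-resolve from the original natural-language description.
  const fresh = deps.resolveFromDescription(description);
  if (fresh === undefined) {
    throw new Error(`Could not resolve step: ${description}`);
  }

  // Update the entry in place and record a heal event if an old entry existed.
  cache.set(description, fresh);
  if (cached !== undefined) {
    healEvents.push({ step: description, oldLocator: cached, newLocator: fresh });
  }
  return fresh;
}
```

The key property is that the description, not the locator, is the durable identifier for the step, so a stale locator is recoverable rather than fatal.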

<Accordion title="Technical details">
  **Smart waiting**

  Momentic's default smart wait is 3000ms and configurable per test. The runner
  waits on a combination of navigation, `load`, screenshots, DOM / view-hierarchy
  mutations, and same-origin requests until the UI is quiet or the timeout
  elapses.

  **Appium waiting, for contrast**

  * `implicitlyWait` is per-driver. Too low -> flakes; too high -> padded runs.
  * Explicit waits (`WebDriverWait` +
    `ExpectedConditions.visibilityOfElementLocated`, `presenceOf`,
    `elementToBeClickable`) are per query. Common to layer 3-5 explicit waits per
    logical step.
  * No notion of network quiescence; teams instrument their own request
    interceptors or poll the UI.
</Accordion>
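
To make the explicit-wait boilerplate concrete, here is a minimal sketch of the poll loop that such waits boil down to. `waitFor` is a hypothetical helper written for this example, not part of any Appium client; real clients ship their own equivalents with different names and options.

```ts
// Hand-rolled explicit wait: poll a condition until it holds or the timeout elapses.
async function waitFor(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 10_000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    // Check first so an already-true condition returns without sleeping.
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    // Sleep between polls to avoid hammering the device.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A single logical step such as "tap Sign in" typically wraps several of these calls (element present, element visible, element enabled), which is where the per-step boilerplate comes from.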

## Locators and AI primitives

|               | Momentic                                                                                                                    | Appium                                                                                                                                                        |
| ------------- | --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Locator model | Natural-language descriptions resolved by an AI agent against a11y tree + view hierarchy + screenshot. Cached, auto-healed. | `accessibility id`, `id`, `xpath`, `class name`, `-android uiautomator` (UiSelector), `-ios predicate string`, `-ios class chain`, `-image` (template match). |
| Visual cues   | Color, icon, relative size, position part of the locator.                                                                   | `-image` strategy is template matching only (no semantic context).                                                                                            |
| Agentic step  | `act` accepts a multi-step goal; the agent plans and executes.                                                              | Not supported.                                                                                                                                                |
| AI assert     | `assert` is a first-class step type, fails by default.                                                                      | Not built-in. Teams build with `getText` + manual checks.                                                                                                     |
| Visual diff   | `assertVisually`, agent-scored against a golden.                                                                            | Not built-in. Third-party plugins (e.g. Applitools) bolt on.                                                                                                  |
| AI provider   | Managed; cross-provider failover handled by the platform.                                                                   | None. Teams integrate LLMs themselves.                                                                                                                        |

<Accordion title="Technical details">
  **Momentic mobile step types**

  * Action: `act`, `tap`, `doubleTap`, `longPress`, `type`, `swipe`, `scroll`,
    `back`, `dismissKeyboard`, `launchApp`, `terminateApp`
  * Assert: `assert`, `assertVisually`, `checkElement<...>`
  * Extract: `extract` (typed via JSON schema)
  * Control flow: `if/then`, modules, parameter inputs

  **Appium locator trade-offs, for contrast**

  * `accessibility id` is the most stable strategy but only exists when developers
    explicitly set `contentDescription` (Android) / `accessibilityIdentifier`
    (iOS). Production apps frequently miss them on dynamic content.
  * `id` (resource-id on Android) breaks under refactors and A/B testing.
  * `xpath` is the catch-all but is slow on large hierarchies and breaks on any
    structural change.
  * `-image` does template matching; works for static images, fails on themed UIs.
  * No locator strategy carries semantic intent. A failing step has no description
    to recover from.
</Accordion>
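
One common workaround for these trade-offs is a hand-written fallback chain that tries the most stable strategy first. The sketch below builds selector strings in WebdriverIO's selector syntax (`~` for accessibility id, `android=` for UiSelector, `-ios predicate string:` for predicates). The `selectorCandidates` helper and the `TargetHints` shape are assumptions for illustration, and the sketch does not escape quotes inside the text.

```ts
type MobilePlatform = "Android" | "iOS";

// Hypothetical description of what is known about the target element.
interface TargetHints {
  accessibilityId?: string;
  text?: string;
}

// Build WebdriverIO selector strings, most stable strategy first.
function selectorCandidates(platform: MobilePlatform, hints: TargetHints): string[] {
  const candidates: string[] = [];
  if (hints.accessibilityId) {
    candidates.push(`~${hints.accessibilityId}`); // accessibility id
  }
  if (hints.text) {
    if (platform === "Android") {
      candidates.push(`android=new UiSelector().text("${hints.text}")`); // -android uiautomator
      candidates.push(`//*[@text="${hints.text}"]`); // xpath, last resort
    } else {
      candidates.push(`-ios predicate string:label == "${hints.text}"`); // -ios predicate string
      candidates.push(`//*[@label="${hints.text}"]`); // xpath, last resort
    }
  }
  return candidates;
}
```

Every candidate is still a literal match against an attribute, so the chain narrows the failure window without adding any semantic fallback.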

## Recovery, quarantine, and CI

|                  | Momentic                                                                                                                                   | Appium                                                                                                                    |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------- |
| Failure recovery | Post-run heal agent (`momentic ai heal`) rewrites failing tests and opens a PR or patch in CI; `momentic ai classify` triages the failure. | Not supported.                                                                                                            |
| Quarantine       | First-class: tests run, results report, exit code unaffected unless `--only-quarantined`.                                                  | Not supported.                                                                                                            |
| Sharding         | `--shard-index <i>` / `--shard-count <n>`, 1-indexed. Deterministic alphabetical partition.                                                | Owned by the host runner (pytest, Mocha, JUnit).                                                                          |
| Reporters        | `junit`, `allure`, `playwright-json`, `buildkite-json`.                                                                                    | Whatever the host runner emits (Allure, ExtentReports).                                                                   |
| Device fleet     | Remote Android 14/15 emulators and iOS 26 simulators with sub-1s provisioning, multi-region. Local AVDs / simulators supported.            | Bring-your-own: local devices, Appium server cluster, or device cloud (BrowserStack, Sauce, LambdaTest, AWS Device Farm). |
| Dashboard        | Run videos, traces, heal events, AI reasoning, screenshots, network.                                                                       | Third-party (Allure, vendor dashboards).                                                                                  |

<Accordion title="Technical details">
  **Sharding**: `--shard-index <i>` / `--shard-count <n>`. Deterministic,
  contiguous partition of the test suite.

  **Device provisioning**: each test gets its own device session, so parallel runs
  don't share device state. No per-test execution cap.

  **Appium grid, for contrast**: Appium servers run locally or in a Selenium Grid.
  Device farms layer their own provisioning on top. Common issues: stale UI
  hierarchies between sessions, dangling driver processes, capability drift across
  drivers and OS versions.
</Accordion>
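
For intuition, a deterministic, contiguous, 1-indexed partition can be sketched as below. This is an assumed algorithm that is consistent with the description above, not Momentic's actual implementation; `shardTests` is a hypothetical name, and the default string sort stands in for "alphabetical".

```ts
// Assumed algorithm: sort test names, then split them into `shardCount`
// contiguous chunks and return the chunk for the 1-indexed `shardIndex`.
function shardTests(testFiles: string[], shardIndex: number, shardCount: number): string[] {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  if (!Number.isInteger(shardIndex) || shardIndex < 1 || shardIndex > shardCount) {
    throw new RangeError("shardIndex must be between 1 and shardCount");
  }
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...testFiles].sort();
  // Integer boundaries guarantee every test lands in exactly one shard.
  const start = Math.floor(((shardIndex - 1) * sorted.length) / shardCount);
  const end = Math.floor((shardIndex * sorted.length) / shardCount);
  return sorted.slice(start, end);
}
```

Because the split depends only on the sorted names and the two numbers, every CI worker computes the same partition without coordinating with the others.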

## Authoring side-by-side

```ts
// Appium with WebdriverIO
import { remote } from "webdriverio";

const driver = await remote({
  hostname: "localhost",
  port: 4723,
  capabilities: {
    platformName: "Android",
    "appium:deviceName": "emulator-5554",
    "appium:automationName": "UiAutomator2",
    "appium:app": "/path/to/app.apk",
  },
});

const email = await driver.$("~email_input"); // accessibility id
await email.waitForDisplayed({ timeout: 10_000 });
await email.setValue("ada@example.com");
const password = await driver.$("~password_input");
await password.setValue("secret");
const signIn = await driver.$('//*[@text="Sign in"]'); // XPath
await signIn.click();
await driver.waitUntil(
  async () => (await driver.$('//*[contains(@text, "Welcome")]')).isDisplayed(),
  { timeout: 10_000 },
);
// "Chart visible and not cut off" requires custom logic.
await driver.deleteSession();
```

Agentic simplified format:

```yaml
fileType: momentic/test/v2
id: sign-in-and-verify
steps:
  - act: Sign in with ada@example.com / secret
  - assert: The dashboard chart is visible and not cut off
```

Explicit simplified format (same flow, step-by-step):

```yaml
fileType: momentic/test/v2
id: sign-in-and-verify
steps:
  - type:
      text: ada@example.com
      into: Email
  - type:
      text: secret
      into: Password
  - tap: Sign in
  - assert: The dashboard chart is visible and not cut off
```

## A more realistic test

The sign-in example above exercises only a small part of the simplified format.
A representative onboarding regression with module reuse, parameter inputs,
typed extraction, and a conditional looks like this:

```yaml onboarding.test.yaml
fileType: momentic/test/v2
id: onboarding-with-promo
steps:
  - launchApp
  - module:
      path: ../modules/sign-in.module.yaml
      inputs:
        EMAIL: env.QA_EMAIL
        PASSWORD: env.QA_PASSWORD
  - act: Skip the onboarding tour and land on Home
  - tap: Account
  - type:
      text: "{{ env.PROMO_CODE }}"
      into: Promo code field
  - tap: Apply
  - if:
      assert: A success banner saying the promo was applied is visible
      then:
        - extract:
            goal: The discounted monthly total shown on the plan card
            schema:
              type: object
              properties:
                amount:
                  type: number
              required: [amount]
  - if:
      assert: An invalid-promo error is visible
      then:
        - assert: The plan price is unchanged
  - assertVisually: The plan card is fully visible and not cut off
```

The matching module:

```yaml ../modules/sign-in.module.yaml
fileType: momentic/module/v2
id: sign-in
name: Sign in
parameters:
  - name: EMAIL
  - name: PASSWORD
steps:
  - type:
      text: "{{ env.EMAIL }}"
      into: Email
  - type:
      text: "{{ env.PASSWORD }}"
      into: Password
  - tap: Sign in
  - assert: The Home tab is visible
```

Appium has no first-class equivalent for any of these. Reuse means extracting
host-language helpers, extraction is whatever parsing the client code
implements, conditionals are plain `if` statements in the host language, and
visual assertions need a third-party plugin.
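
A minimal sketch of what the host-language equivalents look like in TypeScript follows. The `DriverLike` and `ElementLike` interfaces are stand-ins that cover only the calls used here (WebdriverIO exposes methods with these names), and the selectors `~sign_in_button`, `~promo_success_banner`, and `~plan_total` are hypothetical.

```ts
// Minimal structural stand-ins for the driver calls used below.
interface ElementLike {
  setValue(value: string): Promise<void>;
  click(): Promise<void>;
  isDisplayed(): Promise<boolean>;
  getText(): Promise<string>;
}

interface DriverLike {
  $(selector: string): Promise<ElementLike>;
}

// Reuse: a plain function plays the role of the sign-in module.
async function signIn(driver: DriverLike, email: string, password: string): Promise<void> {
  await (await driver.$("~email_input")).setValue(email);
  await (await driver.$("~password_input")).setValue(password);
  await (await driver.$("~sign_in_button")).click();
}

// Conditional plus extraction: branch with `if`, then parse the text by hand.
async function readDiscountedTotal(driver: DriverLike): Promise<number | undefined> {
  const banner = await driver.$("~promo_success_banner");
  if (!(await banner.isDisplayed())) return undefined;
  // The visible text might look like "$12.50 / month"; keep the first number.
  const text = await (await driver.$("~plan_total")).getText();
  const match = text.match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : undefined;
}
```

The typing and the schema live in the test code rather than in the step definition, so each team ends up with its own conventions for helpers and parsing.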

## When to pick which

**Appium is the right call if** you have an existing Appium test suite the team
wants to keep, you have a hard requirement for an OSS WebDriver-protocol layer,
you need multi-language clients, or you do deep customization at the driver
level (custom plugins, native command extensions).

**Momentic is the right call if** wall-clock run time matters at scale, selector
maintenance is a real recurring cost, you want AI assertions that fail the test
by default, you'd rather author in YAML than maintain a multi-language WebDriver
codebase, and you expect healing, recovery, quarantine, sub-second emulator
boots, and run videos built in.

For the build-it-yourself version of this decision, see
[Build vs. buy](/comparisons/build-vs-buy).