# Playwright MCP

> How Momentic's preview-then-commit MCP loop and step cache compare against Microsoft's interact-then-generate Playwright MCP server.

Momentic is a managed testing platform for the web with its own MCP server. A
coding agent (Cursor, Claude Code, Codex, or any other MCP client) previews each
candidate step against the live page, gets a screenshot back, and commits the
step only on success. Saved tests are YAML, executed on a managed runner with a
multi-modal step cache and auto-heal so that generated tests replay
deterministically.

[Playwright MCP](https://github.com/microsoft/playwright-mcp) is Microsoft's
open-source MCP server. It lets a coding agent (Cursor, Claude Code, VS Code
Copilot) drive a browser, capture snapshots, and synthesize Playwright tests.
It's well suited to teams already on Playwright who want a coding agent to
assist with authoring and end up with standard Playwright code.

## Speed and caching

|                        | Momentic                                                                                                              | Playwright MCP                                                                                               |
| ---------------------- | --------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
| What's cached          | Multi-modal locator data per step ([docs](/reliability/step-cache)).                                                  | Nothing. Playwright MCP is an authoring tool; generated Playwright tests don't cache either.                 |
| Waiting                | Built-in: navigation, `load`, screenshots, DOM mutations, same-origin requests. 3s default, configurable.             | Generated tests use hard-coded `waitForTimeout`; manual `waitForResponse` per request.                       |
| Agent context per step | Screenshot + short status. Locator resolution / assertion evaluation happens server-side, outside the agent's prompt. | Full browser snapshot per `browser_*` tool call (DOM + a11y tree + console). Accumulates across the session. |
| Runtime hit cost       | Milliseconds, no LLM call.                                                                                            | N/A. Speed is whatever Playwright config + hard-coded waits allow.                                           |
| Heal on miss           | Re-resolves and **updates the entry in place** mid-run.                                                               | N/A. A broken selector is a test failure.                                                                    |
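
To make the "Waiting" row concrete, here is a minimal simulation of a fixed sleep versus a condition-based wait with a timeout. It is a sketch under stated assumptions, not how either tool implements waiting: `fixedSleep`, `conditionWait`, and the 100 ms polling interval are hypothetical, time is simulated rather than measured, and the 3000 ms timeout simply mirrors the default quoted in the table.

```typescript
interface WaitOutcome {
  ready: boolean; // was the page ready when the wait returned?
  elapsedMs: number; // simulated time spent waiting
}

// Fixed sleep: always costs `sleepMs`, and only succeeds if the page
// happened to become ready within that window.
function fixedSleep(readyAtMs: number, sleepMs: number): WaitOutcome {
  return { ready: readyAtMs <= sleepMs, elapsedMs: sleepMs };
}

// Condition-based wait: poll every `intervalMs` and return as soon as the
// page is ready, giving up once `timeoutMs` has elapsed.
function conditionWait(
  readyAtMs: number,
  timeoutMs: number,
  intervalMs: number
): WaitOutcome {
  for (let t = 0; t <= timeoutMs; t += intervalMs) {
    if (t >= readyAtMs) {
      return { ready: true, elapsedMs: t };
    }
  }
  return { ready: false, elapsedMs: timeoutMs };
}

// Hypothetical scenarios: a fast page (ready at 300 ms) and a slow one (2500 ms).
const fastFixed = fixedSleep(300, 2000);
const fastPolled = conditionWait(300, 3000, 100);
const slowFixed = fixedSleep(2500, 2000);
const slowPolled = conditionWait(2500, 3000, 100);
```

In this model the fixed sleep overpays on the fast page and returns too early on the slow one, while the condition-based wait tracks the moment the page is actually ready in both cases.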

## How the multi-modal cache works

A cached step stores more than one way to find the target: where it sits on
screen, what it looks like, what text it contains, and the structural and
accessibility attributes around it. Which of those signals matters for a given
step is inferred from the natural-language description. "The red Cancel button
below the Order Summary header" leans on visual and positional signals; "the
Submit button in the form" leans on structure and role. When a step replays, the
runner checks the stored signals against the live page and runs the action
without invoking the LLM when there's a match.
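
As a rough illustration of the idea above, the following TypeScript sketch models a cached target with several independent signals and a weighting that depends on the step description. Everything in it is hypothetical: the `Signal` names, the keyword heuristic in `weightsFor`, the weights, and the `0.8` threshold are assumptions made for the example, not Momentic's cache format or matching logic.

```typescript
// Hypothetical signal categories, loosely following the prose above.
type Signal = "position" | "appearance" | "text" | "structure";

interface Candidate {
  // Similarity of a live element to the cached target, per signal, in [0, 1].
  similarity: Record<Signal, number>;
}

// Assumed heuristic: visual or positional words in the description shift
// weight toward position and appearance; otherwise structure and text dominate.
function weightsFor(description: string): Record<Signal, number> {
  const visualWords = ["red", "blue", "below", "above", "left", "right"];
  const words = description.toLowerCase().split(/\W+/);
  const isVisual = visualWords.some((w) => words.indexOf(w) !== -1);
  return isVisual
    ? { position: 0.35, appearance: 0.35, text: 0.2, structure: 0.1 }
    : { position: 0.1, appearance: 0.1, text: 0.3, structure: 0.5 };
}

// Weighted sum of the per-signal similarities.
function matchScore(description: string, candidate: Candidate): number {
  const weights = weightsFor(description);
  const signals: Signal[] = ["position", "appearance", "text", "structure"];
  return signals.reduce(
    (sum, s) => sum + weights[s] * candidate.similarity[s],
    0
  );
}

const MATCH_THRESHOLD = 0.8; // assumed cutoff for treating a replay as a cache hit

function isCacheHit(description: string, candidate: Candidate): boolean {
  return matchScore(description, candidate) >= MATCH_THRESHOLD;
}

// An element that looks the same and sits in the same place, but whose
// surrounding markup was refactored.
const refactored: Candidate = {
  similarity: { position: 1, appearance: 1, text: 1, structure: 0.2 },
};
```

Under these assumed weights, the refactored element still counts as a hit for "The red Cancel button below the Order Summary header" but falls below the threshold for "the Submit button in the form", which is the kind of description-dependent behavior the paragraph describes.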

## What happens on replay

The authoring loop only matters if the generated artifact survives the next day.
Take the generated Playwright spec from the side-by-side section below and
replay it the day after authoring, against the same app with one change: the
team reworded the welcome banner
(`"Welcome, Ada"` -> `"Hi Ada, welcome back"`).

**Playwright MCP, replay:**

1. `page.locator('input[type="email"]').fill(...)` and the subsequent
   interactions resolve normally.
2. `page.waitForTimeout(2000)` blocks for 2s regardless of whether the page is
   ready.
3. `expect(page.getByText("Welcome, Ada")).toBeVisible()` fails. The text was
   guessed from the snapshot the agent saw during authoring; it no longer
   matches.
4. The CI job fails. The maintainer either re-runs the MCP authoring loop from
   scratch or hand-edits the spec to use a different selector. Either way it's a
   code review.

**Momentic, replay:**

1. `type` / `click` steps hit the cache and run in milliseconds.
2. `assert: The dashboard chart is visible and not cut off` is evaluated by the
   assertion agent against the current page state. The assertion describes the
   intended outcome rather than a literal string, so the rephrased welcome
   banner doesn't trip it.
3. The test passes. No code review needed.

Playwright MCP materializes locators and string literals at authoring time, so a
change to either is a test failure. Momentic resolves user-intent descriptions
at runtime, caches them for speed, and re-resolves them when the UI changes, so
the same change heals instead of failing.
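
The resolve, cache, and re-resolve behavior summarized above can be sketched as a small replay function. This is a simplified model rather than Momentic's implementation: `StepCache`, `replayStep`, and the `resolve` callback (standing in for the expensive, LLM-backed resolver) are hypothetical names, and a page is reduced to a list of element IDs.

```typescript
// Simplified model: a "page" is just the list of element ids currently present.
type Page = string[];

// Maps a step description to the element id it last resolved to.
type StepCache = Record<string, string>;

interface ReplayResult {
  elementId: string;
  source: "cache" | "healed";
}

function replayStep(
  description: string,
  page: Page,
  cache: StepCache,
  resolve: (description: string, page: Page) => string
): ReplayResult {
  const cached = cache[description];
  if (cached !== undefined && page.indexOf(cached) !== -1) {
    // Cache hit: the stored target still exists, so the resolver is skipped.
    return { elementId: cached, source: "cache" };
  }
  // Cache miss: re-resolve against the live page and update the entry in place.
  const fresh = resolve(description, page);
  cache[description] = fresh;
  return { elementId: fresh, source: "healed" };
}

// Stand-in resolver that counts how often the "slow path" runs.
let resolverCalls = 0;
const countingResolver = (_description: string, page: Page): string => {
  resolverCalls += 1;
  return page[0];
};

const cache: StepCache = { "Sign in button": "btn-old" };
const first = replayStep("Sign in button", ["btn-old"], cache, countingResolver);
const second = replayStep("Sign in button", ["btn-new"], cache, countingResolver);
const third = replayStep("Sign in button", ["btn-new"], cache, countingResolver);
```

In this model only the second replay, where the UI changed, reaches the resolver; the third replay hits the updated cache entry again.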

<Accordion title="Why context grows for Playwright MCP">
  Each MCP tool call (`browser_click`, `browser_snapshot`, `browser_navigate`,
  ...) returns a structured snapshot of the page: rendered DOM, a11y tree, and
  console messages. The agent's prompt history accumulates every snapshot from
  every tool call in the session.

  Momentic's MCP server returns a compressed screenshot plus a short status from
  each preview / run call. Snapshot expansion happens server-side during locator
  resolution; the full DOM is only returned when the agent explicitly asks for the
  session state. Locator resolution runs against the cache first; cache hits
  return without invoking the LLM at all.
</Accordion>
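
The growth described in the accordion above comes down to simple arithmetic. The sketch below assumes that every tool result stays in the prompt history and that the full history is re-sent to the model on each turn; `historyTokens` and `cumulativeTokensRead` are hypothetical helpers, and any per-result size passed to them is a placeholder rather than a measurement of either tool.

```typescript
// Tokens held in the prompt history after `calls` tool calls, assuming each
// tool result of `tokensPerResult` tokens is kept for the rest of the session.
function historyTokens(tokensPerResult: number, calls: number): number {
  return tokensPerResult * calls;
}

// Tokens the model reads over the whole session, assuming the full history is
// re-sent on every turn: s + 2s + ... + n*s = s * n * (n + 1) / 2.
function cumulativeTokensRead(tokensPerResult: number, calls: number): number {
  return (tokensPerResult * calls * (calls + 1)) / 2;
}
```

Because the cumulative cost grows with the square of the number of calls, the size of each tool result matters more as sessions get longer.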

## Authoring loop

|                       | Momentic                                                                        | Playwright MCP                                                                         |
| --------------------- | ------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- |
| Loop                  | Preview each step against the live page -> commit on success.                   | Interact through the browser from memory -> generate Playwright code at the end.       |
| Generated artifact    | `act` / `assert` / `extract` step targeting user intent.                        | `locator` / `getByRole` / `expect` / `waitForTimeout` materialized at generation time. |
| Editing a test        | Splice individual steps; browser session persists across edits.                 | Full browser and session reset; agent replays from memory.                             |
| Hidden / transient UI | Interacts with `aria-disabled`, 0-opacity, 0-bbox elements when user-reachable. | Often fails on Chakra-style hidden / transient elements.                               |
| Supported clients     | Cursor, Claude Code, Codex, any MCP client.                                     | Cursor, Claude Code, VS Code Copilot.                                                  |

<Accordion title="Agent rule for spec-driven development">
  Add a spec-driven section to `AGENTS.md` so the coding agent keeps Momentic
  tests in sync with feature work. See
  [the integration docs](/integrations/mcp-server#spec-driven-development).

  ```md theme={null}
  ## Momentic spec-driven development

  - Before starting any UI implementation, sketch the desired user flows as
    Momentic tests. Prefer `act` steps (AI action V3).
  - After any UI change, update the relevant Momentic tests so they describe the
    new behavior.
  ```
</Accordion>

## Generated artifact side-by-side

Playwright MCP (after running the flow once from memory):

```ts theme={null}
import { expect, test } from "@playwright/test";

test("sign in and verify", async ({ page }) => {
  await page.goto("https://app.example.com");
  await page.locator('input[type="email"]').fill("ada@example.com");
  await page.locator('input[type="password"]').fill("secret");
  await page.getByRole("button", { name: "Sign in" }).click();
  await page.waitForTimeout(2000); // hard-coded
  await expect(page.getByText("Welcome, Ada")).toBeVisible(); // text guessed from snapshot
});
```

The text was guessed from a stale snapshot; the `waitForTimeout(2000)` call was
inserted because the agent saw a transient loading state. Replay often fails on
one or both.

Agentic simplified format (each step previewed live before commit):

```yaml theme={null}
fileType: momentic/test/v2
id: sign-in-and-verify
url: https://app.example.com
steps:
  - act: Sign in with ada@example.com / secret
  - assert: The dashboard chart is visible and not cut off
```

Explicit simplified format (same flow, step-by-step):

```yaml theme={null}
fileType: momentic/test/v2
id: sign-in-and-verify
url: https://app.example.com
steps:
  - type:
      text: ada@example.com
      into: Email
  - type:
      text: secret
      into: Password
  - click: Sign in
  - assert: The dashboard chart is visible and not cut off
```

No hard-coded waits, no guessed text, no brittle selectors materialized at
generation time.

## A more realistic test

The hello-world above exercises only a small part of the simplified format. A
representative checkout regression with module reuse, parameter inputs, typed
extraction, and conditionals looks like this:

```yaml checkout.test.yaml theme={null}
fileType: momentic/test/v2
id: checkout-with-promo
url: https://shop.example.com
steps:
  - module:
      path: ../modules/sign-in.module.yaml
      inputs:
        EMAIL: env.QA_EMAIL
        PASSWORD: env.QA_PASSWORD
  - act: Add the Tetris Eye Sweatshirt (size M) to the cart
  - navigate: https://shop.example.com/checkout
  - type:
      text: "{{ env.PROMO_CODE }}"
      into: Promo code field
  - click: Apply
  - if:
      assert: A success banner saying the promo was applied is visible
      then:
        - extract:
            goal: The discounted subtotal in the order summary
            schema:
              type: object
              properties:
                amount:
                  type: number
              required: [amount]
  - if:
      assert: An invalid-promo error is visible
      then:
        - assert: The subtotal is unchanged
  - assertVisually: The order summary section is fully visible and not cut off
```

The matching module:

```yaml ../modules/sign-in.module.yaml theme={null}
fileType: momentic/module/v2
id: sign-in
name: Sign in
parameters:
  - name: EMAIL
  - name: PASSWORD
steps:
  - type:
      text: "{{ env.EMAIL }}"
      into: Email
  - type:
      text: "{{ env.PASSWORD }}"
      into: Password
  - click: Sign in
  - assert: The dashboard chart is visible and not cut off
```

## When to pick which

**Playwright MCP is the right call if** you have an existing Playwright codebase
you want to keep, your test suite is small and stable, you have a hard
requirement for OSS with no SaaS, and your agent surface is limited to Cursor,
VS Code, or Claude Code.

**Momentic is the right call if** coding agents are part of your authoring flow
at scale, you need generated tests to replay deterministically without
re-prompting, your product changes often enough that hard-coded waits and text
assertions break weekly, and you expect AI-native primitives, auto-heal,
recovery, and a managed dashboard built in.

For the build-it-yourself version of this decision, see
[Build vs. buy](/comparisons/build-vs-buy).