Skip to content

How It Works

teststop has one job: trigger AI to test like an adversarial user. This page explains every step of how it does that.


The Full Pipeline

flowchart TD
    A[teststop run] --> B[reader.ScanProject]
    B --> C[memory.Load]
    C --> D[mandate.Compose]
    D --> E{sandbox.Detect}
    E -->|container running| F[AI CLI in VM]
    E -->|no container| G[AI CLI direct]
    F --> H[GenerateScenarios]
    G --> H
    H --> I[memory.Update per area]
    I --> J[memory.RetireEligible]
    J --> K[memory.Save]
    K --> L[reporter.Output]
    L --> M[os.Exit code]

Step 1 — Scan (internal/reader/)

teststop reads your project without executing any code.

What it detects

The scanner (scanner.go) walks the project tree, skipping: .git, node_modules, vendor, .teststop, dist, build, etc.

The detector (detector.go) identifies:

  • Language — most files wins; Go beats Python beats TypeScript in ties
  • System type — checked in priority order: mobile_appdata_pipelinecliweb_appapilibraryunknown

The analyzer (analyzer.go) extracts:

  • Routes & flows — HTTP handlers, Cobra Use: strings, @app.route, Express app.METHOD
  • Dependenciesgo.mod, package.json, requirements.txt, Gemfile, Cargo.toml
  • Entry pointsmain.go, index.*, app.*, server.*

All extracted data populates a ProjectContext struct that feeds the mandate.


Step 2 — Load Memory (internal/memory/)

teststop reads .teststop/memory.json from the project directory.

If the file doesn't exist (first run), it starts with empty memory — all areas at 0% confidence.

The memory tells the mandate composer which areas to focus on:

  • Volatile areas (< 75% confidence) → test aggressively
  • Stable areas (≥ 95% confidence) → reduce coverage
  • Retired areas → skip entirely

Step 3 — Compose Mandate (internal/mandate/)

The mandate is the adversarial instruction that transforms a general-purpose AI into a user who tries to break your system.

The composer takes the base mandate (mandate/base.md) and replaces nine tokens:

Token Replaced With
[SYSTEM_NAME] Detected project name
[PROJECT_NAME] Project directory name
[DETECTED_LANGUAGE] Primary language
[DETECTED_TYPE] System type (api, cli, web_app, …)
[DETECTED_ENTRY_POINTS] Entry point files
[DETECTED_FLOWS] Extracted routes and flows
[MEMORY_STABLE_AREAS] Areas with confidence ≥ 95%
[MEMORY_VOLATILE_AREAS] Areas with confidence < 75%
[N] Scenario count (3–9 based on depth × system type)

Scenario count table

Depth light normal aggressive
cli / library 3 5 7
api / web_app 4 6 9
data_pipeline / mobile 3 5 8

See the mandate deep-dive for how the adversarial patterns work.


Step 4 — Sandbox Detection (internal/sandbox/)

teststop checks whether Apple Container is available:

container system status  # must return "running"
TESTSTOP_SANDBOX Behavior
auto (default) Use container if running; fall back to direct
required Error if container not available
none Always run AI CLI directly (CI, Linux, Docker)

In container mode, credentials are mounted read-only:

~/.claude       → /root/.claude:ro
~/.config/gh    → /root/.config/gh:ro

The AI cannot access anything else on your host.


Step 5 — Generate Scenarios (internal/ai/)

teststop shells out to the AI CLI with the composed mandate:

# Claude
claude -p "<mandate text>"

# Copilot
copilot -p "<mandate text>" -s --no-ask-user

The AI returns a JSON array of Scenario objects. teststop strips any markdown fences and parses the JSON.

The AI has a 5-minute timeout per run.


Step 6 — Update Memory

For each scenario with a confidence_area field, teststop updates that area's confidence score.

In v0.1, all generated scenarios count as passes (the execution engine arrives in v0.2). The confidence update formula is:

new_confidence = old + 0.19 × (1.0 - old)

This is an exponential approach to 1.0 — requiring ~15 passes to reach the retirement threshold of 0.95.


Step 7 — Retire Eligible Areas

Areas meeting both conditions are retired:

  • Confidence ≥ 0.95 (95%)
  • Test count ≥ 15

Retired areas are written to .teststop/retired.json and removed from active testing. They stay in memory.json as a historical record, marked "retired": true.


Step 8 — Report & Exit

teststop writes output in the requested format and exits with a structured exit code:

Exit Code Meaning
0 Confidence threshold met — safe to proceed
1 Below threshold — review required
2 Critical failures — do NOT deploy
3 teststop internal error

A markdown report is always saved to .teststop/reports/YYYY-MM-DD-HH-MM-SS.md regardless of --output.


Design Principles

No code execution

teststop never runs your code. All scanning is static. This is intentional — it keeps teststop universal and safe to run in any environment.

No new configuration

teststop run must work with zero setup. Adding project-specific config defeats the purpose.

Self-reducing

The test surface shrinks as confidence builds. teststop is designed to be needed less over time, not more.

Agent-native

Every output (JSON, exit codes, structured fields) is designed to be consumed by AI coding agents, not just humans.