Scan Lifecycle

A scan is the central operation of Vega. This page traces the complete journey from the moment a user clicks "Start scan" to when findings appear in the dashboard. The exact path differs between local mode and AWS production mode.

Complete scan flow

flowchart TD
    A["User clicks 'Start scan' in dashboard"]
    B["POST /v1/repositories/:id/scans"]
    C["API writes scan row to DB\nstatus = queued"]

    A --> B --> C

    C --> D{Execution mode?}

    D -->|thread| E["API runs scan in a\nbackground thread"]
    D -->|external| F["Worker polling loop\nclaims the scan"]
    D -->|sqs| G["API sends message to SQS"]
    G --> F

    F --> H["Worker locks scan row in DB\nstatus = claimed"]
    H --> I{Worker execution mode?}
    I -->|local| E
    I -->|ecs| J["Worker calls ECS RunTask\nlaunches vega-v16-runner"]

    E --> K
    J --> K

    K["Runner loads source snapshot from S3\nor local directory"]
    K --> L["Runner calls ProjectService\n.run_claimed_scan_by_id()"]
    L --> M["Calls V16ServiceAdapter\n.scan_source()"]
    M --> N["v16 runs planning step\nCodex identifies components to audit"]
    N --> O["v16 runs audit per component\nCodex analyzes each one"]

    O --> P["v16 emits V16Events:\nscan_progress, finding_updated,\nscan_completed, etc."]
    P --> Q["Backend adapter maps events\nto Vega event records"]
    Q --> R["ProjectService upserts findings\nin Postgres or JSON"]
    Q --> S["Events appended to event store"]
    R --> T["Runner uploads artifacts to S3:\nv16-events.jsonl, runner-summary.json,\nv16-report.json, v16-debug-bundle.zip"]
    T --> U["Scan status → completed\n(or failed / cancelled)"]

    U --> V["Dashboard polls /v1/repositories/:id/scans\nand reads findings"]

Step-by-step explanation

1. Scan is created

The user starts a scan from the dashboard or through the API. The backend creates a scan record in the database with status queued. If the execution mode is sqs, it also sends a message to the SQS queue. The API returns immediately and does not wait for the scan to finish.
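The creation step can be sketched as follows. This is a minimal illustration, not Vega's actual code: InMemoryBackend, create_scan, and the message shape are hypothetical stand-ins for the real database table and SQS queue.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryBackend:
    """Stand-in for the scan table and the SQS queue (both hypothetical)."""
    scans: dict = field(default_factory=dict)
    queue: list = field(default_factory=list)


def create_scan(backend: InMemoryBackend, repository_id: str, execution_mode: str) -> dict:
    """Record a queued scan and return at once, without running it."""
    scan = {"id": str(uuid.uuid4()), "repository_id": repository_id, "status": "queued"}
    backend.scans[scan["id"]] = scan
    if execution_mode == "sqs":
        # Only sqs mode enqueues a message; the message shape is an assumption.
        backend.queue.append({"scan_id": scan["id"]})
    return scan


backend = InMemoryBackend()
queued_scan = create_scan(backend, "repo-1", "sqs")
```

The caller gets the queued record back right away; nothing in this function waits on scan progress.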

2. Worker claims the scan

A worker process (running either locally or as an ECS service) picks up queued scans. In sqs mode it receives scan messages from SQS; in external mode its own polling loop finds the scan. Either way, the worker claims the scan by locking its database row. This is crucial: the claim prevents two workers from picking up the same scan at the same time. The scan status then moves from queued to claimed.
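To illustrate why a claim admits only one winner, here is a small sketch that uses a conditional UPDATE in SQLite as a stand-in for the row lock. The table layout, the claimed_by column, and claim_scan are assumptions made for the example; the real implementation may lock rows differently (for instance with a SELECT ... FOR UPDATE in Postgres).

```python
import sqlite3


def claim_scan(conn: sqlite3.Connection, scan_id: str, worker_id: str) -> bool:
    """Try to move one scan from queued to claimed; True only for the winner."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE scans SET status = 'claimed', claimed_by = ? "
            "WHERE id = ? AND status = 'queued'",
            (worker_id, scan_id),
        )
    # rowcount is 1 if this worker changed the row, 0 if another got there first.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scans (id TEXT PRIMARY KEY, status TEXT, claimed_by TEXT)")
conn.execute("INSERT INTO scans VALUES ('scan-1', 'queued', NULL)")
conn.commit()

first = claim_scan(conn, "scan-1", "worker-a")   # succeeds
second = claim_scan(conn, "scan-1", "worker-b")  # row is no longer queued
```

Because the status check and the status change happen in one statement, a second worker that arrives late updates zero rows and knows to move on.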

3. Runner is launched

If VEGA_SCAN_WORKER_EXECUTION_MODE=ecs, the worker calls AWS ECS RunTask to start a vega-v16-runner container and passes the scan ID to it as an environment variable. If VEGA_SCAN_WORKER_EXECUTION_MODE=local, the scan runs in the worker process itself.
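A RunTask request that carries the scan ID might look roughly like the sketch below. The variable name VEGA_SCAN_ID, the cluster name, and the assumption that the container is named vega-v16-runner are all illustrative; launch type and network configuration are omitted for brevity.

```python
def build_run_task_request(scan_id: str, cluster: str, task_definition: str) -> dict:
    """Build keyword arguments for an ECS RunTask call for one scan."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": 1,
        "overrides": {
            "containerOverrides": [
                {
                    "name": "vega-v16-runner",  # container name assumed to match
                    "environment": [
                        # VEGA_SCAN_ID is a hypothetical variable name.
                        {"name": "VEGA_SCAN_ID", "value": scan_id},
                    ],
                }
            ]
        },
    }


request = build_run_task_request("scan-1", "vega-cluster", "vega-v16-runner")
# With boto3 this could be sent as: boto3.client("ecs").run_task(**request)
```

Building the request as plain data keeps it easy to inspect in tests without calling AWS.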

4. Source is loaded

The runner loads the source snapshot. In production, this means downloading a zip from S3. Locally, it reads from a directory under data/snapshots/.
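The two snapshot sources can be handled by one small function, sketched here under the assumption that the zip has already been downloaded to local disk. load_snapshot and its arguments are hypothetical names, and the S3 download itself is left out.

```python
import tempfile
import zipfile
from pathlib import Path


def load_snapshot(source: Path, workdir: Path) -> Path:
    """Return a directory containing the snapshot's source files."""
    if source.is_dir():
        # Local mode: the snapshot is already a directory on disk.
        return source
    if zipfile.is_zipfile(source):
        # Production-like: unpack a zip that was already fetched from S3.
        target = workdir / source.stem
        with zipfile.ZipFile(source) as archive:
            archive.extractall(target)
        return target
    raise ValueError(f"unsupported snapshot: {source}")


# Usage: build a tiny zip snapshot, then load it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    with zipfile.ZipFile(root / "snapshot.zip", "w") as archive:
        archive.writestr("app/main.py", "print('hi')\n")
    extracted = load_snapshot(root / "snapshot.zip", root / "work")
    extracted_files = sorted(
        p.relative_to(extracted).as_posix() for p in extracted.rglob("*") if p.is_file()
    )
    same_dir = load_snapshot(root, root / "work") == root
```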

5. v16 planning

The v16 scan engine first runs a planning step. It uses Codex (an AI CLI tool) to analyze the codebase structure and the threat profile, then produces a plan: which components of the code to audit and in what order.

6. Per-component auditing

For each component in the plan, v16 runs an audit step. Codex reads the source files for that component and generates findings based on the threat profile. All Codex calls go through the LLM proxy, which enforces per-scan usage limits.

7. Events and findings stream back

As v16 works, it emits events:

Event kind	Meaning
`scan_started`	v16 has begun
`scan_progress`	Progress update (e.g., "auditing component 3 of 8")
`scan_log`	Debug log line from v16 or Codex
`finding_updated`	A security issue was found or updated
`scan_completed`	v16 finished successfully
`scan_failed`	v16 encountered an unrecoverable error
`scan_cancelled`	The scan was cancelled

The backend adapter (app/projects/v16_adapter.py) maps these into Vega event records and calls back into ProjectService to upsert findings.
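The mapping from event kinds to scan state can be pictured with a short sketch. It is not the code in v16_adapter.py: apply_event and the event field names (kind, finding, id) are assumptions chosen for illustration, while the event kinds and statuses come from the table above.

```python
# Terminal v16 event kinds and the scan status each one implies.
TERMINAL_STATUS = {
    "scan_completed": "completed",
    "scan_failed": "failed",
    "scan_cancelled": "cancelled",
}


def apply_event(scan: dict, findings: dict, event: dict) -> None:
    """Fold one v16 event into the scan record and the findings store."""
    kind = event["kind"]
    scan.setdefault("events", []).append(event)  # every event is kept
    if kind == "finding_updated":
        # Upsert: the same finding id replaces the earlier version.
        finding = event["finding"]
        findings[finding["id"]] = finding
    elif kind in TERMINAL_STATUS:
        scan["status"] = TERMINAL_STATUS[kind]


scan, findings = {"status": "claimed"}, {}
for event in [
    {"kind": "scan_started"},
    {"kind": "finding_updated", "finding": {"id": "F-1", "severity": "low"}},
    {"kind": "finding_updated", "finding": {"id": "F-1", "severity": "high"}},
    {"kind": "scan_completed"},
]:
    apply_event(scan, findings, event)
```

Keying findings by id is what makes repeated finding_updated events an update rather than a duplicate.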

8. Artifacts are written

When the scan finishes, the runner uploads artifacts to S3:

v16-events.jsonl — all raw v16 events
runner-summary.json — scan metadata and timing
v16-report.json — structured findings report
v16-debug-bundle.zip — full debug bundle including Codex state (for diagnosing scan failures)

9. Dashboard updates

The frontend polls the scan status endpoint and the findings endpoint. Once the scan is completed, all findings are available for review.
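A client-side polling loop along these lines would match the behaviour described here. The function name, the interval, and the poll budget are assumptions; fetch_status stands in for whatever HTTP call reads the scan status.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_scan(
    fetch_status: Callable[[], str],
    interval_seconds: float = 2.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the scan reaches a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_seconds)
    raise TimeoutError("scan did not finish within the polling budget")


# Usage with a canned sequence of statuses and no real sleeping.
statuses = iter(["queued", "claimed", "completed"])
final_status = wait_for_scan(lambda: next(statuses), sleep=lambda _seconds: None)
```

Injecting fetch_status and sleep keeps the loop testable without a running API.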

Execution mode quick reference

Local (simplest):
  VEGA_SCAN_EXECUTION_MODE=thread
  → API process runs the scan directly

Local (production-like):
  VEGA_SCAN_EXECUTION_MODE=external
  VEGA_SCAN_WORKER_EXECUTION_MODE=local
  → Separate worker process, but no ECS

Production:
  VEGA_SCAN_EXECUTION_MODE=sqs
  VEGA_SCAN_WORKER_EXECUTION_MODE=ecs
  → SQS queue, ECS worker, isolated ECS runner per scan
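The combinations above can be summarised in code. This sketch only restates the quick reference; describe_execution is a hypothetical helper, and it deliberately raises on unset or unknown values rather than guessing at defaults.

```python
def describe_execution(env: dict) -> str:
    """Summarise where a scan runs for the given environment variables."""
    scan_mode = env.get("VEGA_SCAN_EXECUTION_MODE")
    worker_mode = env.get("VEGA_SCAN_WORKER_EXECUTION_MODE")
    if scan_mode == "thread":
        return "API process runs the scan in a background thread"
    if scan_mode in ("external", "sqs"):
        pickup = "SQS queue" if scan_mode == "sqs" else "worker polling loop"
        if worker_mode == "ecs":
            return f"{pickup}, then an ECS runner task per scan"
        if worker_mode == "local":
            return f"{pickup}, then the worker process runs the scan"
    raise ValueError(f"unsupported combination: {scan_mode!r} / {worker_mode!r}")


production = describe_execution(
    {"VEGA_SCAN_EXECUTION_MODE": "sqs", "VEGA_SCAN_WORKER_EXECUTION_MODE": "ecs"}
)
```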

Cancellation

A user can cancel a running scan. The cancellation flow:

DELETE /v1/repositories/:id/scans/:scanId is called
API sets a cancellation flag on the scan record
If an ECS runner task is running, the worker stops it via ECS StopTask
v16 and Codex respect the cancellation signal cooperatively
Partial findings and events up to the cancellation point are preserved
Scan status moves to cancelled
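Cooperative cancellation generally means the running code checks a flag at safe points instead of being killed mid-step. The sketch below shows that pattern with a threading.Event; run_audits and fake_audit are hypothetical, and how v16 and Codex actually receive the signal is not specified here.

```python
import threading
from typing import Callable


def run_audits(components: list, audit: Callable[[str], str], cancel: threading.Event) -> dict:
    """Audit components in order, stopping early if cancellation is requested."""
    findings = []
    for component in components:
        if cancel.is_set():
            # Keep what was found so far and report the cancelled status.
            return {"status": "cancelled", "findings": findings}
        findings.append(audit(component))
    return {"status": "completed", "findings": findings}


cancel = threading.Event()


def fake_audit(component: str) -> str:
    if component == "auth":
        cancel.set()  # simulate a cancel request arriving during this audit
    return f"finding in {component}"


result = run_audits(["api", "auth", "billing"], fake_audit, cancel)
```

The audit that was in flight when the flag was set still finishes, so its finding is preserved; only the remaining components are skipped.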

Debugging stale scans

A scan that stays in the running status for longer than VEGA_SCAN_RUNNING_STALE_SECONDS (default: 6 hours, i.e. 21600 seconds) is treated as stale. The worker's recovery loop detects it and marks it failed, recording an error event that explains why. This covers cases where the runner task crashed silently without writing a failure event.
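The staleness check itself is a simple time comparison, sketched here. The started_at field and the is_stale helper are assumptions; only the 6-hour default comes from the text above.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_STALE_SECONDS = 6 * 60 * 60  # the documented 6-hour default


def is_stale(scan: dict, now: datetime, stale_seconds: int = DEFAULT_STALE_SECONDS) -> bool:
    """True if a running scan has exceeded the stale threshold."""
    if scan["status"] != "running":
        return False
    return now - scan["started_at"] > timedelta(seconds=stale_seconds)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = {"status": "running", "started_at": now - timedelta(hours=1)}
stuck = {"status": "running", "started_at": now - timedelta(hours=7)}
```

A recovery loop could run this over all running scans on each pass and fail the ones for which it returns True.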