Service Roles

In production, Vega runs as five distinct service roles. Each role is a separate Docker container with its own entry point, permissions, and scaling behavior. This separation means a failing scan doesn't crash the API, and a credential leak in one role doesn't expose others.

vega-api

Entry point: uvicorn app.main:app
Dockerfile: docker/api/Dockerfile

The API is the only role that handles user-facing HTTP traffic. It owns:

Authentication (login, token refresh, Cognito JWT validation)
Project and repository management
Scan creation and cancellation
Reading findings and events
Health and operations endpoints

The API never runs the scan itself in production. When a scan is created with VEGA_SCAN_EXECUTION_MODE=sqs, the API writes the scan record to Postgres and sends a message to SQS. It then returns immediately. The actual scanning happens in the worker and runner.
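A minimal sketch of that request path follows. It assumes hypothetical ScanStore and ScanQueue interfaces, a "queued" status value, and a message body that carries only the scan ID; none of these names come from Vega's code. In production the queue's send would wrap boto3's sqs.send_message(QueueUrl=..., MessageBody=...). Here, in-memory stand-ins let the sketch run without Postgres or SQS.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import List, Protocol


class ScanStore(Protocol):  # hypothetical interface over the Postgres scans table
    def insert_scan(self, scan_id: str, project_id: str, status: str) -> None: ...


class ScanQueue(Protocol):  # hypothetical interface over the SQS queue
    def send(self, body: str) -> None: ...


def create_scan(store: ScanStore, queue: ScanQueue, project_id: str) -> dict:
    """Record the scan, enqueue it, and return without running it."""
    scan_id = str(uuid.uuid4())
    # Write the row first, so any message a worker receives refers to an existing scan.
    store.insert_scan(scan_id, project_id, status="queued")
    # Assumed message shape: only the ID; the worker reads the rest from Postgres.
    queue.send(json.dumps({"scan_id": scan_id}))
    # Return right away; the worker and runner do the actual scanning.
    return {"scan_id": scan_id, "status": "queued"}


# In-memory stand-ins so the sketch runs without Postgres or SQS.
@dataclass
class MemoryStore:
    rows: dict = field(default_factory=dict)

    def insert_scan(self, scan_id: str, project_id: str, status: str) -> None:
        self.rows[scan_id] = {"project_id": project_id, "status": status}


@dataclass
class MemoryQueue:
    messages: List[str] = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(body)
```

The handler's latency is two writes, regardless of how long the scan itself takes.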

Run locally:

uvicorn app.main:app --reload --reload-dir app

vega-worker

Entry point: python scripts/run-scan-worker.py
Dockerfile: docker/worker/Dockerfile

The worker is a long-running process that loops continuously:

Checks heartbeat and recovers stale scans
Reads scan messages from SQS
Claims the scan in Postgres with a row lock, so no two workers execute the same scan
Either runs the scan locally (dev) or launches a vega-v16-runner ECS task (production)
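The claim step protects the loop against duplicate delivery. SQS standard queues deliver each message at least once, so two workers, or the same worker twice, can receive the same scan. The following is a minimal sketch of the claim-then-dispatch part of one loop pass, using sqlite3 so it runs anywhere. It swaps in a different technique: sqlite has no row locks, so this sketch uses a conditional UPDATE (compare-and-set on the status column) instead of the Postgres row lock the worker actually takes. In Postgres the same effect is commonly achieved with SELECT ... FOR UPDATE (often with SKIP LOCKED) or with this same conditional UPDATE. The table layout and the "queued"/"running" statuses are hypothetical, and the heartbeat and stale-scan steps are left out.

```python
import sqlite3
from typing import Callable, List


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal schema; the real scans table is not documented here.
    conn.execute(
        "CREATE TABLE scans (id TEXT PRIMARY KEY, status TEXT NOT NULL, claimed_by TEXT)"
    )


def claim_scan(conn: sqlite3.Connection, scan_id: str, worker_id: str) -> bool:
    """Move a scan from 'queued' to 'running'; return True only for the worker that won."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE scans SET status = 'running', claimed_by = ? "
            "WHERE id = ? AND status = 'queued'",
            (worker_id, scan_id),
        )
    # rowcount is 0 if another worker (or an earlier duplicate message) got there first.
    return cur.rowcount == 1


def handle_messages(
    conn: sqlite3.Connection,
    worker_id: str,
    scan_ids: List[str],
    dispatch: Callable[[str], None],
) -> List[str]:
    """Claim each delivered scan and dispatch only the ones this worker won."""
    started = []
    for scan_id in scan_ids:
        if claim_scan(conn, scan_id, worker_id):
            dispatch(scan_id)  # run locally (dev) or launch an ECS runner task (production)
            started.append(scan_id)
    return started


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    with conn:
        conn.execute("INSERT INTO scans (id, status) VALUES ('s1', 'queued')")
    # The same message delivered twice still starts the scan only once.
    print(handle_messages(conn, "worker-a", ["s1", "s1"], lambda s: None))  # ['s1']
```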

The worker's container image is set up with Docker access so that, when VEGA_SCAN_WORKER_EXECUTION_MODE=local, the worker can issue docker run commands to start the local Codex runner.
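The following sketch shows one way the worker could choose between the two launch paths. It only builds the launch plan and does not start anything. The image, cluster, task definition, and container names are hypothetical placeholders. Treating anything other than "local" as the ECS path is also an assumption. The ECS branch produces keyword arguments for boto3's real ecs.run_task() call, but it omits launchType and networkConfiguration because those depend on the deployment.

```python
import os
from typing import List, Mapping, Optional, Tuple, Union

# All names below are hypothetical placeholders, not Vega's real values.
LOCAL_RUNNER_IMAGE = "vega-v16-runner:local"
ECS_CLUSTER = "vega"
RUNNER_TASK_DEFINITION = "vega-v16-runner"
RUNNER_CONTAINER_NAME = "vega-v16-runner"


def runner_command(scan_id: str) -> List[str]:
    # The runner's entry point takes a single scan ID.
    return ["python", "scripts/run-scan-runner.py", scan_id]


def plan_dispatch(
    scan_id: str, env: Optional[Mapping[str, str]] = None
) -> Tuple[str, Union[List[str], dict]]:
    """Decide how to start the runner for one scan, without starting it."""
    env = os.environ if env is None else env
    if env.get("VEGA_SCAN_WORKER_EXECUTION_MODE") == "local":
        # Dev: the worker has Docker access, so it can start the runner with `docker run`.
        return "docker", ["docker", "run", "--rm", LOCAL_RUNNER_IMAGE, *runner_command(scan_id)]
    # Production (assumed default here): keyword arguments for boto3's ecs.run_task().
    # Launch type and networkConfiguration are deployment-specific and omitted.
    return "ecs", {
        "cluster": ECS_CLUSTER,
        "taskDefinition": RUNNER_TASK_DEFINITION,
        "count": 1,
        "overrides": {
            "containerOverrides": [
                {"name": RUNNER_CONTAINER_NAME, "command": runner_command(scan_id)}
            ]
        },
    }
```

To act on the plan, pass the docker argument list to subprocess.run(args, check=True), or pass the dict to boto3.client("ecs").run_task(**kwargs). Keeping the decision separate from the side effect makes the mode switch testable without Docker or AWS.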

Run locally (for external mode):

VEGA_SCAN_EXECUTION_MODE=external python scripts/run-scan-worker.py

vega-v16-runner

Entry point: python scripts/run-scan-runner.py
Dockerfile: docker/v16-runner/Dockerfile

The runner is designed to run exactly one scan and exit. The worker launches it via ECS RunTask. When the scan finishes (success, failure, or cancellation), the runner process exits, the ECS task stops, and the task is billed only for the time it ran.

The runner:

Loads the scan record from Postgres
Downloads the source snapshot from S3
Calls ProjectService.run_claimed_scan_by_id(), which calls V16ServiceAdapter.scan_source(), which in turn runs the v16 scan engine (planning + per-component audits)
Writes events and findings to Postgres
Uploads artifacts to S3 (v16-events.jsonl, runner-summary.json, v16-report.json, v16-debug-bundle.zip)
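Below is a sketch of the runner's one-shot structure, with each external dependency passed in as a callable. The callables are hypothetical stand-ins for the Postgres read, the S3 download and upload, and ProjectService.run_claimed_scan_by_id(). The S3 key layout, the exit codes, and the choice to upload artifacts even after a failure are assumptions for illustration.

```python
import sys
from typing import Any, Callable, List, Optional

# Artifact names from the list above; the S3 key layout is a hypothetical choice.
ARTIFACTS = ["v16-events.jsonl", "runner-summary.json", "v16-report.json", "v16-debug-bundle.zip"]


def artifact_keys(scan_id: str, prefix: str = "scans") -> List[str]:
    return [f"{prefix}/{scan_id}/{name}" for name in ARTIFACTS]


def run_one_scan(
    scan_id: str,
    load_scan: Callable[[str], Optional[Any]],
    download_snapshot: Callable[[Any], str],
    run_scan: Callable[[str, str], None],
    upload: Callable[[str], None],
) -> int:
    """Run exactly one scan and return a process exit code."""
    scan = load_scan(scan_id)  # read the scan record from Postgres
    if scan is None:
        print(f"scan {scan_id} not found", file=sys.stderr)
        return 2
    source_dir = download_snapshot(scan)  # fetch the source snapshot from S3
    exit_code = 1
    try:
        # Stands in for ProjectService.run_claimed_scan_by_id(), which drives the v16
        # engine and writes events and findings to Postgres.
        run_scan(scan_id, source_dir)
        exit_code = 0
    except Exception as exc:
        print(f"scan {scan_id} failed: {exc}", file=sys.stderr)
    finally:
        # Assumption: upload artifacts even on failure, so the debug bundle is available.
        for key in artifact_keys(scan_id):
            upload(key)
    return exit_code
```

A real entry point would wire these callables to Postgres and S3 clients and end with sys.exit(run_one_scan(...)). The process then exits after a single scan, which lets the ECS task stop.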

The runner image includes Node.js and the codex npm package because v16 invokes Codex as a subprocess.

Run locally (to re-execute a specific scan by ID):

python scripts/run-scan-runner.py <scan-id>

vega-llm-proxy

Entry point: uvicorn app.llm_proxy.main:app
Dockerfile: docker/llm-proxy/Dockerfile

The LLM proxy is a small FastAPI service that sits between scan runners and the AI provider. Its purpose is credential isolation: runners receive a short-lived, scan-scoped token, and the proxy holds the real provider API key.

The proxy:

Validates the scan-scoped token on every request
Forwards the request to the configured provider
Tracks token and cost usage per scan
Rejects requests that exceed per-scan limits

Run locally (only needed if you want proxy-mediated AI calls):

uvicorn app.llm_proxy.main:app --port 8001

vega-maintenance

Entry point: python scripts/run-maintenance.py --once
Dockerfile: docker/maintenance/Dockerfile

The maintenance role is not a long-running service — it's a task definition you run on demand. It's used for:

Database migrations: scripts/run-db-migrations.py
Cleanup jobs: removing stale artifacts, orphaned sessions, etc.

In AWS, you run the maintenance task with scripts/aws/run-migrations.sh, which triggers the ECS task and waits for it to complete.
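The contents of run-migrations.sh are not shown here, so the sketch below shows the same run-then-wait pattern in Python with boto3, as an illustration rather than a copy of the script. It uses real ECS client calls (run_task, the built-in "tasks_stopped" waiter, and describe_tasks). The cluster and task definition names in the usage comment are hypothetical, and reading the exit code from the first container assumes a single-container task definition. The AWS CLI equivalent is aws ecs run-task followed by aws ecs wait tasks-stopped.

```python
from typing import Any


def run_task_and_wait(ecs: Any, cluster: str, task_definition: str, **run_task_kwargs: Any) -> int:
    """Start one ECS task, block until it stops, and return its container's exit code."""
    resp = ecs.run_task(cluster=cluster, taskDefinition=task_definition, count=1, **run_task_kwargs)
    if resp.get("failures"):
        raise RuntimeError(f"run_task failed: {resp['failures']}")
    task_arn = resp["tasks"][0]["taskArn"]
    # boto3's built-in waiter polls DescribeTasks until the task's lastStatus is STOPPED.
    ecs.get_waiter("tasks_stopped").wait(cluster=cluster, tasks=[task_arn])
    task = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])["tasks"][0]
    # Assumes a single-container task definition.
    exit_code = task["containers"][0].get("exitCode")
    if exit_code is None:
        # No exit code usually means the container never started (e.g. image pull failure).
        raise RuntimeError(f"task stopped without an exit code: {task.get('stoppedReason')}")
    return exit_code


# Example (hypothetical names; requires AWS credentials):
#   import boto3
#   code = run_task_and_wait(
#       boto3.client("ecs"), "vega", "vega-maintenance",
#       launchType="FARGATE",
#       networkConfiguration={"awsvpcConfiguration": {"subnets": ["subnet-..."]}},
#   )
```

A non-zero exit code should block the deploy. If a migration can outlast the waiter's default polling window, pass WaiterConfig={"Delay": ..., "MaxAttempts": ...} to wait().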

Why separate roles?

Without role separation	With role separation
Scan crash kills the API	API stays up; the failed scan's logs go to CloudWatch
Provider key in every container	Provider key only in the proxy
Can't scale scan workers independently	Workers scale separately from the API
Migrations run during startup	Migrations run on demand before deploy
One log stream for everything	Per-service log groups for easy debugging