Service Roles
In production, Vega runs as five distinct service roles. Each role is a separate Docker container with its own entry point, permissions, and scaling behavior. This separation means a failing scan doesn't crash the API, and a credential leak in one role doesn't expose others.
vega-api
Entry point: uvicorn app.main:app
Dockerfile: docker/api/Dockerfile
The API is the only role that handles user-facing HTTP traffic. It owns:
- Authentication (login, token refresh, Cognito JWT validation)
- Project and repository management
- Scan creation and cancellation
- Reading findings and events
- Health and operations endpoints
The API never runs the scan itself in production. When a scan is created with VEGA_SCAN_EXECUTION_MODE=sqs, the API writes the scan record to Postgres and sends a message to SQS. It then returns immediately. The actual scanning happens in the worker and runner.
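The hand-off above can be sketched roughly as follows. This is a minimal illustration, not Vega's actual code: `InMemoryScanStore`, `InMemoryScanQueue`, `create_scan`, the `queued` status, and the message shape are all hypothetical, and the in-memory stand-ins replace Postgres and SQS so the example runs on its own.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryScanStore:
    """Stand-in for the Postgres scans table (hypothetical)."""
    rows: dict = field(default_factory=dict)

    def insert(self, scan_id: str, project_id: str) -> None:
        # "queued" is an assumed status name, not taken from Vega.
        self.rows[scan_id] = {"project_id": project_id, "status": "queued"}


@dataclass
class InMemoryScanQueue:
    """Stand-in for SQS (hypothetical); a real client would send to the queue."""
    messages: list = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(body)


def create_scan(store, queue, project_id: str) -> str:
    """Persist the scan record, enqueue it, and return without running it."""
    scan_id = str(uuid.uuid4())
    store.insert(scan_id, project_id)              # record first, so the worker can find it
    queue.send(json.dumps({"scan_id": scan_id}))   # then notify the worker
    return scan_id                                 # respond immediately; no scanning here


store, queue = InMemoryScanStore(), InMemoryScanQueue()
scan_id = create_scan(store, queue, project_id="demo-project")
```

Writing the record before sending the message means a worker that receives the message can always load the scan it refers to.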
Run locally:
uvicorn app.main:app --reload --reload-dir app
vega-worker
Entry point: python scripts/run-scan-worker.py
Dockerfile: docker/worker/Dockerfile
The worker is a long-running process that loops continuously:
- Checks heartbeat and recovers stale scans
- Reads scan messages from SQS
- Claims the scan in Postgres (row lock — prevents double-execution)
- Either runs the scan locally (dev) or launches a vega-v16-runner ECS task (production)
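The claim step is what keeps a duplicated SQS delivery from running the same scan twice. The sketch below illustrates the idea with a conditional UPDATE against an in-memory SQLite database. That is a swapped-in technique chosen so the example runs anywhere, not the row lock Vega takes in Postgres. The `scans` table layout, the status values, and `claim_scan` are assumptions.

```python
import sqlite3


def claim_scan(conn: sqlite3.Connection, scan_id: str, worker_id: str) -> bool:
    """Atomically move a scan from 'queued' to 'running'.

    Returns True only for the worker whose UPDATE actually changed the row.
    """
    cur = conn.execute(
        "UPDATE scans SET status = 'running', claimed_by = ? "
        "WHERE id = ? AND status = 'queued'",
        (worker_id, scan_id),
    )
    conn.commit()
    return cur.rowcount == 1


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scans (id TEXT PRIMARY KEY, status TEXT, claimed_by TEXT)")
conn.execute("INSERT INTO scans VALUES ('scan-1', 'queued', NULL)")
conn.commit()

first = claim_scan(conn, "scan-1", "worker-a")    # first delivery claims the scan
second = claim_scan(conn, "scan-1", "worker-b")   # duplicate delivery finds nothing to claim
```

A worker that fails to claim a scan can simply discard the message, because another worker already owns the scan.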
The worker has Docker access in its container image so it can issue docker run commands for the local Codex runner when VEGA_SCAN_WORKER_EXECUTION_MODE=local.
Run locally (for external mode):
VEGA_SCAN_EXECUTION_MODE=external python scripts/run-scan-worker.py
vega-v16-runner
Entry point: python scripts/run-scan-runner.py
Dockerfile: docker/v16-runner/Dockerfile
The runner is designed to run exactly one scan and exit. It is launched by the worker via ECS RunTask. After the scan finishes (success, failure, or cancellation), the ECS task stops and is billed only for the time it ran.
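As a rough sketch of what a one-scan-per-task launch might look like, the function below builds the arguments for an ECS RunTask call that passes the scan ID as the container command. The cluster name, container name, and `build_run_task_request` are hypothetical, and the task definition name is assumed to match the role name. Launch type and network configuration are left out because they depend on the deployment.

```python
def build_run_task_request(scan_id: str) -> dict:
    """Build keyword arguments for an ECS RunTask call that runs one scan."""
    return {
        "cluster": "vega-cluster",             # hypothetical cluster name
        "taskDefinition": "vega-v16-runner",   # assumed to match the role name
        "count": 1,                            # exactly one task per scan
        "overrides": {
            "containerOverrides": [
                {
                    "name": "runner",          # hypothetical container name
                    # Same command as the local re-run, with the scan ID appended.
                    "command": ["python", "scripts/run-scan-runner.py", scan_id],
                }
            ]
        },
    }


request = build_run_task_request("scan-123")
# With boto3 installed and AWS credentials configured, the worker could then call:
#   boto3.client("ecs").run_task(**request)
```

Keeping the request construction separate from the AWS call makes it easy to unit-test without credentials.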
The runner:
- Loads the scan record from Postgres
- Downloads the source snapshot from S3
- Calls ProjectService.run_claimed_scan_by_id()
  - Which calls V16ServiceAdapter.scan_source()
    - Which runs the v16 scan engine (planning + per-component audits)
- Writes events and findings to Postgres
- Uploads artifacts to S3 (v16-events.jsonl, runner-summary.json, v16-report.json, v16-debug-bundle.zip)
The runner image includes Node.js and the codex npm package because v16 invokes Codex as a subprocess.
Run locally (to re-execute a specific scan by ID):
python scripts/run-scan-runner.py <scan-id>
vega-llm-proxy
Entry point: uvicorn app.llm_proxy.main:app
Dockerfile: docker/llm-proxy/Dockerfile
The LLM proxy is a small FastAPI service that sits between scan runners and the AI provider. Its purpose is credential isolation: runners receive a short-lived, scan-scoped token, and the proxy holds the real provider API key.
The proxy:
- Validates the scan-scoped token on every request
- Forwards the request to the configured provider
- Tracks token and cost usage per scan
- Rejects requests that exceed per-scan limits
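The checks above could be sketched as follows. This is an illustration under stated assumptions, not the proxy's real implementation: the HMAC-signed token format, `SECRET`, `issue_token`, `validate_token`, and `ScanBudget` are all hypothetical, and the budget counts only tokens, not cost.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"proxy-signing-key"  # hypothetical signing key, held only by the proxy


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(scan_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Create a short-lived token bound to one scan: '<scan_id>.<expires>.<sig>'."""
    issued_at = time.time() if now is None else now
    payload = f"{scan_id}.{int(issued_at + ttl_seconds)}"
    return f"{payload}.{_sign(payload)}"


def validate_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the scan ID if the signature and expiry check out, else None."""
    try:
        scan_id, expires, sig = token.rsplit(".", 2)
    except ValueError:
        return None  # not in the expected three-part form
    if not hmac.compare_digest(sig, _sign(f"{scan_id}.{expires}")):
        return None  # tampered or signed with a different key
    current = time.time() if now is None else now
    if current >= int(expires):
        return None  # expired
    return scan_id


class ScanBudget:
    """Tracks usage per scan and refuses charges that would exceed the limit."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used: dict = {}

    def charge(self, scan_id: str, tokens: int) -> bool:
        total = self.used.get(scan_id, 0) + tokens
        if total > self.max_tokens:
            return False  # over the per-scan limit; usage is left unchanged
        self.used[scan_id] = total
        return True


token = issue_token("scan-1", ttl_seconds=60, now=1000.0)
valid = validate_token(token, now=1030.0)                               # inside the 60 s window
expired = validate_token(token, now=2000.0)                             # past the expiry time
forged = validate_token(token.replace("scan-1", "scan-2"), now=1030.0)  # signature mismatch

budget = ScanBudget(max_tokens=100)
accepted = budget.charge("scan-1", 80)
rejected = budget.charge("scan-1", 30)  # 80 + 30 would exceed 100
```

Because the runner only ever holds the scan-scoped token, a compromised runner can spend at most the remaining budget for its own scan until the token expires.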
Run locally (only needed if you want proxy-mediated AI calls):
uvicorn app.llm_proxy.main:app --port 8001
vega-maintenance
Entry point: python scripts/run-maintenance.py --once
Dockerfile: docker/maintenance/Dockerfile
The maintenance role is not a long-running service — it's a task definition you run on demand. It's used for:
- Database migrations: scripts/run-db-migrations.py
- Cleanup jobs: removing stale artifacts, orphaned sessions, etc.
In AWS, you run the maintenance task with scripts/aws/run-migrations.sh, which triggers the ECS task and waits for it to complete.
Why separate roles?
| Without role separation | With role separation |
|---|---|
| Scan crash kills the API | API stays up; scan logs to CloudWatch |
| Provider key in every container | Provider key only in the proxy |
| Can't scale scan workers independently | Workers scale separately from the API |
| Migrations run during startup | Migrations run on-demand before deploy |
| One log stream for everything | Per-service log groups for easy debugging |