Core Architecture

Vega Backend is structured as a set of cooperating processes, each with a narrow responsibility. Understanding how they fit together makes it much easier to debug issues, extend behavior, or deploy the system.

The big picture

flowchart TD
    subgraph user_facing["User-facing"]
        FE[React dashboard]
        API[vega-api\nFastAPI on port 8000]
    end

    subgraph scan_pipeline["Scan pipeline"]
        Worker[vega-worker\nSQS consumer]
        Runner[vega-v16-runner\none per scan]
        V16[v16 adapter\nPlanning + auditing]
        Codex[Codex CLI\nAI code analysis]
        Proxy[vega-llm-proxy\nUsage enforcement]
    end

    subgraph storage["Storage"]
        PG[(Postgres)]
        S3[(S3)]
        SQS[SQS queue]
    end

    FE --> API
    API --> PG
    API --> S3
    API --> SQS

    SQS --> Worker
    Worker -->|launch ECS task| Runner
    Runner --> PG
    Runner --> S3
    Runner --> V16
    V16 --> Codex
    Codex --> Proxy
    Proxy --> Provider[AI provider]
    V16 -->|events + findings| Runner

The domain model

Everything in Vega is organized around a hierarchy of entities:

Project
└── Repository
    ├── Snapshot  (immutable source capture)
    ├── ThreatProfile  (what to look for)
    └── Scan
        ├── Event[]  (live progress log)
        ├── Finding[]  (security issues)
        └── RunnerJob  (ECS task record)

A Project is a workspace — like a folder that groups related repositories and security work. A Repository is the source code target. When you add a repository, Vega fetches the code and stores an immutable Snapshot of it. Before scanning, you define a ThreatProfile — a description of what risks matter for this application. A Scan is one audit run against a snapshot, and it produces Events (live progress) and Findings (security issues).
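To make the hierarchy concrete, the entities can be sketched as plain Python dataclasses. This is only an illustration under stated assumptions: the class names follow the tree above, but every field name (for example `archive_key` and `task_ref`) is hypothetical and not taken from the Vega codebase.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Immutable source capture; frozen so it cannot change once created."""
    snapshot_id: str
    archive_key: str  # hypothetical: where the captured source is stored


@dataclass
class ThreatProfile:
    """Describes which risks matter for the application."""
    description: str


@dataclass
class Event:
    """One entry in a scan's live progress log."""
    message: str


@dataclass
class Finding:
    """One security issue reported by a scan."""
    title: str


@dataclass
class RunnerJob:
    """Record of the task that executed the scan."""
    task_ref: str  # hypothetical identifier for the ECS task


@dataclass
class Scan:
    """One audit run against a single snapshot."""
    scan_id: str
    snapshot: Snapshot
    threat_profile: ThreatProfile
    events: list[Event] = field(default_factory=list)
    findings: list[Finding] = field(default_factory=list)
    runner_job: RunnerJob | None = None


@dataclass
class Repository:
    """The source code target; owns snapshots, threat profiles, and scans."""
    name: str
    snapshots: list[Snapshot] = field(default_factory=list)
    threat_profiles: list[ThreatProfile] = field(default_factory=list)
    scans: list[Scan] = field(default_factory=list)


@dataclass
class Project:
    """Workspace that groups related repositories."""
    name: str
    repositories: list[Repository] = field(default_factory=list)
```

Marking `Snapshot` as frozen mirrors the "immutable source capture" note in the tree: attempting to assign to one of its fields raises `dataclasses.FrozenInstanceError`.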

See Data Model for the full entity reference.

Process separation

Vega separates concerns across five named service roles:

Role	What it does
vega-api	Handles all HTTP requests. Never runs scans directly (in production).
vega-worker	Watches SQS for scan jobs, claims them in Postgres, launches runners.
vega-v16-runner	Runs exactly one scan and exits. Downloads source, invokes v16, writes results.
vega-llm-proxy	Proxies AI requests with per-scan limits. Holds the provider credential.
vega-maintenance	Runs one-off jobs: migrations, cleanup, etc.

This separation means that a failed scan cannot crash the API, AI spend is capped per scan, and the provider credential stays inside vega-llm-proxy.
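The per-scan cap can be pictured as a small bookkeeping object that the proxy consults before forwarding a request. The sketch below is an assumption-heavy illustration: `UsageLimiter`, `BudgetExceeded`, and the choice of tokens as the unit are all hypothetical, and the real vega-llm-proxy may track usage differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a request would push a scan past its allowance."""


@dataclass
class UsageLimiter:
    """Tracks usage per scan and rejects requests that exceed the cap."""
    max_tokens_per_scan: int  # hypothetical unit; the real limit may differ
    _used: dict[str, int] = field(default_factory=dict)

    def charge(self, scan_id: str, tokens: int) -> int:
        """Record usage for a scan and return the remaining allowance."""
        used = self._used.get(scan_id, 0)
        if used + tokens > self.max_tokens_per_scan:
            # Reject before forwarding, so the provider is never called.
            raise BudgetExceeded(f"scan {scan_id} would exceed its limit")
        self._used[scan_id] = used + tokens
        return self.max_tokens_per_scan - self._used[scan_id]
```

Because usage is keyed by scan, one runaway scan exhausts only its own allowance and leaves other scans unaffected.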

Two runtime shapes

Vega intentionally supports two ways of running:

Local development

Everything can run on one machine with no cloud dependencies. State lives in JSON files. Scans run inside the API process (thread mode). The scan engine runs Codex inside a local Docker container.

# One terminal for the API
uvicorn app.main:app --reload --reload-dir app

# One terminal for the frontend
cd frontend && npm run dev
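Thread mode can be pictured roughly as follows: the API process starts a background thread for the scan and records state in a JSON file. This is a minimal sketch, not Vega's implementation; `run_scan`, `start_scan_in_thread`, and the one-file-per-scan layout are hypothetical.

```python
import json
import threading
from pathlib import Path


def run_scan(scan_id: str, state_dir: Path) -> None:
    """Placeholder scan body that records its status in a JSON file."""
    state_file = state_dir / f"{scan_id}.json"
    state_file.write_text(json.dumps({"scan_id": scan_id, "status": "running"}))
    # The real scan work (invoking the engine) would happen here.
    state_file.write_text(json.dumps({"scan_id": scan_id, "status": "completed"}))


def start_scan_in_thread(scan_id: str, state_dir: Path) -> threading.Thread:
    """Run the scan in a background thread so the caller returns immediately."""
    thread = threading.Thread(target=run_scan, args=(scan_id, state_dir), daemon=True)
    thread.start()
    return thread
```

The trade-off of this shape is simplicity over isolation: a scan that misbehaves shares a process with the API, which is why production uses separate runner tasks.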

AWS production

Everything is separated and durable. State lives in Postgres and S3. Scans are queued via SQS, claimed by an ECS worker, and executed in isolated ECS runner tasks. The LLM proxy handles provider credentials.

CloudFront → ALB → vega-api (ECS)
                       ↓
                     SQS ← scan created
                       ↓
                vega-worker (ECS) ← polls SQS
                       ↓
           vega-v16-runner (ECS RunTask) ← one per scan
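The worker's part of this flow (receive a job, claim it, launch a runner) can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: `ClaimStore` stands in for the Postgres claim, `queue.Queue` for SQS, and `launch_runner` for the ECS RunTask call. The sketch also assumes a job may be delivered more than once, which is what makes the claim step useful.

```python
from __future__ import annotations

import queue
import threading
from typing import Callable


class ClaimStore:
    """In-memory stand-in for claiming a scan job in the database."""

    def __init__(self) -> None:
        self._claimed: set[str] = set()
        self._lock = threading.Lock()

    def try_claim(self, scan_id: str) -> bool:
        """Return True only for the first caller to claim a given scan."""
        with self._lock:
            if scan_id in self._claimed:
                return False
            self._claimed.add(scan_id)
            return True


def drain_queue(
    jobs: queue.Queue[str],
    claims: ClaimStore,
    launch_runner: Callable[[str], None],
) -> int:
    """Process queued scan IDs until the queue is empty; return launch count."""
    launched = 0
    while True:
        try:
            scan_id = jobs.get_nowait()
        except queue.Empty:
            return launched
        # Only the winner of the claim launches a runner, so a duplicate
        # delivery of the same job does not start a second scan.
        if claims.try_claim(scan_id):
            launch_runner(scan_id)
            launched += 1
```

For example, feeding the IDs `scan-1`, `scan-2`, `scan-1` through `drain_queue` launches two runners, because the repeated `scan-1` fails its claim.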

Code organization

The codebase is organized into five layers:

HTTP layer (app/api/) — Route handlers. They validate input, call services, and return response models. No business logic lives here.

Domain layer (app/projects/, app/auth/, etc.) — Services that own business rules and state transitions. app/projects/service.py is by far the largest and most important.

Storage layer (app/storage/) — Abstractions for Postgres, S3, and archive handling. Application code calls these, never raw database queries.

Engine boundary (v16/, app/projects/v16_adapter.py) — The scan engine is a separate submodule. The backend communicates with it only through the adapter, keeping engine internals hidden from API code.

Configuration (app/core/settings.py) — A single Pydantic settings class. Every field is overridable with a VEGA_* environment variable.
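The VEGA_* override convention can be illustrated without Pydantic. The sketch below is a standard-library stand-in, not the contents of app/core/settings.py: the field names `scan_mode` and `api_port` are hypothetical, and only the VEGA_ prefix and the port number 8000 come from this page.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings; each field maps to VEGA_<FIELD_NAME>."""
    scan_mode: str = "thread"
    api_port: int = 8000

    @classmethod
    def from_env(cls, environ: Mapping[str, str] | None = None) -> Settings:
        """Build settings from defaults, overridden by VEGA_* variables."""
        env = os.environ if environ is None else environ
        overrides = {}
        for f in fields(cls):
            raw = env.get(f"VEGA_{f.name.upper()}")
            if raw is not None:
                # Coerce using the default's type (str or int here); a real
                # settings library handles bools and richer types properly.
                overrides[f.name] = type(f.default)(raw)
        return cls(**overrides)
```

With this shape, setting `VEGA_API_PORT=9000` in the environment changes `api_port` to 9000 while every unset field keeps its default.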

Next steps

Data Model — Entity relationships and field descriptions
Service Roles — What each process does in detail
Scan Lifecycle — Step-by-step scan execution from request to findings
Local vs AWS Runtime — Side-by-side configuration comparison