Data Model

This page describes the core entities in Vega and how they relate to each other. Understanding this model is key to navigating the codebase, because almost every service, route, and database table maps back to one of these entities.

Entity hierarchy

erDiagram
    PROJECT {
        string id
        string name
        string owner_user_id
        timestamp created_at
    }
    REPOSITORY {
        string id
        string project_id
        string name
        string source_type
        string ingest_status
    }
    SNAPSHOT {
        string id
        string repository_id
        string storage_key
        timestamp created_at
    }
    THREAT_PROFILE {
        string id
        string repository_id
        json content
    }
    SCAN {
        string id
        string repository_id
        string snapshot_id
        string status
        timestamp created_at
        timestamp completed_at
    }
    EVENT {
        string id
        string scan_id
        string kind
        json payload
        timestamp created_at
    }
    FINDING {
        string id
        string scan_id
        string repository_id
        string severity
        string title
        string file_path
        string status
    }
    RUNNER_JOB {
        string id
        string scan_id
        string ecs_task_arn
        string status
    }

    PROJECT ||--o{ REPOSITORY : "contains"
    REPOSITORY ||--o{ SNAPSHOT : "has"
    REPOSITORY ||--o| THREAT_PROFILE : "has"
    REPOSITORY ||--o{ SCAN : "has"
    SNAPSHOT ||--o{ SCAN : "scanned by"
    SCAN ||--o{ EVENT : "produces"
    SCAN ||--o{ FINDING : "produces"
    SCAN ||--o| RUNNER_JOB : "tracked by"

Entity descriptions

Project

A project is the top-level workspace. Users create a project to group related repositories and security work together. Every repository belongs to exactly one project, and every scan and finding belongs to a project through its repository.
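As a rough illustration of how ownership can be resolved through the foreign keys in the diagram, the sketch below follows a finding to its repository and then to its project. The dataclasses carry only the key fields from the diagram, and `project_id_for_finding` is a hypothetical helper, not a function from the Vega codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    id: str
    project_id: str


@dataclass(frozen=True)
class Finding:
    id: str
    scan_id: str
    repository_id: str


def project_id_for_finding(finding: Finding, repositories: dict[str, Repository]) -> str:
    """Follow finding -> repository -> project using the foreign keys."""
    return repositories[finding.repository_id].project_id


repos = {"repo-1": Repository(id="repo-1", project_id="proj-1")}
finding = Finding(id="f-1", scan_id="scan-1", repository_id="repo-1")
print(project_id_for_finding(finding, repos))  # prints proj-1
```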

In the code: app/projects/models.py (Pydantic models), app/storage/postgres.py (DB operations), app/api/projects.py (HTTP routes).

Repository

A repository is the source code target. It can be added in two ways:

  • Git URL — the backend clones the repository
  • Zip upload — the user uploads an archive that the backend extracts

After a repository is added, it goes through an ingest process. The ingest_status field tracks this: pending → ingesting → ready (or failed).
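The ingest states can be modeled as a small enum with a transition table. This is a minimal sketch: the four state names come from this page, but the `IngestStatus` class, the `advance` helper, and the assumption that a failure can occur from either non-terminal state are illustrative only.

```python
from enum import Enum


class IngestStatus(str, Enum):
    PENDING = "pending"
    INGESTING = "ingesting"
    READY = "ready"
    FAILED = "failed"


# Assumption: ingest can fail from either non-terminal state.
ALLOWED = {
    IngestStatus.PENDING: {IngestStatus.INGESTING, IngestStatus.FAILED},
    IngestStatus.INGESTING: {IngestStatus.READY, IngestStatus.FAILED},
    IngestStatus.READY: set(),
    IngestStatus.FAILED: set(),
}


def advance(current: IngestStatus, target: IngestStatus) -> IngestStatus:
    """Return the new status, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


status = advance(IngestStatus.PENDING, IngestStatus.INGESTING)
status = advance(status, IngestStatus.READY)
```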

In the code: app/projects/service.py, app/projects/fetcher.py, app/api/repositories.py.

Snapshot

A snapshot is an immutable, point-in-time capture of a repository's source code. When the backend fetches or receives code, it stores a snapshot — either as a zip in S3 (production) or in a local directory (dev).

Scans always run against a specific snapshot. This means you can run multiple scans against the same snapshot, and the results are always tied to a known state of the code.
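One way to picture this is a frozen snapshot record that several scan records point at. The sketch below uses only fields from the diagram; `new_scan` and the example `storage_key` value are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Snapshot:
    id: str
    repository_id: str
    storage_key: str


@dataclass
class Scan:
    id: str
    repository_id: str
    snapshot_id: str
    status: str = "queued"


def new_scan(scan_id: str, snapshot: Snapshot) -> Scan:
    """Create a scan pinned to one specific snapshot."""
    return Scan(id=scan_id, repository_id=snapshot.repository_id, snapshot_id=snapshot.id)


# The storage key below is an invented example value.
snap = Snapshot(id="snap-1", repository_id="repo-1", storage_key="snapshots/snap-1.zip")
first = new_scan("scan-1", snap)
second = new_scan("scan-2", snap)  # a second scan against the same code state
```

Attempting `snap.storage_key = "other"` raises `FrozenInstanceError`, which mirrors the rule that a snapshot never changes once it is stored.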

In the code: app/projects/service.py, app/storage/s3.py.

Threat profile

A threat profile describes what risks matter for a specific application. Before scanning, users edit a threat profile that the scan engine uses to focus its work. It typically includes:

  • What the application does (e.g., "handles payments and stores user PII")
  • Which vulnerability classes to prioritize (e.g., injection, auth bypass, data exposure)
  • Any special context about the architecture
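For illustration, the JSON stored in the content field could look like the sketch below. The page only states that content is JSON, so every field name here (`application_summary`, `priority_classes`, `architecture_notes`) is an assumption chosen to match the three bullet points above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ThreatProfileContent:
    # Hypothetical field names; the real schema is not documented on this page.
    application_summary: str
    priority_classes: list[str] = field(default_factory=list)
    architecture_notes: str = ""


content = ThreatProfileContent(
    application_summary="handles payments and stores user PII",
    priority_classes=["injection", "auth bypass", "data exposure"],
)

# Serialize to the JSON text that would be stored in the content column.
serialized = json.dumps(asdict(content))
```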

In the code: app/projects/models.py, app/projects/service.py.

Scan

A scan is one audit run against a repository snapshot. It has a lifecycle:

queued → claimed → running → completed
                           → failed
                           → cancelled
  • queued — created, waiting for a worker
  • claimed — a worker has locked this scan to prevent double-execution
  • running — the runner task is actively scanning
  • completed — all findings and artifacts are persisted
  • failed — the runner encountered an unrecoverable error
  • cancelled — a user or operator stopped the scan
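The lifecycle above can be sketched as a transition table plus a claim step that only succeeds while the scan is still queued. This is an in-memory illustration, not the service's implementation: the table contains only the arrows drawn above (the page does not say whether a queued or claimed scan can be cancelled directly), `claimed_by` is a hypothetical field, and a real worker would need the claim to be atomic in the database.

```python
from dataclasses import dataclass
from typing import Optional

# Only the transitions shown in the lifecycle diagram.
TRANSITIONS = {
    "queued": {"claimed"},
    "claimed": {"running"},
    "running": {"completed", "failed", "cancelled"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}


@dataclass
class Scan:
    id: str
    status: str = "queued"
    claimed_by: Optional[str] = None  # hypothetical field for illustration


def transition(scan: Scan, new_status: str) -> None:
    """Move the scan to new_status, rejecting moves not in the table."""
    if new_status not in TRANSITIONS[scan.status]:
        raise ValueError(f"illegal transition {scan.status} -> {new_status}")
    scan.status = new_status


def try_claim(scan: Scan, worker_id: str) -> bool:
    """Claim the scan if it is still queued; a second worker gets False."""
    if scan.status != "queued":
        return False
    transition(scan, "claimed")
    scan.claimed_by = worker_id
    return True


scan = Scan(id="scan-1")
won = try_claim(scan, "worker-a")   # first worker wins the claim
lost = try_claim(scan, "worker-b")  # second worker is turned away
```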

In the code: app/projects/service.py, app/projects/models.py, app/storage/postgres.py.

Event

Events are append-only records emitted during a scan. They are used to:

  1. Show live progress in the dashboard while a scan is running
  2. Provide a debug log after the scan finishes

Event kinds include: scan_started, scan_progress, scan_log, finding_updated, scan_completed, scan_failed, scan_cancelled.

Events are never modified after creation. If you need the current state of a finding, look at the Finding entity — not events.
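The append-only rule can be sketched as a log that exposes only an append operation and a read operation. The `EventLog` class, the generated IDs, and the `pct` payload key are assumptions for illustration; the field names on `Event` follow the diagram.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)  # blocks attribute reassignment; the payload dict is not deep-frozen
class Event:
    id: str
    scan_id: str
    kind: str
    payload: dict[str, Any]
    created_at: datetime


class EventLog:
    """In-memory log with no update or delete methods."""

    def __init__(self) -> None:
        self._events: list[Event] = []
        self._ids = itertools.count(1)

    def append(self, scan_id: str, kind: str, payload: dict[str, Any]) -> Event:
        event = Event(
            id=f"evt-{next(self._ids)}",
            scan_id=scan_id,
            kind=kind,
            payload=payload,
            created_at=datetime.now(timezone.utc),
        )
        self._events.append(event)
        return event

    def for_scan(self, scan_id: str) -> tuple[Event, ...]:
        """Return the scan's events in the order they were appended."""
        return tuple(e for e in self._events if e.scan_id == scan_id)


log = EventLog()
log.append("scan-1", "scan_started", {})
log.append("scan-1", "scan_progress", {"pct": 50})  # "pct" is an invented key
log.append("scan-2", "scan_started", {})
```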

In the code: app/events/models.py, app/events/service.py, app/projects/v16_adapter.py.

Finding

A finding is a specific security issue the scan engine discovered. Unlike events (which are a log), findings are structured records meant for human review and triage.

Each finding includes:

  • severity: critical, high, medium, low, or info
  • title: brief description of the issue
  • file_path: where in the code the issue was found
  • status: open, confirmed, or dismissed
  • evidence: the scan engine's reasoning

Findings are upserted (created or updated) as the scan engine runs. If the engine reports the same finding twice, the backend updates the existing record rather than creating a duplicate.
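A minimal sketch of the upsert behavior follows. The page does not say how the backend decides that two reports are the same finding, so the key used here (repository, file path, title) is an assumption, as is the choice to keep the existing triage status when a record is updated.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    scan_id: str
    repository_id: str
    severity: str
    title: str
    file_path: str
    status: str = "open"
    evidence: str = ""


class FindingStore:
    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str, str], Finding] = {}

    @staticmethod
    def _key(finding: Finding) -> tuple[str, str, str]:
        # Assumed identity of a finding; the real dedup key is not documented here.
        return (finding.repository_id, finding.file_path, finding.title)

    def upsert(self, finding: Finding) -> bool:
        """Return True if a record was created, False if one was updated."""
        existing = self._by_key.get(self._key(finding))
        if existing is None:
            self._by_key[self._key(finding)] = finding
            return True
        # Update engine-owned fields; leave the triage status untouched (assumption).
        existing.severity = finding.severity
        existing.evidence = finding.evidence
        return False

    def __len__(self) -> int:
        return len(self._by_key)


store = FindingStore()
created = store.upsert(Finding("scan-1", "repo-1", "medium", "SQL injection", "app/db.py"))
updated = store.upsert(Finding("scan-1", "repo-1", "high", "SQL injection", "app/db.py"))
```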

In the code: app/projects/service.py, app/projects/models.py, app/storage/postgres.py.

RunnerJob

A runner job tracks the ECS task that executes a scan. It stores the ECS task ARN so the worker can monitor task health, detect failures, and cancel running tasks. In local mode there is no runner job; scans run in-process.

In the code: app/projects/runner.py, app/projects/service.py.


Persistence

In local development, all entities are stored as JSON files under data/. This makes it easy to inspect and reset state without a database.
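As a sketch of what file-based persistence can look like, the helpers below write one JSON file per entity. The layout `data/<kind>/<id>.json` and the function names `save_entity` and `load_entity` are assumptions; the page only states that entities live as JSON files under data/.

```python
import json
import tempfile
from pathlib import Path


def save_entity(root: Path, kind: str, entity: dict) -> Path:
    """Write the entity to <root>/<kind>/<id>.json and return the path."""
    path = root / kind / f"{entity['id']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(entity, indent=2, sort_keys=True), encoding="utf-8")
    return path


def load_entity(root: Path, kind: str, entity_id: str) -> dict:
    """Read the entity back from its JSON file."""
    path = root / kind / f"{entity_id}.json"
    return json.loads(path.read_text(encoding="utf-8"))


# Demo in a throwaway directory so nothing is written to a real data/ folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data"
    save_entity(root, "projects", {"id": "proj-1", "name": "demo"})
    print(load_entity(root, "projects", "proj-1")["name"])  # prints demo
```

Resetting state in this layout amounts to deleting the directory, and any entity can be inspected with a text editor.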

In production, entities are stored in Postgres. The schema is defined in SQL migration files:

app/storage/migrations/
├── 001_initial_postgres.sql        ← projects, repositories, scans, findings
├── 002_scan_lifecycle_columns.sql  ← scan status transitions
├── 003_runner_jobs_idempotency.sql ← ECS task tracking
├── 004_findings_columns.sql        ← additional finding fields
└── 005_operational_state_columns.sql  ← worker heartbeats, stale scan recovery

Large objects (source code archives, scan reports, debug bundles) are never stored in Postgres rows. In production they are stored in S3, and the Postgres record holds only the S3 object key.