Data Model
This page describes the core entities in Vega and how they relate to each other. Understanding this model is key to navigating the codebase, because almost every service, route, and database table maps back to one of these entities.
Entity hierarchy
erDiagram
    PROJECT {
        string id
        string name
        string owner_user_id
        timestamp created_at
    }
    REPOSITORY {
        string id
        string project_id
        string name
        string source_type
        string ingest_status
    }
    SNAPSHOT {
        string id
        string repository_id
        string storage_key
        timestamp created_at
    }
    THREAT_PROFILE {
        string id
        string repository_id
        json content
    }
    SCAN {
        string id
        string repository_id
        string snapshot_id
        string status
        timestamp created_at
        timestamp completed_at
    }
    EVENT {
        string id
        string scan_id
        string kind
        json payload
        timestamp created_at
    }
    FINDING {
        string id
        string scan_id
        string repository_id
        string severity
        string title
        string file_path
        string status
    }
    RUNNER_JOB {
        string id
        string scan_id
        string ecs_task_arn
        string status
    }
    PROJECT ||--o{ REPOSITORY : "contains"
    REPOSITORY ||--o{ SNAPSHOT : "has"
    REPOSITORY ||--o| THREAT_PROFILE : "has"
    REPOSITORY ||--o{ SCAN : "has"
    SNAPSHOT ||--o{ SCAN : "scanned by"
    SCAN ||--o{ EVENT : "produces"
    SCAN ||--o{ FINDING : "produces"
    SCAN ||--o| RUNNER_JOB : "tracked by"
Entity descriptions
Project
A project is the top-level workspace. Users create a project to group related repositories and security work together. Every repository belongs to a project, and every scan and finding belongs to a project through its repository.
In the code: app/projects/models.py (Pydantic models), app/storage/postgres.py (DB operations), app/api/projects.py (HTTP routes).
Repository
A repository is the source code target. It can be added in two ways:
- Git URL — the backend clones the repository
- Zip upload — the user uploads an archive that the backend extracts
After a repository is added, it goes through an ingest process. The ingest_status field tracks this: pending → ingesting → ready (or failed).
In the code: app/projects/service.py, app/projects/fetcher.py, app/api/repositories.py.
Snapshot
A snapshot is an immutable, point-in-time capture of a repository's source code. When the backend fetches or receives code, it stores a snapshot — either as a zip in S3 (production) or in a local directory (dev).
Scans always run against a specific snapshot. This means you can run multiple scans against the same snapshot, and the results are always tied to a known state of the code.
In the code: app/projects/service.py, app/storage/s3.py.
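As a rough illustration of why pinning scans to a snapshot matters, the sketch below models a snapshot as a frozen record and creates two scans against it. The field names follow the diagram above; the `new_scan` helper and the storage key value are hypothetical, and the real models are Pydantic classes rather than dataclasses.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Snapshot:
    id: str
    repository_id: str
    storage_key: str
    created_at: datetime


@dataclass
class Scan:
    id: str
    repository_id: str
    snapshot_id: str
    status: str = "queued"


def new_scan(scan_id: str, snapshot: Snapshot) -> Scan:
    """Create a queued scan pinned to one snapshot (hypothetical helper)."""
    return Scan(
        id=scan_id,
        repository_id=snapshot.repository_id,
        snapshot_id=snapshot.id,
    )


# The storage key layout here is made up for the example.
snap = Snapshot("snap-1", "repo-1", "snapshots/repo-1/snap-1.zip",
                datetime.now(timezone.utc))

# Two independent scans can share the same snapshot.
first = new_scan("scan-1", snap)
second = new_scan("scan-2", snap)

# Attempting to change the snapshot after creation is rejected.
try:
    snap.storage_key = "somewhere/else.zip"
except FrozenInstanceError:
    print("snapshot is immutable")
```

Because both scans carry the same `snapshot_id`, their findings can be compared knowing they looked at identical code.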
Threat profile
A threat profile describes which risks matter for a specific application. Before scanning, users can edit the repository's threat profile, which the scan engine uses to focus its work. It typically includes:
- What the application does (e.g., "handles payments and stores user PII")
- Which vulnerability classes to prioritize (e.g., injection, auth bypass, data exposure)
- Any special context about the architecture
In the code: app/projects/models.py, app/projects/service.py.
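To make the `content` JSON field more concrete, here is one possible shape for it. The key names and the architecture note are invented for illustration; this page does not define the actual schema.

```python
import json

# Hypothetical keys: the real THREAT_PROFILE.content schema is not documented here.
threat_profile_content = {
    "application_summary": "handles payments and stores user PII",
    "priority_vulnerability_classes": ["injection", "auth bypass", "data exposure"],
    # Illustrative free-text context, not taken from a real profile.
    "architecture_notes": "public API behind a gateway; workers consume a job queue",
}

# The content is stored as JSON, so it must survive a serialize/parse round trip.
serialized = json.dumps(threat_profile_content)
restored = json.loads(serialized)
```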
Scan
A scan is one audit run against a repository snapshot. It has a lifecycle:
queued → claimed → running → completed
                           → failed
                           → cancelled
- queued — created, waiting for a worker
- claimed — a worker has locked this scan to prevent double-execution
- running — the runner task is actively scanning
- completed — all findings and artifacts are persisted
- failed — the runner encountered an unrecoverable error
- cancelled — a user or operator stopped the scan
In the code: app/projects/service.py, app/projects/models.py, app/storage/postgres.py.
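The lifecycle above can be sketched as a transition table plus a claim step. This is a minimal in-memory illustration, not the service's implementation: `ScanStatus`, `ALLOWED`, `transition`, and `claim` are hypothetical names, and the table assumes that `failed` can follow `claimed` or `running` and that `cancelled` can follow any non-terminal state, which this page does not spell out.

```python
from enum import Enum


class ScanStatus(str, Enum):
    QUEUED = "queued"
    CLAIMED = "claimed"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed transition rules; terminal states have no entry and allow nothing.
ALLOWED = {
    ScanStatus.QUEUED: {ScanStatus.CLAIMED, ScanStatus.CANCELLED},
    ScanStatus.CLAIMED: {ScanStatus.RUNNING, ScanStatus.FAILED, ScanStatus.CANCELLED},
    ScanStatus.RUNNING: {ScanStatus.COMPLETED, ScanStatus.FAILED, ScanStatus.CANCELLED},
}


def transition(current: ScanStatus, target: ScanStatus) -> ScanStatus:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move scan from {current.value} to {target.value}")
    return target


def claim(scans: dict[str, ScanStatus], scan_id: str) -> bool:
    """Claim a scan only if it is still queued; return False otherwise."""
    if scans.get(scan_id) is not ScanStatus.QUEUED:
        return False
    scans[scan_id] = transition(ScanStatus.QUEUED, ScanStatus.CLAIMED)
    return True


scans = {"scan-1": ScanStatus.QUEUED}
first_worker = claim(scans, "scan-1")   # succeeds, scan becomes claimed
second_worker = claim(scans, "scan-1")  # rejected, scan is no longer queued
```

With several workers, the check and the update would have to happen atomically in the database; the dictionary version only shows the rule that a scan is claimable exactly once.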
Event
Events are append-only records emitted during a scan. They are used to:
- Show live progress in the dashboard while a scan is running
- Provide a debug log after the scan finishes
Event kinds include: scan_started, scan_progress, scan_log, finding_updated, scan_completed, scan_failed, scan_cancelled.
Events are never modified after creation. If you need the current state of a finding, look at the Finding entity — not events.
In the code: app/events/models.py, app/events/service.py, app/projects/v16_adapter.py.
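An append-only log can be illustrated with a small in-memory class that offers only append and read operations. `EventLog` and its methods are hypothetical and are not the API of app/events/service.py; the fields mirror the EVENT entity in the diagram.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)  # frozen: an event's fields cannot be reassigned
class Event:
    id: str
    scan_id: str
    kind: str
    payload: dict[str, Any]
    created_at: datetime


class EventLog:
    """In-memory append-only log: no update or delete methods are exposed."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, scan_id: str, kind: str, payload: dict[str, Any]) -> Event:
        event = Event(
            id=str(uuid.uuid4()),
            scan_id=scan_id,
            kind=kind,
            payload=dict(payload),  # copy so later caller edits do not leak in
            created_at=datetime.now(timezone.utc),
        )
        self._events.append(event)
        return event

    def for_scan(self, scan_id: str) -> list[Event]:
        """Return one scan's events in the order they were appended."""
        return [e for e in self._events if e.scan_id == scan_id]


log = EventLog()
log.append("scan-1", "scan_started", {})
log.append("scan-1", "scan_progress", {"percent": 50})
log.append("scan-2", "scan_started", {})
kinds = [e.kind for e in log.for_scan("scan-1")]
```

Replaying `for_scan` gives the history of a scan, while the current state of a finding still comes from the Finding record.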
Finding
A finding is a specific security issue the scan engine discovered. Unlike events (which are a log), findings are structured records meant for human review and triage.
Each finding includes:
- severity — critical, high, medium, low, or info
- title — brief description of the issue
- file_path — where in the code the issue was found
- status — open, confirmed, dismissed
- evidence — the scan engine's reasoning
Findings are upserted (created or updated) as the scan engine runs. If the engine reports the same finding twice, the backend updates the existing record rather than creating a duplicate.
In the code: app/projects/service.py, app/projects/models.py, app/storage/postgres.py.
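The upsert behavior can be sketched as follows. Two things are assumptions made for the example: a finding is identified by `(scan_id, file_path, title)`, and a repeated report updates the severity while keeping the existing id and triage status. This page does not state how the backend actually decides that two findings are the same.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    id: str
    scan_id: str
    repository_id: str
    severity: str
    title: str
    file_path: str
    status: str = "open"


def dedup_key(finding: Finding) -> tuple[str, str, str]:
    """Assumed identity of a finding; the real key may differ."""
    return (finding.scan_id, finding.file_path, finding.title)


def upsert_finding(store: dict[tuple[str, str, str], Finding],
                   incoming: Finding) -> Finding:
    """Insert a new finding, or update the existing one with the same key."""
    key = dedup_key(incoming)
    existing = store.get(key)
    if existing is None:
        store[key] = incoming
        return incoming
    # Refresh engine-reported data; keep the original id and triage status.
    existing.severity = incoming.severity
    return existing


store: dict[tuple[str, str, str], Finding] = {}
upsert_finding(store, Finding("f-1", "scan-1", "repo-1", "medium",
                              "SQL injection", "app/db.py"))
updated = upsert_finding(store, Finding("f-2", "scan-1", "repo-1", "high",
                                        "SQL injection", "app/db.py"))
```

After the second call the store still holds a single record, now with the higher severity.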
RunnerJob
A runner job tracks the ECS task that executes a scan. It stores the ECS task ARN so the worker can monitor task health, detect failures, and cancel running tasks. In local mode, there is no runner job; scans run in-process.
In the code: app/projects/runner.py, app/projects/service.py.
Persistence
In local development, all entities are stored as JSON files under data/. This makes it easy to inspect and reset state without a database.
In production, entities are stored in Postgres. The schema is defined in SQL migration files:
app/storage/migrations/
├── 001_initial_postgres.sql ← projects, repositories, scans, findings
├── 002_scan_lifecycle_columns.sql ← scan status transitions
├── 003_runner_jobs_idempotency.sql ← ECS task tracking
├── 004_findings_columns.sql ← additional finding fields
└── 005_operational_state_columns.sql ← worker heartbeats, stale scan recovery
In production, large objects (source code archives, scan reports, debug bundles) are stored in S3, not in Postgres rows. The Postgres record holds only the S3 object key.
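The split between a large object and its database record can be sketched like this. `InMemoryObjectStore` stands in for S3 so the example runs without AWS, and `SnapshotRow`, `save_snapshot`, and the key layout are hypothetical; only the idea that the row carries a `storage_key` instead of the bytes comes from this page.

```python
from dataclasses import dataclass


class InMemoryObjectStore:
    """Stand-in for an object store such as S3: maps keys to raw bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


@dataclass
class SnapshotRow:
    """What the database row holds: identifiers and a key, never the archive."""
    id: str
    repository_id: str
    storage_key: str


def save_snapshot(store: InMemoryObjectStore, snapshot_id: str,
                  repository_id: str, archive: bytes) -> SnapshotRow:
    # Hypothetical key layout, chosen only for the example.
    key = f"snapshots/{repository_id}/{snapshot_id}.zip"
    store.put(key, archive)
    return SnapshotRow(id=snapshot_id, repository_id=repository_id, storage_key=key)


object_store = InMemoryObjectStore()
row = save_snapshot(object_store, "snap-1", "repo-1", b"fake zip bytes")
archive = object_store.get(row.storage_key)  # fetch the bytes only when needed
```

Keeping rows small this way means listing or filtering snapshots never has to load archive contents.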