Glossary

Quick definitions for terms used throughout the codebase and documentation.

Product terms

Project (WorkspaceProject)
The top-level workspace container. A project holds one or more repositories, all their scans, and all the findings those scans produced. In the code: domain/projects/records.py.

Repository
A source code target inside a project. Users add a repository by providing a Git URL, uploading a zip archive, or connecting a GitHub App repository. The backend runs an ingest pipeline to fetch the source, stores an immutable snapshot, and associates scans and findings with this repository.

SourceSnapshot
An immutable capture of a repository's source code at a specific point in time. Scans run against snapshots, not live Git branches, so each scan result is tied to a known state of the code. A snapshot includes the file tree, size metrics, and a storage reference. In the code: domain/ingest/snapshots.py.

IngestJob
Tracks the lifecycle of a single repository ingest operation (cloning, extracting, snapshotting). Created when a repository is added. Goes through states: queued → claimed → running → completed / failed / cancelled / stale. In the code: domain/ingest/records.py.
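
The state sequence above can be modeled as a small transition table. This is a minimal sketch, not the code in domain/ingest/records.py: the names IngestState, ALLOWED, can_transition, and is_terminal are hypothetical, and the specific edges (for example, which states may become stale) are assumptions.

```python
from enum import Enum


class IngestState(str, Enum):
    # State names come from the glossary entry; the class name is hypothetical.
    QUEUED = "queued"
    CLAIMED = "claimed"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"
    STALE = "stale"


# Assumed edges: the happy path is queued -> claimed -> running -> completed.
# Which non-terminal states may fail, be cancelled, or go stale is a guess.
ALLOWED = {
    IngestState.QUEUED: {IngestState.CLAIMED, IngestState.CANCELLED},
    IngestState.CLAIMED: {
        IngestState.RUNNING,
        IngestState.CANCELLED,
        IngestState.STALE,
    },
    IngestState.RUNNING: {
        IngestState.COMPLETED,
        IngestState.FAILED,
        IngestState.CANCELLED,
        IngestState.STALE,
    },
}


def can_transition(current: IngestState, target: IngestState) -> bool:
    """Return True if the assumed table permits current -> target."""
    return target in ALLOWED.get(current, set())


def is_terminal(state: IngestState) -> bool:
    """States with no outgoing edges are treated as terminal."""
    return state not in ALLOWED
```

Keeping the edges in one table makes an illegal jump (such as queued straight to running) easy to reject in a single place.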

Scan (ScanRecord)
A single audit run over selected paths in a repository snapshot. A scan goes through states: queued → running → completed (or failed / cancelled / paused). In the code: domain/scans/records.py.

Finding (FindingRecord)
A specific security issue discovered during a scan. Each finding has a severity, title, file locations, triage status, and optional verification status. Findings are durable records: they persist after the scan that produced them finishes. In the code: domain/findings/records.py.

Artifact (ArtifactRecord)
A file produced by a scan and stored in S3. Artifacts include the event log JSONL, debug bundle, generated threat model, activity log, and worker components log. In the code: domain/artifacts/records.py.

Planning artifact
A cached bundle produced at the end of the planning stage. Contains the component plan and the generated threat model. Later scans against the same snapshot can reuse a planning artifact to skip re-running the (expensive) planning step.
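
The reuse described above amounts to a cache keyed by snapshot. The following is a minimal in-memory sketch under stated assumptions: PlanningArtifact's fields, the PlanningCache class, and get_or_plan are hypothetical names, and the real storage location of planning artifacts is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class PlanningArtifact:
    # Hypothetical shape: the glossary only says the bundle holds
    # the component plan and the generated threat model.
    snapshot_id: str
    component_plan: tuple
    threat_model: str


class PlanningCache:
    """In-memory stand-in for wherever planning artifacts are stored."""

    def __init__(self) -> None:
        self._by_snapshot: Dict[str, PlanningArtifact] = {}

    def get_or_plan(
        self, snapshot_id: str, plan: Callable[[str], PlanningArtifact]
    ) -> PlanningArtifact:
        # Snapshots are immutable, so the snapshot id is a stable cache key.
        cached = self._by_snapshot.get(snapshot_id)
        if cached is not None:
            return cached
        artifact = plan(snapshot_id)  # the expensive planning step
        self._by_snapshot[snapshot_id] = artifact
        return artifact
```

Because snapshots never change, a cached plan cannot go out of date relative to the code it describes.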

Domain event (DomainEvent)
An append-only record emitted during a scan — progress updates, log lines, warnings, and finding notifications. The frontend uses events to show live scan progress. Stored in the domain_events table. In the code: domain/events/records.py.
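
As an illustration of the append-only idea, here is a small in-memory sketch. The field names, the EventLog class, and the since() cursor are assumptions for illustration; the actual schema of the domain_events table is not shown in this glossary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DomainEvent:
    # Hypothetical fields; the glossary names the event kinds, not the schema.
    seq: int
    scan_id: str
    kind: str  # e.g. "progress", "log", "warning", "finding"
    message: str


class EventLog:
    """Append-only: events are added and read, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[DomainEvent] = []

    def append(self, scan_id: str, kind: str, message: str) -> DomainEvent:
        event = DomainEvent(len(self._events) + 1, scan_id, kind, message)
        self._events.append(event)
        return event

    def since(self, scan_id: str, after_seq: int = 0) -> List[DomainEvent]:
        # A client polling for live progress passes the last seq it has seen.
        return [
            e for e in self._events
            if e.scan_id == scan_id and e.seq > after_seq
        ]
```

A monotonically increasing sequence number lets a client resume from where it left off without re-reading earlier events.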

Architecture terms

Hexagonal architecture
The structural pattern used by the backend. Business logic lives in the application/ layer and depends only on the interfaces in ports/. The adapters/ layer provides concrete implementations (Postgres, S3, SQS, etc.), and the composition/wiring/ layer selects and wires them together at startup.

Port
A Python Protocol interface that defines what the application layer can do with a dependency (e.g., ScanStore, ScanQueuePort) without specifying how it's implemented. In the code: app/ports/.

Adapter
A concrete implementation of a port. For example, PostgresRecordStore implements ScanStore for production; JsonRecordStore implements it for local development. In the code: app/adapters/.
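
The port/adapter relationship can be sketched with typing.Protocol. ScanStore is named in this glossary, but its methods are not, so save() and get() below are assumptions; InMemoryScanStore and mark_running are hypothetical and exist only for this example.

```python
from typing import Dict, Optional, Protocol


class ScanStore(Protocol):
    # Port: the method names below are assumptions, not the real interface.
    def save(self, scan_id: str, record: dict) -> None: ...

    def get(self, scan_id: str) -> Optional[dict]: ...


class InMemoryScanStore:
    """Hypothetical adapter; satisfies ScanStore structurally, no inheritance."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def save(self, scan_id: str, record: dict) -> None:
        self._records[scan_id] = dict(record)

    def get(self, scan_id: str) -> Optional[dict]:
        return self._records.get(scan_id)


def mark_running(store: ScanStore, scan_id: str) -> dict:
    """Application-layer code depends only on the port type."""
    record = store.get(scan_id) or {"id": scan_id}
    record["state"] = "running"
    store.save(scan_id, record)
    return record
```

Because Protocol uses structural typing, any class with matching methods can be swapped in, which is what lets one port have both a production and a local-development adapter.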

Use case
A class in app/application/ that encapsulates one business operation. Takes a *Command dataclass as input and returns a *Result dataclass. All business logic, state transitions, and validation live here. Examples: CreateScanUseCase, NormalizeAndUpsertFindingsUseCase.
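
A rough sketch of the Command-in, Result-out shape follows. CreateScanUseCase is named above, but everything else is an assumption for illustration: the execute() method name, the dataclass fields, the validation rule, and the way scan ids are generated.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CreateScanCommand:
    # Field names are illustrative assumptions.
    repository_id: str
    snapshot_id: str
    paths: Tuple[str, ...]


@dataclass(frozen=True)
class CreateScanResult:
    scan_id: str
    state: str


class CreateScanUseCase:
    """Sketch only: validates input and returns a scan in the 'queued' state."""

    def __init__(self) -> None:
        self._created: List[str] = []

    def execute(self, command: CreateScanCommand) -> CreateScanResult:
        # Validation lives in the use case, not in the route handler.
        if not command.paths:
            raise ValueError("a scan needs at least one path")
        scan_id = f"scan-{len(self._created) + 1}"
        self._created.append(scan_id)
        return CreateScanResult(scan_id=scan_id, state="queued")
```

Frozen dataclasses keep commands and results immutable, so a handler cannot alter a request after validation.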

RuntimeContainer
The wired-together set of use cases and adapters for a specific process role. Built at startup by build_*_runtime() in composition/wiring/ and injected into route handlers via Depends(get_runtime_container).

RuntimeSettings
The Pydantic settings class (composition/settings/) that reads all VEGA_* environment variables. Supports profile defaults (local, test, staging, production) and optional values from AWS Secrets Manager.
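
The idea of profile defaults overridden by VEGA_* variables can be sketched as follows. The real class is Pydantic-based; this version uses only the standard library so it stays dependency-free. The variable names VEGA_PROFILE and VEGA_LOG_LEVEL, the setting keys, and the default values are all hypothetical.

```python
from typing import Dict, Mapping

# Hypothetical per-profile defaults; the real values are not documented here.
PROFILE_DEFAULTS: Dict[str, Dict[str, str]] = {
    "local": {"queue_backend": "local", "log_level": "DEBUG"},
    "production": {"queue_backend": "sqs", "log_level": "INFO"},
}

PREFIX = "VEGA_"


def load_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Merge profile defaults with VEGA_* variables; explicit variables win."""
    profile = env.get(PREFIX + "PROFILE", "local")  # variable name is assumed
    settings = dict(PROFILE_DEFAULTS.get(profile, {}))
    for key, value in env.items():
        if key.startswith(PREFIX):
            # VEGA_LOG_LEVEL -> log_level
            settings[key[len(PREFIX):].lower()] = value
    return settings
```

In a real process the argument would be os.environ; passing a plain mapping keeps the function easy to test.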

DomainModel
The Pydantic v2 base class for all domain records. Records are serialized to JSONB for storage. In the code: domain/base.py.

GenericRecordStore
A document store built on a single generic_records table, used for auxiliary entities like users, API keys, billing records, and GitHub connections. Keyed by (record_type, record_key).
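
To make the composite key concrete, here is a self-contained sketch that uses SQLite in place of Postgres. Only the table name and the (record_type, record_key) key come from the entry above; the payload column, the class name, and the put()/get() methods are assumptions.

```python
import json
import sqlite3
from typing import Optional


class SqliteGenericRecordStore:
    """Stand-in for illustration; the real store is Postgres-backed."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        # Columns other than record_type/record_key are assumptions.
        self._db.execute(
            "CREATE TABLE generic_records ("
            " record_type TEXT NOT NULL,"
            " record_key TEXT NOT NULL,"
            " payload TEXT NOT NULL,"
            " PRIMARY KEY (record_type, record_key))"
        )

    def put(self, record_type: str, record_key: str, payload: dict) -> None:
        # Upsert: a second put with the same composite key replaces the document.
        self._db.execute(
            "INSERT OR REPLACE INTO generic_records VALUES (?, ?, ?)",
            (record_type, record_key, json.dumps(payload)),
        )

    def get(self, record_type: str, record_key: str) -> Optional[dict]:
        row = self._db.execute(
            "SELECT payload FROM generic_records"
            " WHERE record_type = ? AND record_key = ?",
            (record_type, record_key),
        ).fetchone()
        return json.loads(row[0]) if row else None
```

Including record_type in the key lets unrelated entities (users, API keys, billing records) share one table without key collisions.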

Service terms

vega-api
The FastAPI HTTP service. Handles all user-facing requests: authentication, project/repository management, scan creation, finding triage, billing, GitHub integration.

vega-scan-worker
A long-running background process that polls the scan queue (SQS or local) for scan jobs. Claims jobs, then either executes them in-process or launches vega-scan-runner ECS tasks.
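
The poll, claim, then execute cycle can be sketched with an in-memory queue. This is not the worker's actual code: run_worker_once and its claim/execute callbacks are hypothetical, and queue.Queue stands in for SQS or the local queue.

```python
import queue
from typing import Callable, List


def run_worker_once(
    jobs: "queue.Queue[str]",
    claim: Callable[[str], bool],
    execute: Callable[[str], None],
    max_jobs: int = 10,
) -> List[str]:
    """Drain up to max_jobs messages, executing only jobs this worker claimed."""
    handled: List[str] = []
    for _ in range(max_jobs):
        try:
            scan_id = jobs.get_nowait()
        except queue.Empty:
            break  # nothing left to poll
        # Claiming guards against two workers picking up the same job.
        if claim(scan_id):
            execute(scan_id)  # in-process here; could instead launch a runner task
            handled.append(scan_id)
    return handled
```

Separating claim from execute means a message delivered twice results in at most one execution, provided the claim itself is atomic in the backing store.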

vega-scan-runner
An ephemeral container that runs exactly one scan phase (plan/audit/verify) and exits. Downloads the source snapshot, runs the vega-core scan engine, writes findings and events to Postgres, uploads artifacts to S3.

vega-repo-ingest-worker
A long-running background process that polls the ingest queue for repository ingest jobs. Claims jobs, then either executes them in-process or launches vega-repo-ingest-runner ECS tasks.

vega-repo-ingest-runner
An ephemeral container that clones or extracts one repository and creates a SourceSnapshot, then exits.

vega-llm-proxy
An internal proxy between scan runners and the AI provider. Runners never hold the raw provider API key; instead, each runner gets a short-lived, scan-scoped token from the proxy. The proxy enforces per-scan usage limits.
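
A minimal sketch of short-lived, scan-scoped tokens with a per-scan budget is shown below. It assumes the usage limit is a simple request count and that tokens are opaque random strings; ScanTokenIssuer and its methods are hypothetical, and the real proxy may measure usage differently.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class ScanTokenIssuer:
    """Sketch of scan-scoped tokens with a TTL and a per-scan request budget."""

    def __init__(self, ttl_seconds: float, max_requests: int) -> None:
        self._ttl = ttl_seconds
        self._max = max_requests
        # token -> (scan_id, expiry timestamp, requests used)
        self._tokens: Dict[str, Tuple[str, float, int]] = {}

    def issue(self, scan_id: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)
        self._tokens[token] = (scan_id, now + self._ttl, 0)
        return token

    def authorize(self, token: str, now: Optional[float] = None) -> bool:
        """Return True and count the request if the token is valid and in budget."""
        now = time.time() if now is None else now
        entry = self._tokens.get(token)
        if entry is None:
            return False
        scan_id, expires_at, used = entry
        if now >= expires_at or used >= self._max:
            return False
        self._tokens[token] = (scan_id, expires_at, used + 1)
        return True
```

Only the proxy would hold the provider key; a leaked token of this kind expires on its own and is capped by the scan's budget.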

vega-maintenance
A task definition for one-off jobs: running database migrations, cleaning up stale artifacts, reconciling stuck scans.

vega-core
The scan engine submodule. It takes a source root, plans which parts of the code to audit, runs Codex (an AI CLI tool) on each component, and streams events back to the backend via EngineEventSink.

Infrastructure terms (AWS)

VPC (Virtual Private Cloud)
A private network inside AWS. All Vega services run inside a VPC.

ECS (Elastic Container Service)
AWS's container orchestration service. Vega runs all backend services as Docker containers managed by ECS.

Fargate
The serverless compute runtime for ECS. You don't manage EC2 instances; AWS provisions and operates the underlying hosts.

ECR (Elastic Container Registry)
AWS's Docker image registry. Vega builds Docker images and pushes them to ECR.

SQS (Simple Queue Service)
AWS's managed message queue. The API sends scan and ingest job messages to SQS. Workers consume messages to claim and execute jobs.

RDS / Aurora Postgres
AWS's managed relational database service. Vega uses Postgres with JSONB document storage.

S3 (Simple Storage Service)
AWS's object storage. Vega uses S3 buckets for source snapshots, scan artifacts, and the built frontend files.

CloudWatch
AWS's logging, metrics, and alerting service. Every ECS container logs to a CloudWatch log group.

Secrets Manager
AWS's encrypted secret storage. Database credentials, API keys, and provider keys are stored in Secrets Manager and injected via VEGA_SECRETS_ARN.

Cognito
AWS's managed user authentication service. Users sign in through the frontend via Cognito; the backend validates the resulting JWT.

CloudFront
AWS's CDN. Serves the frontend static files from S3 and routes API calls to the backend load balancer.

Terraform
Infrastructure-as-code tool. AWS resources are defined in .tf files under infra/terraform/ rather than through the console.

STS (Security Token Service)
AWS service used to generate temporary, scoped credentials. Vega uses STS to issue scan-scoped S3 credentials (ScanScopedS3Credentials) so each scan runner can only access its own artifacts.
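
One common way to scope temporary credentials is an IAM session policy restricted to a key prefix. The sketch below only builds such a policy document; the scans/<scan_id>/ key layout, the bucket name in the test, the chosen actions, and the function name are assumptions, and this is not necessarily how ScanScopedS3Credentials is produced.

```python
import json


def scan_scoped_policy(bucket: str, scan_id: str) -> str:
    """Build an IAM session policy limiting S3 access to one scan's prefix."""
    prefix = f"scans/{scan_id}/"  # assumed key layout
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed actions: read and write objects, nothing else.
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }
    return json.dumps(policy)
```

A document like this can be supplied as the Policy parameter of an STS AssumeRole call; the resulting credentials carry the intersection of the role's permissions and the session policy.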