Vega on AWS

This page explains how Vega works when deployed to AWS — what each AWS service does, how the pieces talk to each other, and the exact flow of a scan from start to finish.

Architecture diagram

flowchart TD
    User[Browser user]

    subgraph edge["Edge"]
        CF[CloudFront CDN]
        S3FE[S3: frontend bucket\nstatic React files]
    end

    subgraph vpc["VPC — private network"]
        ALB[Application Load Balancer\npublic subnet]

        subgraph app["Application containers — ECS Fargate"]
            API[vega-api\nprivate subnet]
            Worker[vega-worker\nprivate subnet]
            Proxy[vega-llm-proxy\nprivate subnet]
        end

        subgraph runner["Scan execution — ECS RunTask"]
            Runner[vega-v16-runner\nprivate subnet, one per scan]
        end

        RDS[(RDS Postgres\nprivate subnet)]
    end

    subgraph aws_services["AWS managed services"]
        SQS[SQS scan queue]
        S3SRC[S3: source bucket]
        S3ART[S3: artifacts bucket]
        Cognito[Cognito user pool]
        SM[Secrets Manager]
        CW[CloudWatch logs]
        ECR[ECR image registry]
    end

    Provider[AI provider API]

    User --> CF
    CF --> S3FE
    CF --> ALB
    ALB --> API
    API --> Cognito
    API --> RDS
    API --> S3SRC
    API --> SQS

    SQS --> Worker
    Worker --> RDS
    Worker -->|RunTask| Runner
    Runner --> RDS
    Runner --> S3SRC
    Runner --> S3ART
    Runner --> Proxy
    Proxy --> Provider

    API --> CW
    Worker --> CW
    Runner --> CW
    Proxy --> CW
    API --> SM
    Worker --> SM
    ECS -->|pull images| ECR

Edge and frontend

CloudFront is AWS's CDN (Content Delivery Network). When a user opens the Vega dashboard, their browser connects to the nearest CloudFront edge location rather than directly to the S3 bucket or the load balancer. CloudFront serves two things:

  1. Static frontend files — the built React app stored in an S3 bucket. S3 is AWS's object storage; it holds files but isn't a web server. CloudFront makes S3 act like one, adding caching and HTTPS.

  2. API requests — requests to /v1/* are forwarded by CloudFront to the Application Load Balancer, which routes them to the vega-api ECS service.
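The routing split described above can be sketched as a small lookup. This is only an illustration of the idea, not the actual CloudFront configuration: the origin names `alb` and `s3-frontend` are hypothetical, and Python's `fnmatchcase` is used as an approximation of CloudFront's path-pattern matching.

```python
from fnmatch import fnmatchcase

# Hypothetical behavior table: (path pattern, origin name), checked in order.
# Only the /v1/* -> load balancer rule comes from the text above.
BEHAVIORS = [
    ("/v1/*", "alb"),
]

# Hypothetical name for the default origin (the frontend S3 bucket).
DEFAULT_ORIGIN = "s3-frontend"


def pick_origin(path: str) -> str:
    """Return the origin that would serve a given request path."""
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return origin
    # Anything that matches no pattern falls through to the static frontend.
    return DEFAULT_ORIGIN


print(pick_origin("/v1/scans/123"))  # alb
print(pick_origin("/index.html"))    # s3-frontend
```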

Authentication with Cognito

AWS Cognito is a managed user authentication service. Vega uses it so the team doesn't have to build user management from scratch. Here's how it works:

  1. The user enters their credentials on the dashboard login page.
  2. The frontend calls Cognito directly using the Secure Remote Password (SRP) protocol and receives JWT tokens.
  3. For every subsequent API request, the frontend sends Authorization: Bearer <token>.
  4. The vega-api validates the JWT signature using Cognito's public JWKS (JSON Web Key Set) endpoint.

Relevant code: app/auth/cognito.py, app/auth/service.py.
Terraform module: infra/terraform/modules/cognito/main.tf.
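As a rough illustration of the token check in step 4, the sketch below covers only the non-cryptographic parts: building the JWKS URL for a user pool, reading the `kid` from the token header so the matching key can be selected, and checking the issuer and expiry claims. It is not the code in app/auth/cognito.py. The function names are hypothetical, and the RSA signature verification itself is deliberately left out because it needs a crypto library.

```python
import base64
import json
import time
from typing import Optional, Tuple


def jwks_url(region: str, user_pool_id: str) -> str:
    """Well-known URL where Cognito publishes a user pool's public keys."""
    return f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}/.well-known/jwks.json"


def _b64url_json(segment: str) -> dict:
    # JWT segments are base64url without padding; restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def read_unverified(token: str) -> Tuple[dict, dict]:
    """Decode the header and claims WITHOUT verifying the signature."""
    header_b64, claims_b64, _signature = token.split(".")
    return _b64url_json(header_b64), _b64url_json(claims_b64)


def check_claims(claims: dict, region: str, user_pool_id: str,
                 now: Optional[float] = None) -> bool:
    """Check issuer and expiry. Only meaningful after the signature is verified."""
    issuer = f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
    now = time.time() if now is None else now
    return claims.get("iss") == issuer and claims.get("exp", 0) > now
```

The header's `kid` value identifies which key in the JWKS document signed the token. A real validator would verify the signature with that key before trusting any claim.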

Network isolation with VPC

A VPC (Virtual Private Cloud) is a private network inside AWS. All Vega services run inside the VPC. The database, worker, runner, and LLM proxy are in private subnets — they have no direct internet access. The API's load balancer is in a public subnet so it can receive traffic from CloudFront.

Security groups act as firewalls. For example:

  - The database security group only accepts connections from the API, worker, and runner task security groups.
  - The LLM proxy security group only accepts connections from runner tasks.
  - The worker security group only needs outbound access to SQS, RDS, and ECS.

Relevant Terraform modules: infra/terraform/modules/network/main.tf, infra/terraform/modules/security/main.tf.

Compute with ECS Fargate

ECS (Elastic Container Service) runs Docker containers. Fargate is the serverless mode — you don't manage EC2 instances. You define how much CPU and memory a container needs, and AWS handles the rest.

Vega has three long-running ECS services (the API, worker, and LLM proxy) and two ECS task definitions used for one-off runs (the runner and maintenance):

| Type | Service/Task | How it runs |
|---|---|---|
| Long-running service | vega-api | Always running, ECS restarts on failure |
| Long-running service | vega-worker | Always running, polls SQS in a loop |
| Long-running service | vega-llm-proxy | Always running, handles AI proxy requests |
| One-off task | vega-v16-runner | Launched per scan via ECS RunTask, exits when done |
| One-off task | vega-maintenance | Launched manually for migrations and cleanup |
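To make the one-off task model more concrete, here is a minimal sketch of the request a caller might pass to the ECS RunTask API to start a runner on Fargate in private subnets. The parameter keys are those of the RunTask API. The container name `runner`, the `SCAN_ID` environment variable, and the function name are assumptions for illustration and may not match Vega's code.

```python
from typing import Any, Dict, List


def build_run_task_request(
    scan_id: str,
    cluster: str,
    task_definition: str,
    subnets: List[str],
    security_groups: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments for an ECS RunTask call (one task per scan)."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                # Private subnet: the task gets no public IP.
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {
                    # Hypothetical container name and variable name.
                    "name": "runner",
                    "environment": [{"name": "SCAN_ID", "value": scan_id}],
                }
            ]
        },
    }
```

With boto3 installed, the result could be passed as `boto3.client("ecs").run_task(**request)`. The task then runs to completion and exits, unlike the long-running services, which ECS keeps alive.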

Scan flow in AWS (step by step)

  1. User creates a scan in the dashboard.
  2. vega-api writes a queued scan row to RDS Postgres.
  3. vega-api sends a message to the SQS queue containing the scan ID.
  4. vega-worker (which is always running) receives the SQS message.
  5. vega-worker claims the scan with a Postgres row lock (prevents double-execution).
  6. vega-worker calls AWS ECS RunTask to start a vega-v16-runner container.
  7. vega-v16-runner downloads the source snapshot from S3.
  8. vega-v16-runner runs the v16 scan engine — planning + per-component audits.
  9. All AI calls from v16/Codex go through vega-llm-proxy, which holds the provider API key.
  10. Findings and events are written to Postgres. Artifacts are uploaded to S3.
  11. Scan status is updated to completed.
  12. The vega-v16-runner ECS task exits. You're only billed for the time it ran.
  13. The dashboard reads the updated scan status and findings via the API.
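Steps 4 to 6 can be sketched as a single polling iteration. This is a simplified illustration under stated assumptions rather than the worker's real code: the message body shape (`{"scan_id": ...}`), the `scans` table and its columns, and the `claim_scan` and `launch_runner` callables are all hypothetical. The `sqs` argument is expected to behave like a boto3 SQS client.

```python
import json
from typing import Callable, List

# Hypothetical claim statement. The UPDATE locks the row while it runs, so only
# one worker can move a given scan from 'queued' to 'running'.
CLAIM_SQL = """
UPDATE scans SET status = 'running'
WHERE id = %s AND status = 'queued'
RETURNING id
"""


def poll_once(
    sqs,
    queue_url: str,
    claim_scan: Callable[[str], bool],
    launch_runner: Callable[[str], None],
) -> List[str]:
    """Receive at most one message, claim the scan, launch a runner, delete the message."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,  # long polling
    )
    launched = []
    for message in response.get("Messages", []):
        scan_id = json.loads(message["Body"])["scan_id"]
        # If the claim fails, another worker already owns this scan.
        if claim_scan(scan_id):
            launch_runner(scan_id)
            launched.append(scan_id)
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return launched
```

In this sketch the message is deleted only after the launch call returns, so a crash before that point leaves it on the queue for redelivery. A production worker would also need to decide what happens to a scan that was claimed but never launched.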

Logs and observability

CloudWatch is AWS's logging service. Every ECS container is configured to send its stdout/stderr to a CloudWatch log group. When debugging an AWS issue, CloudWatch logs are almost always the first place to look.

Log groups follow this naming pattern: /vega/<env>/<service-name>

Relevant Terraform: infra/terraform/modules/observability/main.tf.
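The naming pattern above can be captured in a small helper, which may be handy in scripts that look up logs. The function name and the `prod` environment value are illustrative assumptions.

```python
def log_group_name(env: str, service: str) -> str:
    """Build a log group name following the /vega/<env>/<service-name> pattern."""
    return f"/vega/{env}/{service}"


print(log_group_name("prod", "vega-api"))  # /vega/prod/vega-api
```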