Production Readiness

This page tracks whether Vega is ready for a reliable production deployment. "Production ready" means the system can be safely deployed, operated, debugged, and evolved by the team.

Source material

Detailed planning lives in the main repository:

  • production_readiness_plan.md — full checklist and status
  • aws.md — comprehensive AWS runbook with account details and deployment notes
  • infra/terraform/README.md — Terraform module overview

What's in place

The following capabilities are implemented and working:

| Capability | Status |
|---|---|
| Service role separation (API, worker, runner, proxy, maintenance) | ✓ Done |
| Terraform-managed infrastructure (dev and prod environments) | ✓ Done |
| VPC with public/private subnets and security groups | ✓ Done |
| ECS Fargate for all services | ✓ Done |
| Postgres persistence with 5 migrations | ✓ Done |
| S3 source and artifact storage | ✓ Done |
| SQS scan dispatch | ✓ Done |
| ECS RunTask for isolated scan execution | ✓ Done |
| Cognito authentication with user groups | ✓ Done |
| LLM proxy with per-scan usage limits | ✓ Done |
| CloudFront frontend hosting | ✓ Done |
| CloudWatch logging for all services | ✓ Done |
| Cost controls (AWS Budgets, anomaly detection) | ✓ Done |
| Scan cancellation (cooperative v16/ECS stop) | ✓ Done |
| Stale scan recovery | ✓ Done |
| S3 artifact lifecycle (runner summary, report, debug bundle) | ✓ Done |
| Deployment scripts with idempotent sequences | ✓ Done |
| Smoke tests | ✓ Done |

What still needs work

| Gap | Priority |
|---|---|
| Production DNS (api.vega.nebusec.ai), not yet configured | High |
| WAF in front of CloudFront/ALB | High |
| CloudWatch alarms with PagerDuty/email routing | High |
| CI/CD pipeline (currently all manual) | Medium |
| Cost calibration: per-scan AI cost benchmarking | Medium |
| Multi-AZ RDS for high availability | Medium |
| End-to-end scan smoke test (automated) | Medium |
| Full scan-to-findings integration test in AWS | Low |

Pre-launch checklist

Before going live in production, verify:

  • [ ] Prod Terraform plan has been reviewed and contains only the expected changes (no surprises)
  • [ ] All secrets populated in Secrets Manager (DB creds, Cognito IDs, provider keys)
  • [ ] Database migrations applied (scripts/aws/run-migrations.sh prod)
  • [ ] All service images use reviewed, pinned tags (not latest)
  • [ ] Smoke tests pass: scripts/aws/smoke-test.sh prod
  • [ ] /v1/readyz returns 200 (all dependencies healthy)
  • [ ] A test scan completes end-to-end with findings visible in the dashboard
  • [ ] CloudWatch logs are flowing for all services
  • [ ] AWS Budget alerts are configured with the right recipients
  • [ ] Rollback plan is documented before any risky change
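
Two of the checks above lend themselves to a quick script: confirming that image references are pinned (not `latest`) and that `/v1/readyz` returns 200. The following is a minimal sketch, not part of the repository's scripts. The function names `is_pinned` and `readyz_ok` are hypothetical, and the base URL passed to `readyz_ok` is whatever host the API is actually served from.

```python
from urllib.error import URLError
from urllib.request import urlopen


def is_pinned(image_ref: str) -> bool:
    """Return True if the image reference has a digest or an explicit tag other than 'latest'."""
    # A digest reference (repo@sha256:...) always identifies one exact image.
    if "@sha256:" in image_ref:
        return True
    # Inspect only the last path segment so a registry port
    # (e.g. host:5000/repo) is not mistaken for a tag.
    last_segment = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag at all resolves to 'latest' by default
    tag = last_segment.split(":", 1)[1]
    return tag not in ("", "latest")


def readyz_ok(base_url: str, timeout: float = 5.0, fetch=urlopen) -> bool:
    """Return True if GET {base_url}/v1/readyz answers with HTTP 200.

    `fetch` defaults to urllib's urlopen and can be swapped out in tests.
    """
    url = base_url.rstrip("/") + "/v1/readyz"
    try:
        with fetch(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx responses,
        # and URLError/OSError for connection failures and timeouts.
        return False
```

A caller might run `is_pinned` over every image reference in the task definitions and fail the pre-launch check if any returns `False`, then call `readyz_ok` with the production API base URL. This only covers two checklist items; the remaining items still need `scripts/aws/smoke-test.sh prod` and manual review.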