Production Readiness
This page tracks whether Vega is ready for a reliable production deployment. "Production ready" means the system can be safely deployed, operated, debugged, and evolved by the team.
Source material
Detailed planning lives in the main repository:
- `production_readiness_plan.md`: full checklist and status
- `aws.md`: comprehensive AWS runbook with account details and deployment notes
- `infra/terraform/README.md`: Terraform module overview
What's in place
The following capabilities are implemented and working:
| Capability | Status |
|---|---|
| Service role separation (API, worker, runner, proxy, maintenance) | ✓ Done |
| Terraform-managed infrastructure (dev and prod environments) | ✓ Done |
| VPC with public/private subnets and security groups | ✓ Done |
| ECS Fargate for all services | ✓ Done |
| Postgres persistence with 5 migrations | ✓ Done |
| S3 source and artifact storage | ✓ Done |
| SQS scan dispatch | ✓ Done |
| ECS RunTask for isolated scan execution | ✓ Done |
| Cognito authentication with user groups | ✓ Done |
| LLM proxy with per-scan usage limits | ✓ Done |
| CloudFront frontend hosting | ✓ Done |
| CloudWatch logging for all services | ✓ Done |
| Cost controls (AWS Budgets, anomaly detection) | ✓ Done |
| Scan cancellation (cooperative v16/ECS stop) | ✓ Done |
| Stale scan recovery | ✓ Done |
| S3 artifact lifecycle (runner summary, report, debug bundle) | ✓ Done |
| Deployment scripts with idempotent sequences | ✓ Done |
| Smoke tests | ✓ Done |
What still needs work
| Gap | Priority |
|---|---|
| Production DNS (api.vega.nebusec.ai): not yet configured | High |
| WAF in front of CloudFront/ALB | High |
| CloudWatch alarms with PagerDuty/email routing | High |
| CI/CD pipeline (currently all manual) | Medium |
| Cost calibration — per-scan AI cost benchmarking | Medium |
| Multi-AZ RDS for high availability | Medium |
| End-to-end scan smoke test (automated) | Medium |
| Full scan-to-findings integration test in AWS | Low |
Pre-launch checklist
Before going live in production, verify:
- [ ] Prod Terraform plan is reviewed and shows only expected changes
- [ ] All secrets populated in Secrets Manager (DB creds, Cognito IDs, provider keys)
- [ ] Database migrations applied (`scripts/aws/run-migrations.sh prod`)
- [ ] All service images use reviewed, pinned tags (not `latest`)
- [ ] Smoke tests pass: `scripts/aws/smoke-test.sh prod`
- [ ] `/v1/readyz` returns 200 (all dependencies healthy)
- [ ] A test scan completes end-to-end with findings visible in the dashboard
- [ ] CloudWatch logs are flowing for all services
- [ ] AWS Budget alerts are configured with the right recipients
- [ ] Rollback plan is documented before any risky change
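
Two of the checklist items above (pinned image tags and the `/v1/readyz` probe) are easy to script. The following is a minimal sketch, not part of the repository: the helper names `is_pinned` and `readyz_ok`, the example image names, and the rule that a tag counts as pinned when it is anything other than `latest` (or when the reference carries a `sha256` digest) are all assumptions made for illustration. Only the `/v1/readyz` path and the 200 expectation come from the checklist.

```python
import re
import urllib.request

# Assumption: a reference ending in an immutable sha256 digest counts as pinned.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image_ref: str) -> bool:
    """Return True if the image reference has a digest or an explicit non-latest tag."""
    if _DIGEST.search(image_ref):
        return True
    # Look for the tag only in the last path component, so a registry port
    # such as "localhost:5000/vega-api" is not mistaken for a tag.
    last = image_ref.rsplit("/", 1)[-1]
    if ":" not in last:
        return False  # no tag: container runtimes default to "latest"
    tag = last.rsplit(":", 1)[1]
    return tag not in ("", "latest")


def readyz_ok(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/v1/readyz answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/readyz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx responses), timeouts, and refused
        # connections are all OSError subclasses; treat them as "not ready".
        return False


if __name__ == "__main__":
    # Hypothetical image references, for illustration only.
    for ref in ("vega-api:1.4.2", "vega-worker:latest", "vega-runner"):
        print(ref, "pinned" if is_pinned(ref) else "NOT pinned")
```

A pre-launch script could run `is_pinned` over the image references in the ECS task definitions and call `readyz_ok` against the production API base URL once DNS is configured. A non-`latest` tag is only as stable as the registry's tag mutability policy, so digest references are the stricter form of pinning.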