LLM Proxy

The LLM proxy is a small but important service that sits between scan runners and the AI provider (e.g., OpenAI). Its job is to keep the provider API key out of runner containers and to enforce per-scan usage limits.

Why the proxy exists

Without the proxy, every runner task would need the raw provider API key as an environment variable. If a runner task were ever compromised or if someone accessed the ECS task metadata, they'd have the key. The proxy solves this by:

  1. Credential isolation — only the proxy knows the provider API key. Runners get a short-lived, scan-scoped proxy token instead.
  2. Usage enforcement — the proxy tracks tokens and requests per scan and can hard-cap them.
  3. Cost attribution — usage is tracked per scan, making it possible to calculate per-scan AI costs.

How it works

Runner task
    │ OPENAI_BASE_URL=http://vega-llm-proxy:8001
    │ Authorization: Bearer <scan-scoped-token>
    ↓
vega-llm-proxy (FastAPI)
    1. Validate the scan-scoped token
    2. Increment usage counter for this scan
    3. Check if any per-scan limits are exceeded
    4. If OK: forward request to provider with the real API key
    5. Record response tokens and cost
    ↓
AI provider API
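The five steps in the diagram can be sketched as a single function. This is a minimal illustration, not the code in app/llm_proxy. The names `handle_request`, `ScanUsage`, and `ProxyError`, the in-memory usage dict, and the 401/429 status codes are all assumptions. The token check, limit check, and provider call are passed in as functions so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical sketch of the proxy's per-request pipeline; not the real app.llm_proxy code.


@dataclass
class ScanUsage:
    """Running usage totals for one scan (hypothetical structure)."""
    requests: int = 0
    tokens: int = 0


class ProxyError(Exception):
    """Raised when the proxy refuses a request; carries an HTTP-style status (assumed)."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def handle_request(
    token: str,
    body: dict,
    validate_token: Callable[[str], Optional[str]],  # returns the scan ID, or None if invalid
    usage: dict[str, ScanUsage],
    limit_exceeded: Callable[[ScanUsage], Optional[str]],  # returns a reason, or None if OK
    forward: Callable[[dict], dict],  # calls the provider with the real API key
) -> dict:
    # 1. Validate the scan-scoped token and recover the scan it belongs to.
    scan_id = validate_token(token)
    if scan_id is None:
        raise ProxyError(401, "invalid or expired scan token")

    # 2. Increment the request counter for this scan.
    scan_usage = usage.setdefault(scan_id, ScanUsage())
    scan_usage.requests += 1

    # 3. Refuse the call if any per-scan limit is now exceeded.
    reason = limit_exceeded(scan_usage)
    if reason is not None:
        raise ProxyError(429, f"usage limit exceeded: {reason}")

    # 4. Forward to the provider; only `forward` ever needs the real API key.
    response = forward(body)

    # 5. Record the tokens reported in the provider's OpenAI-style "usage" field.
    scan_usage.tokens += response.get("usage", {}).get("total_tokens", 0)
    return response
```

Keeping the provider call behind `forward` mirrors the credential-isolation goal: nothing before step 4 needs the real key.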

The proxy is designed to be an OpenAI-compatible endpoint. Codex (and anything else that calls OpenAI's API) works with the proxy without code changes — you just set OPENAI_BASE_URL to point at the proxy.
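No code changes are needed because of what an OpenAI-compatible client actually sends. The sketch below builds, without sending, the request such a client would make. It assumes the proxy serves OpenAI's paths directly under the base URL from the diagram, so `/chat/completions` is appended. `build_chat_request` is a hypothetical helper, not part of Codex or the OpenAI SDK.

```python
import json
import urllib.request


def build_chat_request(base_url: str, scan_token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request (hypothetical helper)."""
    # OpenAI-compatible clients append the endpoint path to the base URL,
    # so pointing the base URL at the proxy reroutes every call.
    url = base_url.rstrip("/") + "/chat/completions"  # path layout is an assumption
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The runner presents its scan-scoped token, never the provider key.
            "Authorization": f"Bearer {scan_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```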

Scan-scoped tokens

Before launching a runner task, the worker calls the proxy to mint a short-lived token for that specific scan. This token:

  - Is bound to the scan ID
  - Expires after a fixed duration
  - Is rejected if the scan is cancelled or completed
  - Cannot be reused for a different scan

app/llm_proxy/tokens.py handles token generation and validation.
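The contents of app/llm_proxy/tokens.py are not reproduced here. As a rough sketch of how a token with these properties could be built from a shared signing secret, the example below signs a scan ID and an expiry time with HMAC-SHA256 using only the standard library. The token layout, the function names, and the `scan_is_active` callback (standing in for a lookup of the scan's status) are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Callable, Optional

# Hypothetical token format: "<base64url(JSON claims)>.<base64url(HMAC-SHA256)>".
# This is an illustration, not the contents of app/llm_proxy/tokens.py.


def _b64encode(data: bytes) -> str:
    # URL-safe base64 without padding, so the token fits cleanly in a header.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64decode(text: str) -> bytes:
    # Restore the padding stripped by _b64encode before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, payload: str) -> str:
    return _b64encode(hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).digest())


def mint_token(secret: bytes, scan_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Bind scan_id to an expiry time and sign the result."""
    issued = time.time() if now is None else now
    claims = {"scan_id": scan_id, "exp": int(issued) + ttl_seconds}
    payload = _b64encode(json.dumps(claims, sort_keys=True).encode("utf-8"))
    return f"{payload}.{_sign(secret, payload)}"


def validate_token(
    secret: bytes,
    token: str,
    scan_is_active: Callable[[str], bool],  # hypothetical scan-status lookup
    now: Optional[float] = None,
) -> Optional[str]:
    """Return the bound scan ID if the token is valid, else None."""
    try:
        payload, signature = token.split(".")
    except ValueError:
        return None  # not exactly two dot-separated parts
    # Constant-time comparison so timing does not leak the expected signature.
    expected = _sign(secret, payload)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    claims = json.loads(_b64decode(payload))
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        return None  # expired
    if not scan_is_active(claims["scan_id"]):
        return None  # scan cancelled or completed
    return claims["scan_id"]
```

Because the scan ID sits inside the signed payload, a token can only ever resolve to the scan it was minted for; editing the ID breaks the signature.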

Per-scan limits

Set these environment variables on the LLM proxy service to cap AI spending:

| Variable | What it limits |
|---|---|
| VEGA_LLM_PROXY_MAX_REQUESTS_PER_SCAN | Hard cap on number of AI API calls. Set to 0 to disable. |
| VEGA_LLM_PROXY_MAX_TOKENS_PER_SCAN | Hard cap on total tokens (input + output). Set to 0 to disable. |
| VEGA_LLM_PROXY_MAX_COST_USD_PER_SCAN | Estimated cost cap in USD. Uses VEGA_LLM_PROXY_PRICE_USD_PER_1K_TOKENS for estimation. Set to 0 to disable. |

When a cap is hit, the proxy returns an error to the runner. v16 will fail the scan with a clear error event.
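The rules in the table can be expressed as a small check. This sketch assumes that an unset variable behaves like 0 (cap disabled), that estimated cost is total tokens divided by 1,000 times VEGA_LLM_PROXY_PRICE_USD_PER_1K_TOKENS, and that a cap trips once usage goes strictly above it. `Limits` and `exceeded` are hypothetical names. For example, at a hypothetical price of 0.01 USD per 1K tokens, a scan that has used 250,000 tokens has an estimated cost of 250 × 0.01 = 2.50 USD, which would trip a 2.00 USD cap.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Limits:
    """Per-scan caps; 0 disables a cap (hypothetical structure)."""
    max_requests: int = 0
    max_tokens: int = 0
    max_cost_usd: float = 0.0
    price_usd_per_1k_tokens: float = 0.0

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "Limits":
        # Assumption: an unset variable is treated the same as 0 (disabled).
        return cls(
            max_requests=int(env.get("VEGA_LLM_PROXY_MAX_REQUESTS_PER_SCAN", "0")),
            max_tokens=int(env.get("VEGA_LLM_PROXY_MAX_TOKENS_PER_SCAN", "0")),
            max_cost_usd=float(env.get("VEGA_LLM_PROXY_MAX_COST_USD_PER_SCAN", "0")),
            price_usd_per_1k_tokens=float(env.get("VEGA_LLM_PROXY_PRICE_USD_PER_1K_TOKENS", "0")),
        )


def exceeded(limits: Limits, requests: int, tokens: int) -> Optional[str]:
    """Return the name of the first cap that is exceeded, or None."""
    if limits.max_requests and requests > limits.max_requests:
        return "max requests"
    if limits.max_tokens and tokens > limits.max_tokens:
        return "max tokens"
    # Estimated cost = (tokens / 1000) * price per 1K tokens.
    estimated_cost = tokens / 1000 * limits.price_usd_per_1k_tokens
    if limits.max_cost_usd and estimated_cost > limits.max_cost_usd:
        return "max cost"
    return None
```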

Configuration

Configure the proxy itself:

VEGA_LLM_PROVIDER_BASE_URL=https://api.openai.com/v1   # or any OpenAI-compatible endpoint
VEGA_LLM_PROVIDER_API_KEY=sk-...                        # the real provider API key
VEGA_LLM_PROXY_AUTH_SECRET=some-random-secret           # used to sign scan-scoped tokens
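If one of these values is missing, it is better to find out when the proxy starts than on the first scan. A minimal sketch of loading them once and refusing to start when any is absent follows. `ProxySettings` and `load_proxy_settings` are hypothetical names, and the real service may load its configuration differently.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class ProxySettings:
    """Proxy configuration (hypothetical structure)."""
    provider_base_url: str
    # repr=False keeps secrets out of logs if the settings object is printed.
    provider_api_key: str = field(repr=False)
    auth_secret: str = field(repr=False)


def load_proxy_settings(env: Mapping[str, str] = os.environ) -> ProxySettings:
    """Read the proxy's settings, raising at startup if any are missing or empty."""
    names = (
        "VEGA_LLM_PROVIDER_BASE_URL",
        "VEGA_LLM_PROVIDER_API_KEY",
        "VEGA_LLM_PROXY_AUTH_SECRET",
    )
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return ProxySettings(
        provider_base_url=env["VEGA_LLM_PROVIDER_BASE_URL"].rstrip("/"),
        provider_api_key=env["VEGA_LLM_PROVIDER_API_KEY"],
        auth_secret=env["VEGA_LLM_PROXY_AUTH_SECRET"],
    )
```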

Configure runners to use the proxy:

VEGA_LLM_PROXY_BASE_URL=http://vega-llm-proxy:8001      # internal address of the proxy

In AWS, the proxy runs in a private subnet accessible only by runner tasks (enforced by security groups).

Running locally

The proxy is optional for local development. If VEGA_LLM_PROXY_BASE_URL is not set, Codex will call the provider directly using OPENAI_API_KEY. This is convenient locally but means the raw API key is in the process environment.
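The choice described above can be sketched as a function that decides which OpenAI variables a runner hands to Codex. This is an illustration, not the worker's actual code: `runner_openai_env` is a hypothetical name, and passing the scan-scoped token as OPENAI_API_KEY is an assumption. OpenAI-compatible clients send that value as the `Authorization: Bearer` header, which matches the runner-to-proxy request shown under How it works.

```python
from typing import Mapping


def runner_openai_env(env: Mapping[str, str], scan_token: str) -> dict[str, str]:
    """Choose the OpenAI settings to hand to Codex inside a runner (hypothetical helper)."""
    proxy_url = env.get("VEGA_LLM_PROXY_BASE_URL")
    if proxy_url:
        # Proxy mode: Codex talks to the proxy and holds only the scan-scoped token.
        return {"OPENAI_BASE_URL": proxy_url, "OPENAI_API_KEY": scan_token}
    # Local fallback: Codex calls the provider directly with the raw key.
    return {"OPENAI_API_KEY": env["OPENAI_API_KEY"]}
```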

To run the proxy locally:

VEGA_LLM_PROVIDER_BASE_URL=https://api.openai.com/v1 \
VEGA_LLM_PROVIDER_API_KEY=sk-... \
VEGA_LLM_PROXY_AUTH_SECRET=dev-secret \
uvicorn app.llm_proxy.main:app --port 8001

Debugging

Scans fail with provider authentication errors:

  1. Is the runner's VEGA_LLM_PROXY_BASE_URL pointing at the proxy (not the provider directly)?
  2. Is the proxy running? Check GET http://vega-llm-proxy:8001/healthz from the runner network.
  3. Is the provider API key correct? Check the proxy's CloudWatch logs for 401 responses from the provider.
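Step 2 can be scripted. The sketch below issues the same GET /healthz and treats an HTTP 200 as healthy; it assumes nothing about the response body. `proxy_is_healthy` is a hypothetical helper. Run it from inside the runner network, since the proxy only accepts traffic from runner tasks.

```python
import urllib.request


def proxy_is_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/healthz answers with HTTP 200 (hypothetical helper)."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError subclass OSError: refused, DNS failure,
        # timeout, or a non-2xx status all count as unhealthy.
        return False
```

For example, `proxy_is_healthy("http://vega-llm-proxy:8001")` from a runner task.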

Scans fail with "usage limit exceeded":

  1. A per-scan cap has been hit. This is working as intended.
  2. If you need higher limits, raise VEGA_LLM_PROXY_MAX_TOKENS_PER_SCAN or whichever other per-scan cap was hit.

Proxy logs show requests but no response from provider:

  1. Check network connectivity from the proxy to the provider.
  2. In AWS, confirm the proxy task security group allows outbound HTTPS.
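For step 1, a bare TCP connection test run from inside the proxy task separates network problems (security groups, routing, DNS) from provider-side problems. A successful connection only shows that outbound traffic on port 443 gets through; it does not check TLS or the API key. `can_reach` is a hypothetical helper.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened (hypothetical helper)."""
    try:
        # create_connection resolves DNS and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, DNS failure, or timed out.
        return False
```

For example, `can_reach("api.openai.com")` checks the host from the VEGA_LLM_PROVIDER_BASE_URL example above.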