Source Ingest
Source ingest is the process of getting a repository's code into a form Vega can scan. The backend supports two intake methods: fetching from a Git URL and extracting a user-uploaded archive.
Intake methods
Git fetch
When a user provides a Git URL, app/projects/fetcher.py clones the repository into a local directory (or a temporary path before snapshot upload). The fetcher handles:
- Shallow clones to avoid downloading unnecessary history
- Authentication for private repositories (if credentials are configured)
- Walking the source tree to build a file listing
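The fetch steps above can be sketched roughly as follows. This is a minimal illustration, not the code in app/projects/fetcher.py: the function names `build_clone_command`, `shallow_clone`, and `list_source_files` are hypothetical, and the sketch assumes the `git` CLI is available on the PATH and that credentials, if needed, are already configured for git.

```python
import os
import subprocess
from pathlib import Path


def build_clone_command(url: str, dest: str, depth: int = 1) -> list[str]:
    """Return a git command for a shallow clone (hypothetical helper)."""
    # --depth limits the history that is downloaded.
    # "--" stops git from interpreting a hostile URL as an option.
    return ["git", "clone", "--depth", str(depth), "--single-branch", "--", url, dest]


def shallow_clone(url: str, dest: str) -> None:
    """Run the clone; raises CalledProcessError (with git's stderr) on failure."""
    subprocess.run(
        build_clone_command(url, dest),
        check=True,
        capture_output=True,
        text=True,
    )


def list_source_files(root: str) -> list[str]:
    """Walk the tree and return sorted relative file paths, skipping .git."""
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full = Path(dirpath) / name
            files.append(full.relative_to(root).as_posix())
    return sorted(files)
```

Capturing git's stderr, as `shallow_clone` does here, is what would make the git error output available for logging when a clone fails.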
Zip/archive upload
Users can also upload a zip or tar archive. The upload flow:
1. app/uploads/service.py handles the multipart form upload.
2. app/storage/archive.py extracts the archive safely, enforcing the limits below.
3. The extracted source directory becomes the snapshot.
Git upload (programmatic)
app/git_upload/service.py handles a special case: creating a temporary git remote that clients can git push to. This is used for programmatic or CI-driven workflows where you want to push code rather than specify a URL.
Archive safety
Archive extraction is a security boundary. A malicious user could craft an archive that:
- Extracts files outside the intended directory (path traversal, e.g., ../../etc/passwd)
- Contains millions of tiny files that exhaust disk space (zip bomb)
- Has a huge uncompressed-to-compressed ratio (zip bomb variant)
app/storage/archive.py enforces these limits:
| Setting | Default | What it prevents |
|---|---|---|
| VEGA_MAX_SOURCE_BYTES | 2 GB | Archives larger than this are rejected before extraction |
| VEGA_MAX_ARCHIVE_ENTRIES | 50,000 | Archives with too many files are rejected |
| VEGA_MAX_ARCHIVE_FILE_BYTES | (per file) | Individual files larger than this are rejected |
| VEGA_MAX_ARCHIVE_UNCOMPRESSED_BYTES | 5 GB | Total extracted size cap |
Path traversal is detected and rejected before any file is written.
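One way to perform these checks before writing anything is to validate every archive entry up front. The sketch below does this for zip files with Python's standard `zipfile` module. It is an illustration under assumptions, not the contents of app/storage/archive.py: `validate_zip` and `UnsafeArchiveError` are hypothetical names, and the per-file limit used here is an invented placeholder because the default for VEGA_MAX_ARCHIVE_FILE_BYTES is not stated above.

```python
import os
import zipfile

# Illustrative limits; real values would come from the VEGA_MAX_* settings.
MAX_ENTRIES = 50_000
MAX_FILE_BYTES = 100 * 1024 * 1024  # placeholder, actual default not stated
MAX_UNCOMPRESSED_BYTES = 5 * 1024**3


class UnsafeArchiveError(Exception):
    """Raised when an archive violates a safety limit (hypothetical type)."""


def validate_zip(
    path: str,
    dest: str,
    max_entries: int = MAX_ENTRIES,
    max_file_bytes: int = MAX_FILE_BYTES,
    max_total_bytes: int = MAX_UNCOMPRESSED_BYTES,
) -> list[str]:
    """Check every entry before anything is written; return the entry names."""
    dest_root = os.path.realpath(dest)
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        if len(infos) > max_entries:
            raise UnsafeArchiveError("too many entries")
        total = 0
        for info in infos:
            # Resolve the would-be target and require it to stay inside dest.
            target = os.path.realpath(os.path.join(dest_root, info.filename))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise UnsafeArchiveError(f"path traversal: {info.filename}")
            # file_size is the size declared in the archive's own headers.
            if info.file_size > max_file_bytes:
                raise UnsafeArchiveError(f"file too large: {info.filename}")
            total += info.file_size
            if total > max_total_bytes:
                raise UnsafeArchiveError("uncompressed size limit exceeded")
        return [info.filename for info in infos]
```

Because the sizes checked here are the ones the archive declares about itself, a stricter implementation would probably also count bytes while streaming each file to disk and abort once a limit is crossed.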
Snapshot storage
After ingest, the source is stored as an immutable snapshot. Where the snapshot lives depends on the storage backend:
- Local storage (default): Snapshots are stored in directories under data/snapshots/. The snapshot path is recorded in the repository record. Scans access source directly from this path.
- S3 storage (production): Snapshots are uploaded as zip archives to the S3 source bucket. The S3 object key is stored in the repository record. Runner tasks download the snapshot from S3 before scanning.
The key backend settings:
# Local storage (default)
VEGA_FILE_STORAGE_BACKEND=local
# S3 storage (production)
VEGA_FILE_STORAGE_BACKEND=s3
VEGA_S3_SOURCE_BUCKET=vega-prod-source-abc123
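To show how these two settings might be read together, here is a small sketch of a config loader. The names `StorageConfig` and `load_storage_config` are hypothetical, and failing fast when the bucket is missing is an assumption about desirable behavior rather than a description of what Vega does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageConfig:
    """Resolved storage settings (hypothetical type)."""
    backend: str
    s3_source_bucket: Optional[str] = None


def load_storage_config(env: Mapping[str, str] = os.environ) -> StorageConfig:
    """Read the storage settings, defaulting to the local backend."""
    backend = env.get("VEGA_FILE_STORAGE_BACKEND", "local").strip().lower()
    if backend == "local":
        return StorageConfig(backend="local")
    if backend == "s3":
        bucket = env.get("VEGA_S3_SOURCE_BUCKET", "").strip()
        if not bucket:
            # Assumed behavior: refuse to start without a bucket name.
            raise ValueError("VEGA_S3_SOURCE_BUCKET must be set when the backend is s3")
        return StorageConfig(backend="s3", s3_source_bucket=bucket)
    raise ValueError(f"unknown VEGA_FILE_STORAGE_BACKEND: {backend!r}")
```

Validating the pair at startup would surface a misconfigured bucket immediately instead of at the first snapshot upload.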
Debugging ingest failures
Git clone failing:
1. Check that the URL is correct and accessible from the machine running the API.
2. For private repos, check whether git credentials are configured.
3. Check API logs for git error output.
Archive upload failing:
1. Check whether the upload exceeded any of the size limits above.
2. Look for path traversal errors in the API logs — if the archive contains ../ paths, it will be rejected.
3. Check that the data/uploads/ directory is writable (local) or the S3 bucket is accessible (production).
Snapshot upload to S3 failing:
1. Confirm VEGA_FILE_STORAGE_BACKEND=s3.
2. Confirm VEGA_S3_SOURCE_BUCKET is set to the correct bucket name.
3. Confirm the API task role has s3:PutObject on the bucket.
4. Check API logs for S3 client errors.