Hybrid VPC Deployment
Deploy an AI Firewall inside your network. DLP scanning, PII redaction, and budget enforcement run on-prem — only metadata reaches the cloud dashboard.
When to use Hybrid VPC
- Your compliance team requires that prompt content never leaves the corporate network
- You operate in a regulated industry (healthcare, finance, legal, government)
- You need data residency guarantees for AI workloads
- You want full AI governance (DLP, budgets, policies) without sending data to a third-party cloud proxy
- You need sub-50ms DLP latency with no external network hops
Architecture
The hybrid deployment splits into two zones: components that run inside your VPC (processing prompt content) and the cloud control plane (managing policies and displaying metadata-only analytics).
Go Hybrid Proxy
Zone: Your VPC
Compiled, statically linked Go binary in a scratch Docker image. Handles all DLP scanning, PII redaction, prompt injection blocking, and budget enforcement locally. Forwards cleaned requests directly to your chosen AI provider.
Presidio NER Sidecar
Zone: Your VPC
Stock Microsoft Presidio AnalyzerEngine providing named entity recognition. Open-source, auditable, no proprietary code. Used by the Go proxy as an NER primitive.
Sync Agent
Zone: Your VPC
Lightweight Go daemon that pulls policy bundles from the cloud control plane every 30 seconds and pushes metadata-only telemetry (token counts, latency, entity types). Never transmits prompt content.
Cloud Control Plane
Zone: AISG Cloud
Web dashboard for managing deployment configuration, DLP policies, budget limits, and viewing analytics. Receives only structural metadata, never prompt or response content.
Request Flow
1. Your App
   Sends request to the hybrid proxy (local network)
2. Go Proxy (DLP)
   Scans for PII, applies policies, and enforces budgets, all locally
3. AI Provider
   Cleaned request forwarded directly to OpenAI, Anthropic, etc.
4. Sync Agent
   Pushes metadata (tokens, cost, latency) to the cloud, with no content
5. Dashboard
   View metrics and violations, and manage policies, in the cloud UI
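The scan-and-forward step in the flow above can be illustrated with a minimal Python sketch. It is not the proxy's actual engine (which is a compiled Go binary): the two regular expressions, the `<TYPE>` placeholder format, and the function name `scan_and_redact` are assumptions chosen for illustration. It shows how a prompt could be cleaned locally while only per-type counts are kept as metadata.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; the real engine covers many more types.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_and_redact(prompt: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a <TYPE> placeholder and count matches per type."""
    counts = Counter()
    for entity_type, pattern in PATTERNS.items():
        prompt, n = pattern.subn(f"<{entity_type}>", prompt)
        if n:
            counts[entity_type] += n
    return prompt, dict(counts)


cleaned, metadata = scan_and_redact(
    "Mail jane@example.com or bob@example.org, SSN 123-45-6789"
)
# `cleaned` is what would be forwarded to the AI provider;
# `metadata` (entity counts only, no content) is what would reach the cloud.
```

In this sketch the original text never appears in `metadata`, which mirrors the split between the forwarded request and the telemetry described in steps 3 and 4.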
Deployment
The hybrid stack runs as three containers. Choose your deployment method — you'll receive the full deployment files and registry credentials when your Enterprise account is provisioned.
System Requirements
| Container | Image Size | RAM Required | CPU |
|---|---|---|---|
| proxy | ~15MB | 256MB | 1 core |
| presidio-ner | ~2GB | 3–4GB | 2 cores |
| sync-agent | ~10MB | 32MB | 0.25 core |
| Total | ~2GB | 4GB minimum | 3 cores |
Presidio's spaCy en_core_web_lg NLP model is the primary memory consumer (~750MB–1GB loaded into RAM).
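As a rough sizing check, the per-container figures from the table can be summed. The snippet below only does arithmetic on the numbers already listed; the variable names are illustrative.

```python
# Per-container figures copied from the table above (RAM in MB, CPU in cores).
# Presidio is listed as a range (3-4 GB), so RAM is tracked as (low, high).
containers = {
    "proxy": {"ram_mb": (256, 256), "cpu": 1.0},
    "presidio-ner": {"ram_mb": (3 * 1024, 4 * 1024), "cpu": 2.0},
    "sync-agent": {"ram_mb": (32, 32), "cpu": 0.25},
}

ram_low = sum(c["ram_mb"][0] for c in containers.values())
ram_high = sum(c["ram_mb"][1] for c in containers.values())
cpu_total = sum(c["cpu"] for c in containers.values())

print(f"RAM: {ram_low}-{ram_high} MB, CPU: {cpu_total} cores")
```

The sums come out slightly above the rounded totals in the table when Presidio uses its full 4 GB, so it may be worth leaving some headroom beyond the stated minimum.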
Option A — Docker Compose
```yaml
services:
  proxy:
    image: ghcr.io/aisecuritygateway/hybrid-proxy:latest
    ports:
      - "8080:8000"
    environment:
      - AISG_API_KEY=${AISG_API_KEY}
      - PRESIDIO_URL=http://presidio:3000
      - SYNC_AGENT_EVENTS_URL=http://sync-agent:9090/events
      - LOG_LEVEL=info
    depends_on: [presidio, sync-agent]
    deploy:
      resources:
        limits:
          memory: 256M
  presidio:
    image: ghcr.io/aisecuritygateway/presidio-ner:latest
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 3G
  sync-agent:
    image: ghcr.io/aisecuritygateway/sync-agent:latest
    environment:
      - AISG_DEPLOYMENT_TOKEN=${AISG_DEPLOYMENT_TOKEN}
      - AISG_CONTROL_PLANE_URL=https://api.aisecuritygateway.ai
    volumes:
      - policy-cache:/var/aisg/cache

# Named volumes must be declared at the top level for Compose to accept them
volumes:
  policy-cache: {}
```

```bash
# Log in to the container registry (credentials provided during onboarding)
docker login ghcr.io
# Start all three containers
docker compose -f docker-compose.hybrid.yml up -d
# Verify: you should see "healthy" for all services
docker compose -f docker-compose.hybrid.yml ps
```

Option B — Kubernetes
For teams running on Kubernetes, the onboarding bundle includes ready-to-apply manifests with resource limits, health checks, network policies, and Kustomize support.
```bash
# Create namespace and image pull secret
kubectl create namespace aisg
kubectl create secret docker-registry ghcr-pull-secret \
  --namespace aisg \
  --docker-server=ghcr.io \
  --docker-username=aisg-deploy \
  --docker-password="$AISG_REGISTRY_TOKEN"
# Apply all manifests (Kustomize)
kubectl apply -k hybrid-vpc/k8s/
# Verify: all three pods should be Running
kubectl -n aisg get pods
```

Included in the K8s bundle
- Deployment + Service per container
- Resource requests & limits
- Liveness & readiness probes
- PersistentVolumeClaim for policy cache
Security & networking
- NetworkPolicy (default-deny + least privilege)
- Presidio only reachable from proxy
- Sync-agent only reachable from proxy
- Kustomize for image tag management
Scaling note: The proxy can scale horizontally (multiple replicas sharing the policy cache). Presidio and sync-agent should stay at 1 replica each. Add your own Ingress or Gateway resource to expose the proxy Service.
System requirements: Minimum 3 vCPU, 4 GB RAM (Presidio's spaCy NLP model is the primary consumer). Outbound HTTPS to your AI providers and api.aisecuritygateway.ai. No inbound internet access required.
Integration
Point your application's base URL to the hybrid proxy. Works with the OpenAI SDK, Anthropic SDK, or any HTTP client.
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # Hybrid proxy
    api_key="oah_your_project_api_key",   # From the dashboard
)

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Summarize Q3 results"}],
)
```

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer oah_your_project_api_key" \
  -H "Content-Type: application/json" \
  -H "x-provider: openai" \
  -H "x-model: gpt-4.1" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```

Your AI provider keys stay local. Configure your OpenAI / Anthropic / Google API keys in the proxy's environment file. They are never sent to the AISG cloud; only the proxy uses them to forward requests.
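For applications that use a plain HTTP client rather than an SDK, the same request can be built with Python's standard library. This is a sketch that mirrors the headers of the curl example; the helper name `build_chat_request` is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, provider: str, model: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request aimed at the proxy."""
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "x-provider": provider,  # same routing headers as the curl example
            "x-model": model,
        },
    )


req = build_chat_request(
    "http://localhost:8080/v1", "oah_your_project_api_key", "openai", "gpt-4.1", "Hello"
)
# To send it against a running proxy: urllib.request.urlopen(req)
```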
Security Guarantees
Prompt Data Sovereignty
All prompt and response content is processed within your network boundary. The cloud control plane never receives, stores, or processes prompt text.
Compiled Binary — No Source Exposure
The Go proxy ships as a statically compiled binary in a scratch Docker image. With no shell, no OS packages, and no runtime interpreter in the container, the attack surface for code extraction is minimal.
Metadata-Only Telemetry
The sync agent transmits only structural metadata: token counts, latency measurements, entity type counts, and cost estimates. Content fields are rejected server-side.
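One way to picture server-side rejection of content fields is an allowlist check on each telemetry event. The sketch below is an assumption about how such validation could work, not the control plane's actual code; the field names in `ALLOWED_FIELDS` and the function `validate_telemetry` are hypothetical.

```python
# Hypothetical allowlist: only structural metadata fields are accepted.
ALLOWED_FIELDS = {
    "input_tokens",
    "output_tokens",
    "latency_ms",
    "entity_counts",
    "estimated_cost_usd",
}


def validate_telemetry(event: dict) -> dict:
    """Reject any event carrying a field outside the metadata allowlist."""
    unexpected = set(event) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"rejected fields: {sorted(unexpected)}")
    counts = event.get("entity_counts", {})
    if not all(isinstance(k, str) and isinstance(v, int) for k, v in counts.items()):
        raise ValueError("entity_counts must map entity type names to integers")
    return event


accepted = validate_telemetry(
    {"input_tokens": 120, "latency_ms": 38, "entity_counts": {"EMAIL_ADDRESS": 2}}
)
```

An allowlist is used here rather than a blocklist so that any new, unrecognized field is rejected by default.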
Local Budget Enforcement
Per-request token limits and monthly budget caps are enforced locally by the proxy. No cloud round-trip required for budget checks — requests are blocked instantly when limits are hit.
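The two local checks described above (a per-request token limit and a monthly cap) can be sketched as follows. The `Budget` class, its field names, and the idea of tracking spend in US dollars are illustrative assumptions; the proxy's real accounting may differ.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical in-memory budget state for one project."""

    max_tokens_per_request: int
    monthly_cap_usd: float
    spent_usd: float = 0.0

    def check(self, request_tokens: int, estimated_cost_usd: float) -> tuple[bool, str]:
        """Decide locally whether a request may proceed; no network call involved."""
        if request_tokens > self.max_tokens_per_request:
            return False, "per-request token limit exceeded"
        if self.spent_usd + estimated_cost_usd > self.monthly_cap_usd:
            return False, "monthly budget exhausted"
        return True, "ok"

    def record(self, actual_cost_usd: float) -> None:
        """Add the cost of a completed request to the month's running total."""
        self.spent_usd += actual_cost_usd


budget = Budget(max_tokens_per_request=4000, monthly_cap_usd=10.0)
allowed, reason = budget.check(request_tokens=100, estimated_cost_usd=0.5)
```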
Policy-as-Code
DLP policies, entity selections, sensitivity tiers, and budget limits are defined in the cloud dashboard and synced as signed JSON bundles. The proxy applies them deterministically.
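The text does not specify how bundles are signed. As a minimal sketch of the general idea, the example below assumes an HMAC-SHA256 signature over canonical JSON with a shared key; the real scheme could just as well use asymmetric signatures. The bundle layout (`payload`, `signature`) and both function names are hypothetical.

```python
import hashlib
import hmac
import json


def sign_bundle(policy: dict, key: bytes) -> dict:
    """Serialize the policy canonically and attach an HMAC-SHA256 signature."""
    payload = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def load_verified_policy(bundle: dict, key: bytes) -> dict:
    """Return the policy only if the signature matches; otherwise refuse it."""
    expected = hmac.new(key, bundle["payload"].encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, bundle["signature"]):
        raise ValueError("policy bundle signature mismatch")
    return json.loads(bundle["payload"])


key = b"example-shared-key"  # placeholder, not a real credential
bundle = sign_bundle({"entities": ["EMAIL_ADDRESS"], "monthly_budget_usd": 100}, key)
policy = load_verified_policy(bundle, key)
```

Verifying before parsing means a tampered bundle is never applied, which is consistent with the deterministic, fail-closed behavior described in this section.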
Fail-Closed Architecture
If the proxy cannot verify a request is safe, it does not forward it. DLP failures, policy errors, and budget exhaustion all result in blocked requests — never silent pass-through.
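The fail-closed rule can be expressed as a small wrapper: a request is forwarded only when the scan explicitly reports it as safe, and any error counts as a block. This is an illustrative sketch; `Blocked`, `guarded_forward`, and the `scan`/`forward` callables are hypothetical names, not the proxy's API.

```python
class Blocked(Exception):
    """Raised when a request must not be forwarded."""


def guarded_forward(prompt, scan, forward):
    """Forward `prompt` only if `scan` positively confirms it is safe."""
    try:
        verdict = scan(prompt)
    except Exception as exc:
        # A scanner failure is treated as "unsafe", never as "pass through".
        raise Blocked(f"DLP scan failed: {exc}") from exc
    if verdict is not True:
        # Anything other than an explicit True (None, False, ...) blocks.
        raise Blocked("request not verified as safe")
    return forward(prompt)


result = guarded_forward("hello", scan=lambda p: True, forward=str.upper)
```

The key design choice is that the default outcome is a block: forwarding requires an affirmative result, so a crashed or misconfigured scanner cannot silently let traffic through.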
What's Included
On-Prem Components
- Compiled Go proxy with built-in DLP engine
- Detection of 30+ PII entity types (SSN, credit card, API keys, etc.)
- Custom regex patterns (IP Guard)
- Prompt injection blocking
- Per-request token limits
- Monthly budget enforcement
- Presidio NER sidecar (open-source)
- Go sync agent for policy & telemetry
Cloud Dashboard
- DLP policy management per project
- Entity selection & sensitivity tiers
- Monthly budget configuration
- Real-time request & violation dashboards
- Deployment health monitoring
- Multi-project management
- API key management (oah_ prefix)
- Policy version history
Deployment Options Compared
| | OSS Self-Host | SaaS Cloud | Hybrid VPC |
|---|---|---|---|
| Prompt data stays on-prem | Yes | No | Yes |
| Cloud dashboard | No | Yes | Yes |
| DLP entity types | 13 | 30+ | 30+ |
| Smart cost routing | No | Yes | No |
| Budget enforcement | No | Yes | Yes (local) |
| Policy management | Manual config | Dashboard | Dashboard → synced |
| Multi-project | No | Yes | Yes |
| Deployment model | Docker Compose | Managed cloud | Docker Compose / K8s + cloud |
Getting Started
1. Sign up & create a project
   Create an account, then create a project in the dashboard. Configure your DLP policy and budget limits.
2. Request Enterprise / Hybrid VPC access
   Contact us at enterprise@aisecuritygateway.ai to enable Hybrid VPC on your account. We'll provision your deployment token and container registry access.
3. Deploy the proxy stack
   Run three containers in your VPC using Docker Compose or Kubernetes manifests (both provided). Configure your AI provider keys and deployment token in the environment file or Kubernetes Secret.
4. Point your app to the proxy
   Change your SDK's base URL to the proxy's address and use your project API key. All requests are then governed by the proxy.
Frequently Asked Questions
What data leaves my network?
Only structural metadata: token counts, request latency, DLP entity types detected (e.g. 'EMAIL_ADDRESS: 2'), and estimated cost. Prompt text, response text, and user content never leave your VPC. The sync agent's telemetry payload is validated server-side — any fields resembling content are rejected.
What providers does the hybrid proxy support?
The proxy forwards to any OpenAI-compatible API endpoint. This includes OpenAI, Anthropic, Google Gemini, Groq, Together.ai, xAI, Mistral, AWS Bedrock, and DeepInfra. You configure the provider endpoint and API key in your environment.
What happens if the cloud control plane is unreachable?
The proxy continues operating with the last-synced policy bundle. DLP scanning, PII redaction, and budget enforcement all function normally using cached policies. When connectivity is restored, the sync agent automatically resumes policy pulls and telemetry pushes.
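The fallback behavior described above can be sketched as a refresh function that prefers a fresh bundle and otherwise reuses the cached one. The function name, the JSON cache file, and the use of `OSError` to represent an unreachable control plane are assumptions for illustration.

```python
import json
from pathlib import Path


def refresh_policy(fetch, cache_path: Path) -> dict:
    """Try to pull a fresh policy; on a network failure, reuse the cached copy."""
    try:
        policy = fetch()
    except OSError:
        # Control plane unreachable (ConnectionError and urllib's URLError are
        # OSError subclasses): keep enforcing the last-synced bundle.
        return json.loads(cache_path.read_text(encoding="utf-8"))
    cache_path.write_text(json.dumps(policy), encoding="utf-8")
    return policy
```

If no cached bundle exists yet, `read_text` raises and the caller gets an error rather than an empty policy, which keeps this sketch consistent with the fail-closed behavior described earlier.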
Can one proxy serve multiple projects?
Yes. The hybrid proxy supports multi-project deployments. Each project gets its own API key (oah_ prefix) with independent DLP policies, entity selections, and budget limits. The proxy routes requests to the correct project based on the API key.
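Routing by API key amounts to a lookup from the bearer token to that project's settings. The sketch below assumes an in-memory table; the keys, project names, and policy fields are made up for illustration.

```python
# Hypothetical project table; real keys and policies come from the synced bundle.
PROJECTS = {
    "oah_billing_example": {
        "project": "billing",
        "entities": ["CREDIT_CARD"],
        "monthly_budget_usd": 200,
    },
    "oah_support_example": {
        "project": "support",
        "entities": ["EMAIL_ADDRESS", "PHONE_NUMBER"],
        "monthly_budget_usd": 50,
    },
}


def resolve_project(authorization_header: str) -> dict:
    """Map an `Authorization: Bearer oah_...` header to its project settings."""
    scheme, _, key = authorization_header.partition(" ")
    if scheme != "Bearer" or not key.startswith("oah_"):
        raise PermissionError("missing or malformed project API key")
    try:
        return PROJECTS[key]
    except KeyError:
        raise PermissionError("unknown project API key") from None


settings = resolve_project("Bearer oah_support_example")
```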
How is the proxy deployed?
Three containers: the Go proxy, the Presidio NER sidecar, and the Go sync agent. Deploy via Docker Compose or Kubernetes (we provide ready-to-apply manifests for both). Minimum system requirements are 3 vCPU and 4GB RAM (Presidio's spaCy NLP model is the primary consumer). The proxy listens on a local port — you point your application's base URL to it.
What's the DLP latency?
Typically under 50ms for text requests. The Go DLP engine runs all pattern matching, entity detection, and policy evaluation in-process. The Presidio sidecar is called only for NLP-based entity recognition (names, locations) and adds approximately 20-30ms.
Is the proxy source code available?
The proxy ships as a compiled Go binary. The open-source AISG proxy (Apache 2.0) is available on GitHub for the cloud/self-hosted version. The hybrid proxy includes proprietary DLP enhancements compiled into the binary — these are not available as source code.
How do I monitor the deployment?
The sync agent sends heartbeats to the cloud dashboard every 60 seconds. The dashboard shows deployment status (Connected/Disconnected), last heartbeat time, and component versions. Metrics (requests, tokens, cost, violations) update in real-time as telemetry is processed.
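A Connected/Disconnected status can be derived from the time of the last heartbeat. The 60-second interval comes from the answer above; the tolerance of two missed heartbeats and the function name are assumptions, since the dashboard's actual threshold is not documented here.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_INTERVAL = timedelta(seconds=60)  # stated in the text
MISSED_BEATS_ALLOWED = 2                    # hypothetical tolerance


def deployment_status(last_heartbeat: datetime, now: datetime) -> str:
    """Report Disconnected once more than the allowed heartbeats were missed."""
    deadline = last_heartbeat + HEARTBEAT_INTERVAL * (MISSED_BEATS_ALLOWED + 1)
    return "Connected" if now <= deadline else "Disconnected"


now = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
recent = deployment_status(now - timedelta(seconds=90), now)
stale = deployment_status(now - timedelta(minutes=10), now)
```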
Ready to deploy?
Contact our team to enable Hybrid VPC on your account. We'll help you plan the deployment and get your first project live.
Want to self-host this?
AI Security Gateway is open source. Deploy the core AI security proxy on your own infrastructure — PII redaction, prompt injection blocking, and secret detection included. No account required.