Enterprise Feature

Hybrid VPC Deployment

Deploy an AI Firewall inside your network. DLP scanning, PII redaction, and budget enforcement run on-prem — only metadata reaches the cloud dashboard.

When to use Hybrid VPC

Your compliance team requires that prompt content never leaves the corporate network
You operate in a regulated industry (healthcare, finance, legal, government)
You need data residency guarantees for AI workloads
You want full AI governance (DLP, budgets, policies) without sending data to a third-party cloud proxy
You need sub-50ms DLP latency with no external network hops

Architecture

The hybrid deployment splits into two zones: components that run inside your VPC (processing prompt content) and the cloud control plane (managing policies and displaying metadata-only analytics).

Go Hybrid Proxy

Your VPC

Compiled, statically-linked Go binary in a scratch Docker image. Handles all DLP scanning, PII redaction, prompt injection blocking, and budget enforcement locally. Forwards cleaned requests directly to your chosen AI provider.

30+ PII entity types
Sub-50ms DLP latency
Per-project policy enforcement
Monthly budget hard-stops

Presidio NER Sidecar

Your VPC

Stock Microsoft Presidio AnalyzerEngine providing named entity recognition. Open-source, auditable, no proprietary code. Used by the Go proxy as an NER primitive.

Open-source (MIT)
Standard NLP models
No network egress
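As a rough illustration of how NER output can drive redaction, the sketch below takes findings shaped like Presidio analyzer results (each with an entity type and start/end offsets) and swaps every span for a placeholder. The function name `redact_spans` and the `<ENTITY_TYPE>` placeholder format are assumptions made for this example; the proxy's own redaction logic is not published.

```python
def redact_spans(text, results):
    """Replace each detected span with a <ENTITY_TYPE> placeholder.

    `results` is a list of dicts with "entity_type", "start", and "end" keys.
    Overlapping spans are not handled in this sketch.
    """
    # Work right to left so earlier offsets stay valid after each replacement.
    for r in sorted(results, key=lambda r: r["start"], reverse=True):
        text = text[:r["start"]] + f"<{r['entity_type']}>" + text[r["end"]:]
    return text


sample = "Email Jane Doe at jane@example.com"
findings = [
    {"entity_type": "PERSON", "start": 6, "end": 14},
    {"entity_type": "EMAIL_ADDRESS", "start": 18, "end": 34},
]
print(redact_spans(sample, findings))  # Email <PERSON> at <EMAIL_ADDRESS>
```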

Sync Agent

Your VPC

Lightweight Go daemon that pulls policy bundles from the cloud control plane every 30 seconds and pushes metadata-only telemetry (token counts, latency, entity types). Never transmits prompt content.

Policy sync every 30s
Metadata-only telemetry
Heartbeat monitoring
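To make "metadata-only" concrete, here is a minimal sketch of what such an event could look like. The field names and the `build_telemetry_event` helper are hypothetical; the real wire format is defined by the sync agent and is not documented here.

```python
import json
from collections import Counter


def build_telemetry_event(project_id, prompt_tokens, completion_tokens,
                          latency_ms, detected_entities):
    """Build a metadata-only event: counts and timings, never the text itself."""
    return {
        "project_id": project_id,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_ms": latency_ms,
        # Only entity types and how often each occurred are recorded.
        "entity_counts": dict(Counter(detected_entities)),
    }


event = build_telemetry_event("proj_demo", 120, 45, 38.5,
                              ["EMAIL_ADDRESS", "EMAIL_ADDRESS", "PERSON"])
print(json.dumps(event, indent=2))
```

Note that the function never accepts the prompt or response as an argument, so there is no code path by which content could end up in the event.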

Cloud Control Plane

AISG Cloud

Web dashboard for managing deployment configuration, DLP policies, budget limits, and viewing analytics. Receives only structural metadata — never prompt or response content.

Policy management UI
Real-time analytics
Violation dashboards
Multi-project management

Request Flow

Your App

Sends request to the hybrid proxy (local network)

Go Proxy (DLP)

Scans for PII, applies policies, enforces budgets — all locally

AI Provider

Cleaned request forwarded directly to OpenAI, Anthropic, etc.

Sync Agent

Pushes metadata (tokens, cost, latency) to cloud — no content

Dashboard

View metrics, violations, and manage policies in the cloud UI
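The flow above can be sketched in a few lines. This is an illustration only: `handle_request`, the single email regex, and the event fields are assumptions standing in for the proxy's real DLP engine, budget logic, and telemetry format.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def handle_request(prompt, forward, emit, budget_remaining):
    """Run one request through scan -> budget check -> forward -> telemetry."""
    # 1. Local DLP scan: redact matches before anything leaves the network.
    cleaned, hits = EMAIL.subn("<EMAIL_ADDRESS>", prompt)
    # 2. Local budget check: block without contacting the provider.
    if budget_remaining <= 0:
        emit({"status": "blocked", "reason": "budget", "entity_counts": {}})
        return None
    # 3. Forward only the cleaned prompt to the AI provider.
    reply = forward(cleaned)
    # 4. Hand metadata (no content) to the sync agent.
    emit({"status": "ok",
          "entity_counts": {"EMAIL_ADDRESS": hits} if hits else {}})
    return reply


sent, events = [], []
handle_request("Mail bob@example.com the report",
               forward=lambda p: sent.append(p) or "ok",
               emit=events.append,
               budget_remaining=10.0)
print(sent)    # the provider only ever sees the redacted prompt
print(events)  # the telemetry event carries counts, not text
```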

Deployment

The hybrid stack runs as three containers. Choose your deployment method — you'll receive the full deployment files and registry credentials when your Enterprise account is provisioned.

System Requirements

Container	Image Size	RAM Required	CPU
proxy	~15MB	256MB	1 core
presidio-ner	~2GB	3–4GB	2 cores
sync-agent	~10MB	32MB	0.25 core
Total	~2GB	4GB minimum	3 cores

Presidio's spaCy en_core_web_lg NLP model is the primary memory consumer (~750MB–1GB loaded into RAM).

Required Environment Variables

Both variables below are provided during Enterprise onboarding. The deployment will not start without them.

AISG_DEPLOYMENT_TOKEN	Machine-to-machine token (`dep_` prefix). Used by the sync agent to authenticate with the cloud control plane — required for policy sync, telemetry, and heartbeats.
AISG_REGISTRY_TOKEN	Read-only token for pulling container images from the private registry (`ghcr.io`). Used with `docker login` or Kubernetes imagePullSecret.

Option A — Docker Compose

docker-compose.hybrid.yml (simplified)

# .env.hybrid — fill in values from onboarding
# AISG_DEPLOYMENT_TOKEN=dep_your_token   ← REQUIRED
# AISG_REGISTRY_TOKEN=ghp_your_token     ← for docker login
# OPENAI_API_KEY=sk-your-key             ← at least one provider

services:
  proxy:
    image: ghcr.io/aisecuritygateway/hybrid-proxy:latest
    ports:
      - "8080:8000"
    environment:
      - AISG_API_KEY=${AISG_API_KEY}
      - PRESIDIO_URL=http://presidio:3000
      - SYNC_AGENT_EVENTS_URL=http://sync-agent:9090/events
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - LOG_LEVEL=info
    depends_on: [presidio, sync-agent]
    deploy:
      resources:
        limits:
          memory: 256M

  presidio:
    image: ghcr.io/aisecuritygateway/presidio-ner:latest
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 3G

  sync-agent:
    image: ghcr.io/aisecuritygateway/sync-agent:latest
    environment:
      - AISG_DEPLOYMENT_TOKEN=${AISG_DEPLOYMENT_TOKEN}  # ← REQUIRED
      - AISG_CONTROL_PLANE_URL=https://api.aisecuritygateway.ai
    volumes:
      - policy-cache:/var/aisg/cache

# Named volumes must be declared at the top level
volumes:
  policy-cache:

Start the proxy

# Log in to the container registry (credentials provided during onboarding)
docker login ghcr.io

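# Start all three containers, loading variables from .env.hybrid
docker compose --env-file .env.hybrid -f docker-compose.hybrid.yml up -d

# Verify: every service should be listed as running ("healthy" where a health check is defined)
docker compose -f docker-compose.hybrid.yml ps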

Option B — Kubernetes

For teams running on Kubernetes, we provide ready-to-apply manifests with resource limits, health checks, network policies, and Kustomize support. K8s manifests are provided during onboarding.

Deploy to Kubernetes

# Create namespace and image pull secret
kubectl create namespace aisg
kubectl create secret docker-registry ghcr-pull-secret \
  --namespace aisg \
  --docker-server=ghcr.io \
  --docker-username=aisg-deploy \
  --docker-password="$AISG_REGISTRY_TOKEN"

# Apply all manifests (Kustomize)
kubectl apply -k hybrid-vpc/k8s/

# Verify — all three pods should be Running
kubectl -n aisg get pods

Included in the K8s bundle

Deployment + Service per container
Resource requests & limits
Liveness & readiness probes
PersistentVolumeClaim for policy cache

Security & networking

NetworkPolicy (default-deny + least privilege)
Presidio only reachable from proxy
Sync-agent only reachable from proxy
Kustomize for image tag management

Scaling note: The proxy can scale horizontally (multiple replicas sharing the policy cache). Presidio and sync-agent should stay at 1 replica each. Add your own Ingress or Gateway resource to expose the proxy Service.

System requirements: Minimum 3 vCPU, 4 GB RAM (Presidio's spaCy NLP model is the primary consumer). Outbound HTTPS to your AI providers and api.aisecuritygateway.ai. No inbound internet access required.

Integration

Point your application's base URL to the hybrid proxy. Works with the OpenAI SDK, Anthropic SDK, or any HTTP client.

Python — OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # Hybrid proxy
    api_key="oah_your_project_api_key",   # From the dashboard
)

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Summarize Q3 results"}],
)

cURL

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer oah_your_project_api_key" \
  -H "Content-Type: application/json" \
  -H "x-provider: openai" \
  -H "x-model: gpt-4.1" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'

Your AI provider keys stay local. Configure your OpenAI / Anthropic / Google API keys in the proxy's environment file. They are never sent to the AISG cloud — only the proxy uses them to forward requests.

Security Guarantees

Prompt Data Sovereignty

All prompt and response content is processed within your network boundary. The cloud control plane never receives, stores, or processes prompt text.

Compiled Binary — No Source Exposure

The Go proxy ships as a statically compiled binary in a scratch Docker image. With no shell, no OS packages, and no runtime interpreter in the image, there is very little attack surface for code extraction.

Metadata-Only Telemetry

The sync agent transmits only structural metadata: token counts, latency measurements, entity type counts, and cost estimates. Content fields are rejected server-side.
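One simple way to enforce a rule like this is an allowlist: accept only known metadata fields and reject everything else. The sketch below assumes that approach; the field names and `validate_telemetry` are hypothetical, and the control plane's actual validation rules are not described in this document.

```python
ALLOWED_FIELDS = {"project_id", "prompt_tokens", "completion_tokens",
                  "latency_ms", "entity_counts", "estimated_cost"}


def validate_telemetry(payload):
    """Raise ValueError if the payload carries anything outside the allowlist."""
    unexpected = set(payload) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"rejected fields: {sorted(unexpected)}")
    return payload


validate_telemetry({"project_id": "proj_demo", "prompt_tokens": 120})

try:
    validate_telemetry({"project_id": "proj_demo", "prompt": "secret text"})
except ValueError as err:
    print(err)  # rejected fields: ['prompt']
```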

Local Budget Enforcement

Per-request token limits and monthly budget caps are enforced locally by the proxy. No cloud round-trip required for budget checks — requests are blocked instantly when limits are hit.
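A minimal sketch of local enforcement, assuming an in-memory counter per project. The `BudgetGuard` class, its method names, and the reason strings are illustrative, not the proxy's actual implementation.

```python
class BudgetGuard:
    """Track spend for one project and refuse requests past the monthly cap."""

    def __init__(self, monthly_cap_usd, max_tokens_per_request):
        self.monthly_cap_usd = monthly_cap_usd
        self.max_tokens_per_request = max_tokens_per_request
        self.spent_usd = 0.0

    def check(self, request_tokens, estimated_cost_usd):
        """Return (allowed, reason) without any network call."""
        if request_tokens > self.max_tokens_per_request:
            return False, "token_limit"
        if self.spent_usd + estimated_cost_usd > self.monthly_cap_usd:
            return False, "monthly_budget"
        return True, None

    def record(self, actual_cost_usd):
        """Add the cost of a completed request to the running total."""
        self.spent_usd += actual_cost_usd


guard = BudgetGuard(monthly_cap_usd=1.00, max_tokens_per_request=4000)
print(guard.check(500, 0.40))   # (True, None)
guard.record(0.80)
print(guard.check(500, 0.40))   # (False, 'monthly_budget')
print(guard.check(5000, 0.01))  # (False, 'token_limit')
```

Because the check is a pure in-process comparison, a blocked request costs no provider call and no cloud round-trip.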

Policy-as-Code

DLP policies, entity selections, sensitivity tiers, and budget limits are defined in the cloud dashboard and synced as signed JSON bundles. The proxy applies them deterministically.
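The signing scheme used for the bundles is not specified here. As a stand-in, the sketch below uses HMAC-SHA256 over canonical JSON to show the general verify-before-apply pattern; a production system may well use asymmetric signatures instead. `sign_bundle`, `verify_bundle`, and the bundle layout are assumptions.

```python
import hashlib
import hmac
import json


def _canonical(policy):
    # Stable serialization so signer and verifier hash identical bytes.
    return json.dumps(policy, sort_keys=True, separators=(",", ":")).encode()


def sign_bundle(policy, key):
    signature = hmac.new(key, _canonical(policy), hashlib.sha256).hexdigest()
    return {"policy": policy, "signature": signature}


def verify_bundle(bundle, key):
    """Return the policy only if its signature matches; otherwise raise."""
    expected = hmac.new(key, _canonical(bundle["policy"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, bundle.get("signature", "")):
        raise ValueError("policy bundle signature mismatch")
    return bundle["policy"]


bundle = sign_bundle({"entities": ["US_SSN"], "monthly_cap_usd": 500}, b"demo-key")
print(verify_bundle(bundle, b"demo-key"))

bundle["policy"]["monthly_cap_usd"] = 5000  # tampering after signing
try:
    verify_bundle(bundle, b"demo-key")
except ValueError as err:
    print(err)
```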

Fail-Closed Architecture

If the proxy cannot verify a request is safe, it does not forward it. DLP failures, policy errors, and budget exhaustion all result in blocked requests — never silent pass-through.
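The fail-closed principle comes down to treating any scanner error as "unsafe". A minimal sketch, where `guarded_forward`, `scan`, and `forward` are hypothetical names used only for illustration:

```python
def guarded_forward(prompt, scan, forward):
    """Forward only if the scan completes and reports the prompt as safe."""
    try:
        verdict = scan(prompt)
    except Exception:
        # A scanner failure is treated as "unsafe", not as "nothing found".
        return {"status": "blocked", "reason": "dlp_error"}
    if not verdict.get("safe", False):
        return {"status": "blocked", "reason": "policy_violation"}
    return {"status": "ok", "response": forward(prompt)}


def broken_scanner(prompt):
    raise RuntimeError("NER sidecar unavailable")


print(guarded_forward("hello", broken_scanner, lambda p: "reply"))
print(guarded_forward("hello", lambda p: {"safe": True}, lambda p: "reply"))
```

The default in `verdict.get("safe", False)` matters: a verdict that omits the field is also blocked rather than passed through.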

What's Included

On-Prem Components

Compiled Go proxy with built-in DLP engine
30+ PII entity detection (SSN, credit card, API keys, etc.)
Custom regex patterns (IP Guard)
Prompt injection blocking
Per-request token limits
Monthly budget enforcement
Presidio NER sidecar (open-source)
Go sync agent for policy & telemetry

Cloud Dashboard

DLP policy management per project
Entity selection & sensitivity tiers
Monthly budget configuration
Real-time request & violation dashboards
Deployment health monitoring
Multi-project management
API key management (oah_ prefix)
Policy version history

Deployment Options Compared

	OSS Self-Host	SaaS Cloud	Hybrid VPC
Prompt data stays on-prem	Yes	No	Yes
Cloud dashboard	No	Yes	Yes
DLP entity types	13	30+	30+
Smart cost routing	No	Yes	No
Budget enforcement	No	Yes	Yes (local)
Policy management	Manual config	Dashboard	Dashboard → synced
Multi-project	No	Yes	Yes
Deployment model	Docker Compose	Managed cloud	Docker Compose / K8s + cloud

Getting Started

1. Sign up & create a project
Create an account, then create a project in the dashboard. Configure your DLP policy and budget limits.
2. Request Enterprise / Hybrid VPC access
Contact us at enterprise@aisecuritygateway.ai to enable Hybrid VPC on your account. We'll provision your deployment token and container registry access.
3. Deploy the proxy stack
Run three containers in your VPC using Docker Compose or Kubernetes manifests (both provided). Configure your AI provider keys and deployment token in the environment file or Kubernetes Secret.
4. Point your app to the proxy
Change your OpenAI SDK base URL to the proxy's address. Use your project API key. That's it: all requests are now governed.

Frequently Asked Questions

What data leaves my network?

Only structural metadata: token counts, request latency, DLP entity types detected (e.g. 'EMAIL_ADDRESS: 2'), and estimated cost. Prompt text, response text, and user content never leave your VPC. The sync agent's telemetry payload is validated server-side — any fields resembling content are rejected.

What providers does the hybrid proxy support?

The proxy forwards to any OpenAI-compatible API endpoint. This includes OpenAI, Anthropic, Google Gemini, Groq, Together.ai, xAI, Mistral, AWS Bedrock, and DeepInfra. You configure the provider endpoint and API key in your environment.

What happens if the cloud control plane is unreachable?

The proxy continues operating with the last-synced policy bundle. DLP scanning, PII redaction, and budget enforcement all function normally using cached policies. When connectivity is restored, the sync agent automatically resumes policy pulls and telemetry pushes.
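The fallback behaviour can be pictured as "try the control plane, otherwise read the cache". The sketch below is an assumption about the general pattern, not the sync agent's code; `load_policy`, the cache file name, and the injected `fetch_remote` callable are all hypothetical.

```python
import json
import os
import tempfile


def load_policy(fetch_remote, cache_path):
    """Prefer a fresh bundle; fall back to the cached copy if the fetch fails."""
    try:
        policy = fetch_remote()
    except OSError:  # covers ConnectionError, timeouts, DNS failures
        with open(cache_path) as f:
            return json.load(f), "cached"
    # Persist the fresh bundle so it is available during the next outage.
    with open(cache_path, "w") as f:
        json.dump(policy, f)
    return policy, "fresh"


def offline():
    raise ConnectionError("control plane unreachable")


cache = os.path.join(tempfile.mkdtemp(), "policy.json")
print(load_policy(lambda: {"version": 7}, cache))  # ({'version': 7}, 'fresh')
print(load_policy(offline, cache))                 # ({'version': 7}, 'cached')
```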

Can one proxy serve multiple projects?

Yes. The hybrid proxy supports multi-project deployments. Each project gets its own API key (oah_ prefix) with independent DLP policies, entity selections, and budget limits. The proxy routes requests to the correct project based on the API key.
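Key-based routing of this kind can be sketched as a lookup from the bearer token to a per-project policy. The table contents, key values, and `resolve_project` are made up for illustration; only the `oah_` prefix comes from the text above.

```python
PROJECTS = {
    "oah_demo_finance": {"project": "finance",
                         "entities": ["US_SSN", "CREDIT_CARD"],
                         "monthly_cap_usd": 500},
    "oah_demo_support": {"project": "support",
                         "entities": ["EMAIL_ADDRESS"],
                         "monthly_cap_usd": 100},
}


def resolve_project(authorization_header):
    """Map an 'Authorization: Bearer oah_...' header to its project policy."""
    scheme, _, key = authorization_header.partition(" ")
    if scheme != "Bearer" or not key.startswith("oah_"):
        raise PermissionError("missing or malformed project API key")
    try:
        return PROJECTS[key]
    except KeyError:
        raise PermissionError("unknown project API key") from None


print(resolve_project("Bearer oah_demo_finance")["project"])  # finance
```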

How is the proxy deployed?

Three containers: the Go proxy, the Presidio NER sidecar, and the Go sync agent. Deploy via Docker Compose or Kubernetes (we provide ready-to-apply manifests for both). Minimum system requirements are 3 vCPU and 4GB RAM (Presidio's spaCy NLP model is the primary consumer). The proxy listens on a local port — you point your application's base URL to it.

What's the DLP latency?

Typically under 50ms for text requests. The Go DLP engine runs all pattern matching, entity detection, and policy evaluation in-process. The Presidio sidecar is called only for NLP-based entity recognition (names, locations) and adds approximately 20-30ms.
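To check a latency claim like this in your own environment, it helps to collect per-request timings and look at percentiles rather than averages. The helpers below are a generic sketch using a nearest-rank percentile; the sample values are invented and are not measurements of the proxy.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def time_call_ms(fn, *args):
    """Return how long fn(*args) took, in milliseconds."""
    start = time.perf_counter()
    fn(*args)
    return (time.perf_counter() - start) * 1000


# Invented sample timings in milliseconds, for illustration only.
samples_ms = [31, 12, 47, 18, 25]
print(percentile(samples_ms, 50))  # 25
print(percentile(samples_ms, 95))  # 47
```

In practice you would fill `samples_ms` by wrapping your own scan or request call with `time_call_ms` over a representative set of prompts.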

Is the proxy source code available?

The proxy ships as a compiled Go binary. The open-source AISG proxy (Apache 2.0) is available on GitHub for the cloud/self-hosted version. The hybrid proxy includes proprietary DLP enhancements compiled into the binary — these are not available as source code.

How do I monitor the deployment?

The sync agent sends heartbeats to the cloud dashboard every 60 seconds. The dashboard shows deployment status (Connected/Disconnected), last heartbeat time, and component versions. Metrics (requests, tokens, cost, violations) update in real time as telemetry is processed.

Ready to deploy?

Contact our team to enable Hybrid VPC on your account. We'll help you plan the deployment and get your first project live.

Want to self-host this?

AI Security Gateway is open source. Deploy the core AI security proxy on your own infrastructure — PII redaction, prompt injection blocking, and secret detection included. No account required.
