All Models
4 Models · 3 Providers · PII Redacted

🦙Llama Models

by Meta (Open Source)

Meta's open-weights Llama family is the most widely deployed open-source LLM series. Compare Llama API pricing across Groq, Together, and DeepInfra to find the cheapest Llama provider. Llama 4 introduced mixture-of-experts (Maverick) and a long-context variant (Scout), while Llama 3.3 remains a cost-efficient workhorse for production workloads.

From $0.05/M tokens
3 providers
28+ PII entities redacted

Why deploy Llama through AI Security Gateway?

Automatic PII Redaction

Every Llama request is scanned for 28+ PII entity types — SSNs, credit cards, emails, API keys, and more — before it reaches any provider.
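For illustration only, a toy redactor conveys what "scanned and redacted before it reaches any provider" means in practice. The gateway's actual detection runs server-side and covers 28+ entity types; the two regex patterns below are hypothetical stand-ins, not the real rules:

```python
import re

# Illustrative patterns only -- the gateway's server-side scanner covers
# 28+ entity types (SSNs, credit cards, emails, API keys, and more).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace each detected entity with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

print(redact("Reach me at jane@example.com, SSN 123-45-6789."))
# Reach me at [EMAIL_REDACTED], SSN [SSN_REDACTED].
```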

Smart Cost Routing

Llama is available across 3 providers. Our Smart Router picks the cheapest one per-request. 25% managed markup / 0% on Pro BYOK.
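The markup arithmetic is straightforward: in Managed Mode the effective rate is the provider rate times 1.25. A minimal sketch with a hypothetical `managed_rate` helper, using the Llama 4 Scout Groq rates from the pricing table on this page:

```python
def managed_rate(provider_rate: float, markup: float = 0.25) -> float:
    """Effective per-1M-token rate after the Managed Mode markup."""
    return round(provider_rate * (1 + markup), 4)

# Llama 4 Scout on Groq: $0.11 input / $0.34 output per 1M tokens
print(managed_rate(0.11))  # 0.1375
print(managed_rate(0.34))  # 0.425
```

On Pro BYOK the markup is 0, so the effective rate equals the provider rate.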

Zero Code Changes

Change two lines in your OpenAI SDK — base_url and api_key — and every request flows through AISG. Full backward compatibility.

Full Observability

Per-request logging of token counts, latency, DLP violations, and cost. Never wonder what your AI spend is again.
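The logged per-request cost can be cross-checked locally: every OpenAI-compatible response carries token counts in `response.usage` (`prompt_tokens`, `completion_tokens`). A minimal sketch with a hypothetical `request_cost` helper and rates taken from the pricing table on this page:

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """USD cost of one request, given per-1M-token rates."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000

# Token counts come from response.usage on any OpenAI-compatible response.
# Example: a 2,000-in / 500-out call at Llama 3.1 8B Groq rates ($0.05 / $0.08):
print(f"${request_cost(2000, 500, 0.05, 0.08):.8f}")  # $0.00014000
```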

Llama Strengths

  • Open-weights — full transparency and audit capability
  • Multi-provider availability (Groq, Together, DeepInfra) for cost arbitrage
  • Llama 4 Maverick: 400B MoE with vision support
  • Llama 4 Scout: 512K context window for long-document tasks
  • Fine-tuning friendly — build domain-specific models on open weights

Available Llama Models (4)

Llama 4 Maverick

oah/llama-4-maverick
Open Source

Deploy Llama 4 Maverick with built-in PII redaction and Hub governance. Mixture-of-experts with 17B active of 400B total parameters. Available on Managed Credits and BYOK.

Together.ai · DeepInfra
Input: $0.20/M · Output: $0.60/M

Llama 4 Scout

oah/llama-4-scout
Open Source

Deploy Llama 4 Scout with built-in PII redaction and Hub governance. Mixture-of-experts with 17B active of 109B total parameters. Available on Managed Credits and BYOK.

Groq · Together.ai · DeepInfra
Input: $0.11/M · Output: $0.34/M

Llama 3.3 70B

oah/llama-3-70b
Open Source

Deploy Llama 3.3 70B with built-in PII redaction and Hub governance. 70B parameters. Available on Managed Credits and BYOK.

Groq · Together.ai · DeepInfra
Input: $0.35/M · Output: $0.40/M

Llama 3.1 8B

oah/llama-3.1-8b
Open Source

Deploy Llama 3.1 8B with built-in PII redaction and Hub governance. 8B parameters. Available on Managed Credits and BYOK.

Groq · Together.ai · DeepInfra
Input: $0.05/M · Output: $0.06/M

Llama Pricing Comparison (per 1M tokens, USD)

Input / Output pricing by provider. Managed Mode adds a 25% managed markup. Pro BYOK = 0% markup.

| Model | Params | Context | Vision | Together.ai | DeepInfra | Groq |
| --- | --- | --- | --- | --- | --- | --- |
| Llama 4 Maverick (oah/llama-4-maverick) | 17B/400B MoE | n/a | No | $0.27 / $0.85 | $0.20 / $0.60 | n/a |
| Llama 4 Scout (oah/llama-4-scout) | 17B/109B MoE | 512K | No | $0.18 / $0.59 | $0.15 / $0.45 | $0.11 / $0.34 |
| Llama 3.3 70B (oah/llama-3-70b) | 70B | n/a | No | $0.88 / $0.88 | $0.35 / $0.40 | $0.59 / $0.79 |
| Llama 3.1 8B (oah/llama-3.1-8b) | 8B | n/a | No | $0.18 / $0.18 | $0.06 / $0.06 | $0.05 / $0.08 |
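The Smart Router's per-request choice amounts to a minimum over blended costs. A toy sketch using a snapshot of the Llama 4 Scout rates from the table above (the real router uses live pricing, and `cheapest` here is a hypothetical helper):

```python
# Snapshot of Llama 4 Scout per-1M-token (input, output) rates from the table.
SCOUT_RATES = {
    "Together.ai": (0.18, 0.59),
    "DeepInfra": (0.15, 0.45),
    "Groq": (0.11, 0.34),
}

def cheapest(rates, in_tokens, out_tokens):
    """Return (provider, blended USD cost) minimizing cost for this workload."""
    def cost(r):
        return (in_tokens * r[0] + out_tokens * r[1]) / 1_000_000
    provider = min(rates, key=lambda p: cost(rates[p]))
    return provider, round(cost(rates[provider]), 6)

print(cheapest(SCOUT_RATES, 800_000, 200_000))  # Groq wins for this mix
```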

Llama Direct vs AI Security Gateway

What you get at each pricing tier. Hub adds security, governance, and multi-provider routing on top of raw API access.

| Mode | What You Pay | PII Redaction | Budget Caps | Routing | Audit Trail |
| --- | --- | --- | --- | --- | --- |
| Direct to Meta | Provider pricing only | None | None | Manual | None |
| Hub — Managed Mode | Provider + 25% markup | 28+ PII types | Per-key hard caps | Smart Router | Full compliance log |
| Hub — Pro BYOK ($29/mo) | Direct to provider (0% markup) | 28+ PII types | Per-key hard caps | Smart Router | Full compliance log |
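The two Hub tiers cross over at a predictable spend level: Pro BYOK's flat $29/mo fee beats the 25% managed markup once the markup you would have paid exceeds $29. A quick check of the arithmetic:

```python
PRO_FEE = 29.0   # Pro BYOK flat monthly fee, USD
MARKUP = 0.25    # Managed Mode markup on provider rates

# Markup paid on S dollars of provider-rate spend is 0.25 * S.
# Pro BYOK is cheaper once 0.25 * S > 29, i.e. S > 29 / 0.25.
break_even_spend = PRO_FEE / MARKUP
print(break_even_spend)  # 116.0
```

Below roughly $116/month of provider-rate spend, Managed Mode is the cheaper tier; above it, Pro BYOK wins.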

Popular Use Cases

1

Privacy-sensitive deployments requiring model auditability

2

Cost-optimized chatbots and customer support agents

3

Long-document summarization and analysis (Scout 512K context)

4

Multi-provider redundancy with automatic failover

Integration — 2 Lines

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aisecuritygateway.ai/v1",
    api_key="your_hub_api_key"
)

# Use any virtual model name from the pricing table above
response = client.chat.completions.create(
    model="oah/llama-4-maverick",
    messages=[{"role": "user", "content": "Hello!"}]
)

Use any virtual model name from the pricing table above (prefixed with oah/). Works with the standard OpenAI SDK. Every request is PII-scanned before it reaches the upstream provider.

Frequently Asked Questions

What is the Llama API pricing on AI Security Gateway?
Llama API pricing varies by provider and model variant. In Managed Mode, we add a 25% markup on top of the provider's rate. With Pro BYOK ($29/mo), you pay the provider directly at 0% markup. Our Smart Router automatically picks the cheapest available provider for each request.
Who is the cheapest Llama provider?
It depends on the model: per the pricing table above, Llama 3.1 8B has the lowest input rate on Groq ($0.05/M), while Llama 3.3 70B is cheapest on DeepInfra ($0.35/$0.40 per 1M). Our Smart Router compares real-time pricing across Groq, Together.ai, and DeepInfra and automatically routes each request to the cheapest provider.
Llama 3 vs Llama 4 — which should I use?
Llama 4 Maverick (400B MoE) and Llama 4 Scout (512K context) are the latest and most capable. Llama 3.3 70B remains the best value for cost-sensitive production workloads. All versions run through AISG with identical PII protection.
Can I use my own Llama API keys with AI Security Gateway?
Yes. With Pro BYOK mode, store your Groq, Together.ai, or DeepInfra keys in AISG (AES-256 encrypted). We route requests through your account at 0% markup — you only pay the provider directly.
Does AI Security Gateway store my Llama prompts?
No. Prompts are processed in volatile memory (RAM) and discarded immediately. We never persist, log, or train on your content. Only metadata (token counts, latency, violation types) is stored.
What happens when a Llama model is retired?
Retired models automatically move to the 'Previous Versions' section on this page. If a replacement exists, it's shown alongside. Your API calls will return a clear error indicating the model is deprecated.

Deploy Llama with Enterprise-Grade Security

Get started with 1,000,000 free credits. Every Llama request is PII-scanned, cost-optimized, and fully logged — zero configuration.


Explore Other Model Families

Model registry last updated: 2026-04-18T17:41:46.389Z. Pricing shown is the lowest available rate across providers (per 1M tokens, USD). Actual pricing depends on provider and plan.