All Models
4 Models · 3 Providers · PII Redacted

🦙Llama Models

by Meta (Open Source)

Meta's open-weights Llama family is the most widely deployed open-source LLM series. Compare Llama API pricing across Groq, Together, and DeepInfra to find the cheapest Llama provider. Llama 4 introduced mixture-of-experts (Maverick) and a long-context variant (Scout), while Llama 3.3 remains a cost-efficient workhorse for production workloads.

From $0.05/M tokens
3 providers
28+ PII entities redacted

Why deploy Llama through AI Security Gateway?

Automatic PII Redaction

Every Llama request is scanned for 28+ PII entity types — SSNs, credit cards, emails, API keys, and more — before it reaches any provider.
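For illustration only, a toy redactor conveys what "scanned and redacted before it reaches any provider" means in practice. The gateway's actual detection runs server-side and covers 28+ entity types; the two regex patterns below are hypothetical stand-ins, not the real rules:

```python
import re

# Illustrative patterns only -- the gateway's server-side scanner covers
# 28+ entity types (SSNs, credit cards, emails, API keys, and more).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace each detected entity with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

print(redact("Reach me at jane@example.com, SSN 123-45-6789."))
# Reach me at [EMAIL_REDACTED], SSN [SSN_REDACTED].
```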

Smart Cost Routing

Llama is available across 3 providers. Our Smart Router picks the cheapest one per-request. 25% managed markup / 0% on Pro BYOK.
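The markup arithmetic is straightforward: in Managed Mode the effective rate is the provider rate times 1.25. A minimal sketch with a hypothetical `managed_rate` helper, using the Llama 4 Scout Groq rates from the pricing table on this page:

```python
def managed_rate(provider_rate: float, markup: float = 0.25) -> float:
    """Effective per-1M-token rate after the Managed Mode markup."""
    return round(provider_rate * (1 + markup), 4)

# Llama 4 Scout on Groq: $0.11 input / $0.34 output per 1M tokens
print(managed_rate(0.11))  # 0.1375
print(managed_rate(0.34))  # 0.425
```

On Pro BYOK the markup is 0, so the effective rate equals the provider rate.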

Zero Code Changes

Change two lines in your OpenAI SDK — base_url and api_key — and every request flows through AISG. Full backward compatibility.

Full Observability

Per-request logging of token counts, latency, DLP violations, and cost. Never wonder what your AI spend is again.
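The logged per-request cost can be cross-checked locally: every OpenAI-compatible response carries token counts in `response.usage` (`prompt_tokens`, `completion_tokens`). A minimal sketch with a hypothetical `request_cost` helper and rates taken from the pricing table on this page:

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """USD cost of one request, given per-1M-token rates."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000

# Token counts come from response.usage on any OpenAI-compatible response.
# Example: a 2,000-in / 500-out call at Llama 3.1 8B Groq rates ($0.05 / $0.08):
print(f"${request_cost(2000, 500, 0.05, 0.08):.8f}")  # $0.00014000
```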

Llama Strengths

  • Open-weights — full transparency and audit capability
  • Multi-provider availability (Groq, Together, DeepInfra) for cost arbitrage
  • Llama 4 Maverick: 400B MoE with vision support
  • Llama 4 Scout: 512K context window for long-document tasks
  • Fine-tuning friendly — build domain-specific models on open weights

Available Llama Models (4)

Llama 4 Maverick

oah/llama-4-maverick
Open Source

Deploy Llama 4 Maverick with built-in PII redaction and Hub governance. Mixture-of-experts with 17B active of 400B total parameters. Available on Managed Credits and BYOK.

Together.ai · DeepInfra
Input: $0.20/M · Output: $0.60/M

Llama 4 Scout

oah/llama-4-scout
Open Source

Deploy Llama 4 Scout with built-in PII redaction and Hub governance. Mixture-of-experts with 17B active of 109B total parameters. Available on Managed Credits and BYOK.

Groq · Together.ai · DeepInfra
Input: $0.11/M · Output: $0.34/M

Llama 3.3 70B

oah/llama-3-70b
Open Source

Deploy Llama 3.3 70B with built-in PII redaction and Hub governance. 70B parameters. Available on Managed Credits and BYOK.

Groq · Together.ai · DeepInfra
Input: $0.35/M · Output: $0.40/M

Llama 3.1 8B

oah/llama-3.1-8b
Open Source

Deploy Llama 3.1 8B with built-in PII redaction and Hub governance. 8B parameters. Available on Managed Credits and BYOK.

Groq · Together.ai · DeepInfra
Input: $0.05/M · Output: $0.06/M

Llama Pricing Comparison (per 1M tokens, USD)

Input / Output pricing by provider. Managed Mode adds a 25% managed markup. Pro BYOK = 0% markup.

| Model | Params | Context | Vision | Together.ai | DeepInfra | Groq |
| --- | --- | --- | --- | --- | --- | --- |
| Llama 4 Maverick (oah/llama-4-maverick) | 17B/400B MoE | n/a | No | $0.27 / $0.85 | $0.20 / $0.60 | n/a |
| Llama 4 Scout (oah/llama-4-scout) | 17B/109B MoE | 512K | No | $0.18 / $0.59 | $0.15 / $0.45 | $0.11 / $0.34 |
| Llama 3.3 70B (oah/llama-3-70b) | 70B | n/a | No | $0.88 / $0.88 | $0.35 / $0.40 | $0.59 / $0.79 |
| Llama 3.1 8B (oah/llama-3.1-8b) | 8B | n/a | No | $0.18 / $0.18 | $0.06 / $0.06 | $0.05 / $0.08 |
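The Smart Router's per-request choice amounts to a minimum over blended costs. A toy sketch using a snapshot of the Llama 4 Scout rates from the table above (the real router uses live pricing, and `cheapest` here is a hypothetical helper):

```python
# Snapshot of Llama 4 Scout per-1M-token (input, output) rates from the table.
SCOUT_RATES = {
    "Together.ai": (0.18, 0.59),
    "DeepInfra": (0.15, 0.45),
    "Groq": (0.11, 0.34),
}

def cheapest(rates, in_tokens, out_tokens):
    """Return (provider, blended USD cost) minimizing cost for this workload."""
    def cost(r):
        return (in_tokens * r[0] + out_tokens * r[1]) / 1_000_000
    provider = min(rates, key=lambda p: cost(rates[p]))
    return provider, round(cost(rates[provider]), 6)

print(cheapest(SCOUT_RATES, 800_000, 200_000))  # Groq wins for this mix
```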

Llama Direct vs AI Security Gateway

What you get at each pricing tier. Hub adds security, governance, and multi-provider routing on top of raw API access.

| Mode | What You Pay | PII Redaction | Budget Caps | Routing | Audit Trail |
| --- | --- | --- | --- | --- | --- |
| Direct to Meta | Provider pricing only | None | None | Manual | None |
| Hub — Managed Mode | Provider + 25% markup | 28+ PII types | Per-key hard caps | Smart Router | Full compliance log |
| Hub — Pro BYOK ($29/mo) | Direct to provider (0% markup) | 28+ PII types | Per-key hard caps | Smart Router | Full compliance log |
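The two Hub tiers cross over at a predictable spend level: Pro BYOK's flat $29/mo fee beats the 25% managed markup once the markup you would have paid exceeds $29. A quick check of the arithmetic:

```python
PRO_FEE = 29.0   # Pro BYOK flat monthly fee, USD
MARKUP = 0.25    # Managed Mode markup on provider rates

# Markup paid on S dollars of provider-rate spend is 0.25 * S.
# Pro BYOK is cheaper once 0.25 * S > 29, i.e. S > 29 / 0.25.
break_even_spend = PRO_FEE / MARKUP
print(break_even_spend)  # 116.0
```

Below roughly $116/month of provider-rate spend, Managed Mode is the cheaper tier; above it, Pro BYOK wins.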

Popular Use Cases

1

Privacy-sensitive deployments requiring model auditability

2

Cost-optimized chatbots and customer support agents

3

Long-document summarization and analysis (Scout 512K context)

4

Multi-provider redundancy with automatic failover

Integration — 2 Lines

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aisecuritygateway.ai/v1",
    api_key="your_hub_api_key"
)

# Use any virtual model name from the pricing table above
response = client.chat.completions.create(
    model="oah/llama-4-maverick",
    messages=[{"role": "user", "content": "Hello!"}]
)

Use any virtual model name from the pricing table above (prefixed with oah/). Works with the standard OpenAI SDK. Every request is PII-scanned before it reaches the upstream provider.

Frequently Asked Questions

What is the Llama API pricing on AI Security Gateway?
Llama API pricing varies by provider and model variant. In Managed Mode, we add a 25% markup on top of the provider's rate. With Pro BYOK ($29/mo), you pay the provider directly at 0% markup. Our Smart Router automatically picks the cheapest available provider for each request.
Who is the cheapest Llama provider?
It depends on the model: per the pricing table above, Llama 3.1 8B has the lowest input rate on Groq ($0.05/M), while Llama 3.3 70B is cheapest on DeepInfra ($0.35/$0.40 per 1M). Our Smart Router compares real-time pricing across Groq, Together.ai, and DeepInfra and automatically routes each request to the cheapest provider.
Llama 3 vs Llama 4 — which should I use?
Llama 4 Maverick (400B MoE) and Llama 4 Scout (512K context) are the latest and most capable. Llama 3.3 70B remains the best value for cost-sensitive production workloads. All versions run through AISG with identical PII protection.
Can I use my own Llama API keys with AI Security Gateway?
Yes. With Pro BYOK mode, store your Groq, Together.ai, or DeepInfra keys in AISG (AES-256 encrypted). We route requests through your account at 0% markup — you only pay the provider directly.
Does AI Security Gateway store my Llama prompts?
No. Prompts are processed in volatile memory (RAM) and discarded immediately. We never persist, log, or train on your content. Only metadata (token counts, latency, violation types) is stored.
What happens when a Llama model is retired?
Retired models automatically move to the 'Previous Versions' section on this page. If a replacement exists, it's shown alongside. Your API calls will return a clear error indicating the model is deprecated.

Deploy Llama with Enterprise-Grade Security

Get started with 1,000,000 free credits. Every Llama request is PII-scanned, cost-optimized, and fully logged — zero configuration.


Explore Other Model Families

Model registry last updated: 2026-04-18T17:41:46.389Z. Pricing shown is the lowest available rate across providers (per 1M tokens, USD). Actual pricing depends on provider and plan.