28 Models · 2 Providers · PII Redacted

🦙 Llama Models

by Meta (Open Source)

Meta's open-weights Llama family is among the most widely deployed open LLM series. Compare Llama API pricing across Groq, Together, and DeepInfra to find the cheapest Llama provider. Llama 4 introduced a mixture-of-experts architecture in both Maverick (128 experts) and the long-context Scout (16 experts), while Llama 3.3 remains a cost-efficient workhorse for production workloads.

From $0.05/M tokens
2 providers
28+ PII entities redacted

Why deploy Llama through AI Security Gateway?

Automatic PII Redaction

Every Llama request is scanned for 28+ PII entity types — SSNs, credit cards, emails, API keys, and more — before it reaches any provider.
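
As a rough illustration of what pre-request scanning involves, here is a minimal regex-based sketch. It is not the gateway's actual scanner: the three patterns, the placeholder format, and the `redact` function name are assumptions made for this example, and a real scanner covering 28+ entity types would need validation logic well beyond simple regexes.

```python
from __future__ import annotations

import re

# Illustrative patterns only (assumed for this sketch). A production scanner
# would cover many more entity types and validate matches (e.g. Luhn checks).
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each match with a [LABEL] placeholder and list the types found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


if __name__ == "__main__":
    clean, kinds = redact("Email jane@example.com, SSN 123-45-6789")
    print(clean)
    print(kinds)
```

The key property is ordering: redaction runs on the prompt before any network call, so the provider only ever sees the placeholder text.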

Smart Cost Routing

Llama is available across 2 providers. Our Smart Router picks the cheapest one for each request. Managed Mode adds a 25% markup; Pro BYOK has 0% markup.
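
Per-request cheapest-provider selection can be sketched as follows. This is a simplified illustration rather than the Smart Router's real logic: the `Quote` type and function names are hypothetical, and the rates are the Llama 4 Scout prices listed on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    provider: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


def request_cost(quote: Quote, input_tokens: int, output_tokens: int) -> float:
    """Provider cost in USD for one request at the quoted per-million rates."""
    return (input_tokens * quote.input_per_m
            + output_tokens * quote.output_per_m) / 1_000_000


def cheapest(quotes, input_tokens: int, output_tokens: int) -> Quote:
    """Pick the quote with the lowest cost for this request's token mix."""
    return min(quotes, key=lambda q: request_cost(q, input_tokens, output_tokens))


# Llama 4 Scout rates as listed on this page (input/output per 1M tokens).
SCOUT_QUOTES = [
    Quote("Together.ai", 0.18, 0.59),
    Quote("DeepInfra", 0.15, 0.45),
]

if __name__ == "__main__":
    best = cheapest(SCOUT_QUOTES, input_tokens=4_000, output_tokens=1_000)
    print(best.provider, request_cost(best, 4_000, 1_000))
```

Costing the actual token mix, rather than comparing input rates alone, matters because one provider can be cheaper on input and another on output.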

Native SDK or OpenAI Compatible

Use the AISG SDK (pip install aisg) for typed metadata and error handling, or change two lines in your OpenAI SDK. Both work.

Full Observability

Per-request logging of token counts, latency, DLP violations, and cost. Never wonder what your AI spend is again.
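
To show how per-request metadata turns into a spend report, the sketch below aggregates log records by model. The `RequestLog` field names are hypothetical and chosen for this example; the gateway's real log schema may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RequestLog:
    """One per-request metadata record (hypothetical field names)."""
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: float
    dlp_violations: list = field(default_factory=list)


def summarize(logs):
    """Aggregate requests, tokens, cost, violations, and mean latency per model."""
    totals = defaultdict(lambda: {"requests": 0, "tokens": 0, "cost_usd": 0.0,
                                  "violations": 0, "latency_ms_sum": 0.0})
    for log in logs:
        row = totals[log.model]
        row["requests"] += 1
        row["tokens"] += log.input_tokens + log.output_tokens
        row["cost_usd"] += log.cost_usd
        row["violations"] += len(log.dlp_violations)
        row["latency_ms_sum"] += log.latency_ms

    summary = {}
    for model, row in totals.items():
        summary[model] = {
            "requests": row["requests"],
            "tokens": row["tokens"],
            "cost_usd": row["cost_usd"],
            "violations": row["violations"],
            "avg_latency_ms": row["latency_ms_sum"] / row["requests"],
        }
    return summary
```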

Llama Strengths

  • Open-weights — full transparency and audit capability
  • Multi-provider availability (Groq, Together, DeepInfra) for cost arbitrage
  • Llama 4 Maverick: 400B MoE with vision support
  • Llama 4 Scout: 512K context window for long-document tasks
  • Fine-tuning friendly — build domain-specific models on open weights

Available Llama Models (28)

Cogito V1 Preview Llama 70B

oah/llama-3.3-70b-instruct-turbo-test
Open Source

Deploy Cogito V1 Preview Llama 70B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free/M · Output: Free/M

Cogito V1 Preview Llama 70B Turbo

oah/cogito-v1-preview-llama
Open Source

Deploy Cogito V1 Preview Llama 70B Turbo with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free/M · Output: Free/M

DeepSeek R1 Distill Llama 70B

oah/deepseek-r1-distill-llama
Open Source

Deploy DeepSeek R1 Distill Llama 70B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai, DeepInfra
Reasoning · Input: $0.20/M · Output: $0.60/M

meta-llama/Llama-2-7b-chat-hf

oah/llama-2-7b-chat-hf
Open Source

Deploy meta-llama/Llama-2-7b-chat-hf with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free/M · Output: Free/M

Meta Llama 3 8B Instruct Reference

oah/llama-3-8b-chat-hf
Open Source

Deploy Meta Llama 3 8B Instruct Reference with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $0.20/M · Output: $0.20/M

Llama 3.1 405B

oah/llama-3.1
Open Source

Deploy Llama 3.1 405B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $3.50/M · Output: $3.50/M

Llama 3.2 1B

oah/llama-3.2
Open Source

Deploy Llama 3.2 1B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $0.06/M · Output: $0.06/M

Meta Llama 3.3 70B Instruct

oah/llama-3.3
Open Source

Deploy Meta Llama 3.3 70B Instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai, DeepInfra
Input: $0.13/M · Output: $0.39/M

Llama 3.3 70B Instruct FP8 Lora

oah/llama-3.3-70b-instruct-fp8-lora
Open Source

Deploy Llama 3.3 70B Instruct FP8 Lora with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free/M · Output: Free/M

Llama 4 Maverick 17B 128E

oah/llama-4-maverick
Open Source

Deploy Llama 4 Maverick 17B 128E with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai, DeepInfra
Input: $0.15/M · Output: $0.60/M

Llama 4 Scout (17Bx16E)

oah/llama-4-scout
Open Source

Deploy Llama 4 Scout (17Bx16E) with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai, DeepInfra
Input: $0.15/M · Output: $0.45/M

Llama 4 Scout 17B 16E Instruct Fp8 Lora

oah/llama-4-scout-17b-16e-instruct-fp8-lora
Open Source

Deploy Llama 4 Scout 17B 16E Instruct Fp8 Lora with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free/M · Output: Free/M

Meta Llama 3 70B Instruct Turbo

oah/meta-llama-3
Open Source

Deploy Meta Llama 3 70B Instruct Turbo with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $0.20/M · Output: $0.20/M

Meta Llama 3 8B Instruct Lite

oah/meta-llama-3-8b-instruct-lite
Open Source

Deploy Meta Llama 3 8B Instruct Lite with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $0.14/M · Output: $0.14/M

Llama 3.1 70B

oah/meta-llama-3.1
Open Source

Deploy Llama 3.1 70B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai, DeepInfra
Input: $0.06/M · Output: $0.06/M

nim/meta/llama-3.1-70b-instruct

oah/meta/llama-3.1
Open Source

Deploy nim/meta/llama-3.1-70b-instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free/M · Output: Free/M

nim/meta/llama-3.2-11b-vision-instruct

oah/meta/llama-3.2-11b-vision
Open Source

Deploy nim/meta/llama-3.2-11b-vision-instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free/M · Output: Free/M

nim/meta/llama-3.2-90b-vision-instruct

oah/meta/llama-3.2-90b-vision
Open Source

Deploy nim/meta/llama-3.2-90b-vision-instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free/M · Output: Free/M

nim/meta/llama-3.3-70b-instruct

oah/meta/llama-3.3
Open Source

Deploy nim/meta/llama-3.3-70b-instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free/M · Output: Free/M

nim/nvidia/llama-3.1-nemotron-70b-instruct

oah/nvidia/llama-3.1-nemotron
Open Source

Deploy nim/nvidia/llama-3.1-nemotron-70b-instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free/M · Output: Free/M

nim/nvidia/llama-3.3-nemotron-super-49b-v1

oah/nvidia/llama-3.3-nemotron-super-49b
Open Source

Deploy nim/nvidia/llama-3.3-nemotron-super-49b-v1 with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free/M · Output: Free/M

Llama 3.1 Nemotron 70B Instruct HF

oah/llama-3.1-nemotron-70b-instruct-hf
Open Source

Deploy Llama 3.1 Nemotron 70B Instruct HF with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $0.88/M · Output: $0.88/M

Meta Llama 3.1 8B Instruct Awq Int4

oah/meta-llama-3.1-8b-instruct-awq-int4
Open Source

Deploy Meta Llama 3.1 8B Instruct Awq Int4 with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free/M · Output: Free/M

NousResearch/Hermes-3-Llama-3.1-405B

oah/llama-3.1-nemotron
Open Source

Deploy NousResearch/Hermes-3-Llama-3.1-405B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $1.00/M · Output: $1.00/M

NousResearch/Hermes-3-Llama-3.1-70B

oah/hermes-3-llama-3.1
Open Source

Deploy NousResearch/Hermes-3-Llama-3.1-70B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.30/M · Output: $0.30/M

meta-llama/Llama-3.2-11B-Vision-Instruct

oah/llama-3.2-11b-vision
Open Source

Deploy meta-llama/Llama-3.2-11B-Vision-Instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.05/M · Output: $0.05/M

meta-llama/Llama-Guard-4-12B

oah/llama-guard-4
Open Source

Deploy meta-llama/Llama-Guard-4-12B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.18/M · Output: $0.18/M

nvidia/Llama-3.3-Nemotron-Super-49B-v1.5

oah/llama-3.3-nemotron-super-49b
Open Source

Deploy nvidia/Llama-3.3-Nemotron-Super-49B-v1.5 with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.10/M · Output: $0.40/M

Llama Pricing Comparison (per 1M tokens, USD)

Input / Output pricing by provider. Managed Mode adds a 25% managed markup. Pro BYOK = 0% markup.

Model | Model ID | Params | Context | Vision | Together.ai (in/out) | DeepInfra (in/out)
Cogito V1 Preview Llama 70B | oah/llama-3.3-70b-instruct-turbo-test | - | 131K | No | - | -
Cogito V1 Preview Llama 70B Turbo | oah/cogito-v1-preview-llama | - | 131K | No | - | -
DeepSeek R1 Distill Llama 70B | oah/deepseek-r1-distill-llama | - | 131K | No | $2.00/$2.00 | $0.20/$0.60
meta-llama/Llama-2-7b-chat-hf | oah/llama-2-7b-chat-hf | - | 4K | No | - | -
Meta Llama 3 8B Instruct Reference | oah/llama-3-8b-chat-hf | - | 8K | No | $0.20/$0.20 | -
Llama 3.1 405B | oah/llama-3.1 | - | 131K | No | $3.50/$3.50 | -
Llama 3.2 1B | oah/llama-3.2 | - | 131K | No | $0.06/$0.06 | -
Meta Llama 3.3 70B Instruct | oah/llama-3.3 | - | 131K | No | $0.88/$0.88 | $0.13/$0.39
Llama 3.3 70B Instruct FP8 Lora | oah/llama-3.3-70b-instruct-fp8-lora | - | 131K | No | - | -
Llama 4 Maverick 17B 128E | oah/llama-4-maverick | - | 262K | No | $0.27/$0.85 | $0.15/$0.60
Llama 4 Scout (17Bx16E) | oah/llama-4-scout | - | 262K | No | $0.18/$0.59 | $0.15/$0.45
Llama 4 Scout 17B 16E Instruct Fp8 Lora | oah/llama-4-scout-17b-16e-instruct-fp8-lora | - | 10.5M | No | - | -
Meta Llama 3 70B Instruct Turbo | oah/meta-llama-3 | - | 8K | No | $0.20/$0.20 | -
Meta Llama 3 8B Instruct Lite | oah/meta-llama-3-8b-instruct-lite | - | 8K | No | $0.14/$0.14 | -
Llama 3.1 70B | oah/meta-llama-3.1 | - | 131K | No | $0.18/$0.18 | $0.06/$0.06
nim/meta/llama-3.1-70b-instruct | oah/meta/llama-3.1 | - | 16K | No | - | -
nim/meta/llama-3.2-11b-vision-instruct | oah/meta/llama-3.2-11b-vision | - | 16K | No | - | -
nim/meta/llama-3.2-90b-vision-instruct | oah/meta/llama-3.2-90b-vision | - | 16K | No | - | -
nim/meta/llama-3.3-70b-instruct | oah/meta/llama-3.3 | - | 16K | No | - | -
nim/nvidia/llama-3.1-nemotron-70b-instruct | oah/nvidia/llama-3.1-nemotron | - | 16K | No | - | -
nim/nvidia/llama-3.3-nemotron-super-49b-v1 | oah/nvidia/llama-3.3-nemotron-super-49b | - | 16K | No | - | -
Llama 3.1 Nemotron 70B Instruct HF | oah/llama-3.1-nemotron-70b-instruct-hf | - | 33K | No | $0.88/$0.88 | -
Meta Llama 3.1 8B Instruct Awq Int4 | oah/meta-llama-3.1-8b-instruct-awq-int4 | - | 131K | No | - | -
NousResearch/Hermes-3-Llama-3.1-405B | oah/llama-3.1-nemotron | - | - | No | - | $1.00/$1.00
NousResearch/Hermes-3-Llama-3.1-70B | oah/hermes-3-llama-3.1 | - | - | No | - | $0.30/$0.30
meta-llama/Llama-3.2-11B-Vision-Instruct | oah/llama-3.2-11b-vision | - | - | No | - | $0.05/$0.05
meta-llama/Llama-Guard-4-12B | oah/llama-guard-4 | - | - | No | - | $0.18/$0.18
nvidia/Llama-3.3-Nemotron-Super-49B-v1.5 | oah/llama-3.3-nemotron-super-49b | - | - | No | - | $0.10/$0.40
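
To make the two pricing modes concrete, the sketch below works through the arithmetic using only figures stated on this page (25% managed markup, $29/mo for Pro BYOK). It assumes Managed Mode carries no separate subscription fee, which the page does not state explicitly; the function names are illustrative.

```python
MANAGED_MARKUP = 0.25      # from this page: Managed Mode adds 25%
PRO_BYOK_MONTHLY = 29.00   # from this page: Pro BYOK costs $29/mo


def provider_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Raw provider cost in USD at per-1M-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def managed_total(provider_spend):
    """Monthly total in Managed Mode (assumes no subscription fee)."""
    return provider_spend * (1 + MANAGED_MARKUP)


def byok_total(provider_spend):
    """Monthly total on Pro BYOK: provider spend at 0% markup plus the plan fee."""
    return provider_spend + PRO_BYOK_MONTHLY


def break_even_spend():
    """Monthly provider spend at which both modes cost the same."""
    return PRO_BYOK_MONTHLY / MANAGED_MARKUP


if __name__ == "__main__":
    # Llama 3.3 70B on DeepInfra: $0.13 input / $0.39 output per 1M tokens.
    spend = provider_cost(2_000_000, 500_000, 0.13, 0.39)
    print(f"provider: ${spend:.3f}  managed: ${managed_total(spend):.3f}")
    print(f"break-even provider spend: ${break_even_spend():.2f}/mo")
```

Under these assumptions the markup saved on BYOK equals the plan fee at $29 / 0.25 = $116 of monthly provider spend; below that, Managed Mode is cheaper, and above it, Pro BYOK is.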

Llama Direct vs AI Security Gateway

What you get at each pricing tier. Hub adds security, governance, and multi-provider routing on top of raw API access.

Mode | What You Pay | PII Redaction | Budget Caps | Routing | Audit Trail
Direct to Meta | Provider pricing only | None | None | Manual | None
Hub: Managed Mode | Provider + 25% markup | 28+ PII types | Per-key hard caps | Smart Router | Full compliance log
Hub: Pro BYOK ($29/mo) | Direct to provider (0% markup) | 28+ PII types | Per-key hard caps | Smart Router | Full compliance log

Popular Use Cases

  1. Privacy-sensitive deployments requiring model auditability
  2. Cost-optimized chatbots and customer support agents
  3. Long-document summarization and analysis (Scout 512K context)
  4. Multi-provider redundancy with automatic failover

Quick Integration

# pip install aisg
from aisg import AISG

client = AISG(api_key="your_hub_api_key")

response = client.chat.create(
    model="oah/llama-3.3-70b-instruct-turbo-test",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.content)
print(response.aisg_metadata.pii_detected)
print(response.aisg_metadata.cost_usd)

Use any virtual model name from the pricing table above (prefixed with oah/). The gateway also works with the standard OpenAI SDK: just change base_url. Every request is PII-scanned before it reaches the upstream provider.
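
For readers who want to see what an OpenAI-compatible call looks like on the wire, here is a standard-library sketch that builds (but does not send) such a request. The base URL is a hypothetical placeholder, and the `/chat/completions` path and Bearer authentication are assumed from the OpenAI convention rather than confirmed by this page; check your gateway dashboard for the real values.

```python
import json
import urllib.request

# Hypothetical placeholder, not the real gateway endpoint.
BASE_URL = "https://gateway.example.com/v1"


def build_chat_request(api_key: str, model: str, user_message: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # path assumed from the OpenAI convention
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("your_hub_api_key", "oah/llama-3.3", "Hello!")
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req), once BASE_URL points at a real endpoint.
```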

Frequently Asked Questions

What is the Llama API pricing on AI Security Gateway?
Llama API pricing varies by provider and model variant. In Managed Mode, we add a 25% markup on top of the provider's rate. With Pro BYOK ($29/mo), you pay the provider directly at 0% markup. Our Smart Router automatically picks the cheapest available provider for each request.
Who is the cheapest Llama provider?
Groq is typically the cheapest provider for Llama 3.3 70B. Our Smart Router compares real-time pricing across Groq, Together.ai, and DeepInfra and automatically routes each request to the cheapest one.
Llama 3 vs Llama 4 — which should I use?
Llama 4 Maverick (400B MoE) and Llama 4 Scout (512K context) are the latest and most capable. Llama 3.3 70B remains the best value for cost-sensitive production workloads. All versions run through AISG with identical PII protection.
Can I use my own Llama API keys with AI Security Gateway?
Yes. With Pro BYOK mode, store your Groq, Together.ai, or DeepInfra keys in AISG (AES-256 encrypted). We route requests through your account at 0% markup — you only pay the provider directly.
Does AI Security Gateway store my Llama prompts?
No. Prompts are processed in volatile memory (RAM) and discarded immediately. We never persist, log, or train on your content. Only metadata (token counts, latency, violation types) is stored.
What happens when a Llama model is retired?
Retired models automatically move to the 'Previous Versions' section on this page. If a replacement exists, it's shown alongside. Your API calls will return a clear error indicating the model is deprecated.

Previous Versions

These models have been retired by the provider. Migrate to a current variant above.

Meta Llama 3.3 70B Instruct Turbo (Together.ai)
meta-llama/Meta-Llama-3-8B-Instruct (DeepInfra)
nvidia/Llama-3.1-Nemotron-70B-Instruct (DeepInfra)

Deploy Llama with Enterprise-Grade Security

Get started with 1,000,000 free credits. Every Llama request is PII-scanned, cost-optimized, and fully logged — zero configuration.

Pricing shown is the lowest available rate across providers (per 1M tokens, USD). Actual pricing depends on provider and plan.