Cogito V1 Preview Llama 70B
by Meta (Open Source)
Meta's open-weights Llama family is one of the most widely deployed open LLM series. Compare Llama API pricing across Together.ai and DeepInfra to find the cheapest Llama provider. Llama 4 introduced mixture-of-experts models: Maverick (17B active parameters, 128 experts) and the long-context Scout (17B active, 16 experts). Llama 3.3 remains a cost-efficient workhorse for production workloads.
Every Llama request is scanned for 28+ PII entity types — SSNs, credit cards, emails, API keys, and more — before it reaches any provider.
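To make "scanned and redacted before it reaches a provider" concrete, here is a minimal sketch of pattern-based redaction. It is a hypothetical illustration, not the Hub's detector: it covers only three entity types (email, US SSN, credit card), and the `[TYPE]` placeholder format and the `redact` helper are assumptions. A Luhn checksum filters out long digit runs that are not real card numbers.

```python
import re

# Hypothetical sketch: regex redaction for three PII types only.
# A production detector (like the Hub's 28+ types) needs much more than regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13-19 digits, optionally separated by single spaces or hyphens
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to cut false positives on long digit runs."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each detected entity with a [TYPE] placeholder; return text and detected types."""
    found: list[str] = []

    def replacer(label: str):
        def sub(match: re.Match) -> str:
            value = match.group()
            if label == "CREDIT_CARD" and not luhn_valid(value):
                return value  # digit run that fails Luhn: leave it untouched
            found.append(label)
            return f"[{label}]"
        return sub

    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(replacer(label), text)
    return text, found
```

Redaction runs on the prompt text only, so the provider receives placeholders while the list of detected types can be logged alongside the request.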
Llama models here are served by 2 providers, Together.ai and DeepInfra, and the Smart Router picks the cheapest one for each request. Managed Mode adds a 25% markup; Pro BYOK adds 0%.
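Per-request cheapest-provider selection can be sketched as a cost comparison over each provider's input and output rates. This is a hypothetical illustration, not the Smart Router's actual logic; the rates are copied from the pricing table for two models, and `pick_cheapest` is an assumed helper name.

```python
# Hypothetical sketch of "cheapest provider per request" routing.
# Rates are USD per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    "oah/llama-3.3": {"together": (0.88, 0.88), "deepinfra": (0.13, 0.39)},
    "oah/llama-4-maverick": {"together": (0.27, 0.85), "deepinfra": (0.15, 0.60)},
}


def request_cost(rates: tuple[float, float], input_tokens: int, output_tokens: int) -> float:
    """Provider cost in USD for one request, before any Hub markup."""
    in_rate, out_rate = rates
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def pick_cheapest(model: str, input_tokens: int, est_output_tokens: int) -> tuple[str, float]:
    """Return (provider, estimated cost) with the lowest estimated cost for this request."""
    offers = PRICES[model]
    best = min(offers, key=lambda p: request_cost(offers[p], input_tokens, est_output_tokens))
    return best, request_cost(offers[best], input_tokens, est_output_tokens)


provider, cost = pick_cheapest("oah/llama-3.3", input_tokens=10_000, est_output_tokens=2_000)
print(provider, round(cost, 5))
```

For 10,000 input and 2,000 output tokens on Llama 3.3 70B, DeepInfra comes out at roughly $0.00208 versus $0.01056 on Together.ai. Because output length is unknown before the call, a router has to estimate it; when input and output rates differ, the ranking can change with that estimate.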
Use the AISG SDK (pip install aisg) for typed metadata and error handling, or change two lines in your OpenAI SDK. Both work.
Per-request logging of token counts, latency, DLP violations, and cost. Never wonder what your AI spend is again.
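Per-request records like these roll up naturally into spend and compliance summaries. The sketch below is a hypothetical illustration: the `RequestLog` field names are assumptions modeled on the four logged quantities, not the Hub's actual export schema.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RequestLog:
    # Hypothetical record shape; field names are assumptions, not the Hub's export schema.
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    dlp_violations: int
    cost_usd: float


def summarize(logs: list[RequestLog]) -> dict:
    """Roll per-request logs up into spend, usage, latency, and DLP totals."""
    cost_by_model: dict[str, float] = {}
    for r in logs:
        cost_by_model[r.model] = cost_by_model.get(r.model, 0.0) + r.cost_usd
    return {
        "requests": len(logs),
        "total_cost_usd": sum(r.cost_usd for r in logs),
        "cost_by_model": cost_by_model,
        "total_tokens": sum(r.input_tokens + r.output_tokens for r in logs),
        "median_latency_ms": median(r.latency_ms for r in logs) if logs else None,
        "requests_with_dlp_violations": sum(1 for r in logs if r.dlp_violations > 0),
    }
```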
Each virtual model below deploys with built-in PII redaction and Hub governance and is available on Managed Credits and BYOK:
oah/llama-3.3-70b-instruct-turbo-test: Cogito V1 Preview Llama 70B
oah/cogito-v1-preview-llama: Cogito V1 Preview Llama 70B Turbo
oah/deepseek-r1-distill-llama: DeepSeek R1 Distill Llama 70B
oah/llama-2-7b-chat-hf: meta-llama/Llama-2-7b-chat-hf
oah/llama-3-8b-chat-hf: Meta Llama 3 8B Instruct Reference
oah/llama-3.1: Llama 3.1 405B
oah/llama-3.2: Llama 3.2 1B
oah/llama-3.3: Meta Llama 3.3 70B Instruct
oah/llama-3.3-70b-instruct-fp8-lora: Llama 3.3 70B Instruct FP8 Lora
oah/llama-4-maverick: Llama 4 Maverick 17B 128E
oah/llama-4-scout: Llama 4 Scout (17Bx16E)
oah/llama-4-scout-17b-16e-instruct-fp8-lora: Llama 4 Scout 17B 16E Instruct Fp8 Lora
oah/meta-llama-3: Meta Llama 3 70B Instruct Turbo
oah/meta-llama-3-8b-instruct-lite: Meta Llama 3 8B Instruct Lite
oah/meta-llama-3.1: Llama 3.1 70B
oah/meta/llama-3.1: nim/meta/llama-3.1-70b-instruct
oah/meta/llama-3.2-11b-vision: nim/meta/llama-3.2-11b-vision-instruct
oah/meta/llama-3.2-90b-vision: nim/meta/llama-3.2-90b-vision-instruct
oah/meta/llama-3.3: nim/meta/llama-3.3-70b-instruct
oah/nvidia/llama-3.1-nemotron: nim/nvidia/llama-3.1-nemotron-70b-instruct
oah/nvidia/llama-3.3-nemotron-super-49b: nim/nvidia/llama-3.3-nemotron-super-49b-v1
oah/llama-3.1-nemotron-70b-instruct-hf: Llama 3.1 Nemotron 70B Instruct HF
oah/meta-llama-3.1-8b-instruct-awq-int4: Meta Llama 3.1 8B Instruct Awq Int4
oah/llama-3.1-nemotron: NousResearch/Hermes-3-Llama-3.1-405B
oah/hermes-3-llama-3.1: NousResearch/Hermes-3-Llama-3.1-70B
oah/llama-3.2-11b-vision: meta-llama/Llama-3.2-11B-Vision-Instruct
oah/llama-guard-4: meta-llama/Llama-Guard-4-12B
oah/llama-3.3-nemotron-super-49b: nvidia/Llama-3.3-Nemotron-Super-49B-v1.5
Input/output prices in USD per 1M tokens, by provider. Managed Mode adds a 25% markup; Pro BYOK adds 0%.
| Model | Virtual model | Params | Context | Vision | Together.ai (input/output) | DeepInfra (input/output) |
|---|---|---|---|---|---|---|
| Cogito V1 Preview Llama 70B | `oah/llama-3.3-70b-instruct-turbo-test` | — | 131K | No | — | — |
| Cogito V1 Preview Llama 70B Turbo | `oah/cogito-v1-preview-llama` | — | 131K | No | — | — |
| DeepSeek R1 Distill Llama 70B | `oah/deepseek-r1-distill-llama` | — | 131K | No | $2.00/$2.00 | $0.20/$0.60 |
| meta-llama/Llama-2-7b-chat-hf | `oah/llama-2-7b-chat-hf` | — | 4K | No | — | — |
| Meta Llama 3 8B Instruct Reference | `oah/llama-3-8b-chat-hf` | — | 8K | No | $0.20/$0.20 | — |
| Llama 3.1 405B | `oah/llama-3.1` | — | 131K | No | $3.50/$3.50 | — |
| Llama 3.2 1B | `oah/llama-3.2` | — | 131K | No | $0.06/$0.06 | — |
| Meta Llama 3.3 70B Instruct | `oah/llama-3.3` | — | 131K | No | $0.88/$0.88 | $0.13/$0.39 |
| Llama 3.3 70B Instruct FP8 Lora | `oah/llama-3.3-70b-instruct-fp8-lora` | — | 131K | No | — | — |
| Llama 4 Maverick 17B 128E | `oah/llama-4-maverick` | — | 262K | No | $0.27/$0.85 | $0.15/$0.60 |
| Llama 4 Scout (17Bx16E) | `oah/llama-4-scout` | — | 262K | No | $0.18/$0.59 | $0.15/$0.45 |
| Llama 4 Scout 17B 16E Instruct Fp8 Lora | `oah/llama-4-scout-17b-16e-instruct-fp8-lora` | — | 10.5M | No | — | — |
| Meta Llama 3 70B Instruct Turbo | `oah/meta-llama-3` | — | 8K | No | $0.20/$0.20 | — |
| Meta Llama 3 8B Instruct Lite | `oah/meta-llama-3-8b-instruct-lite` | — | 8K | No | $0.14/$0.14 | — |
| Llama 3.1 70B | `oah/meta-llama-3.1` | — | 131K | No | $0.18/$0.18 | $0.06/$0.06 |
| nim/meta/llama-3.1-70b-instruct | `oah/meta/llama-3.1` | — | 16K | No | — | — |
| nim/meta/llama-3.2-11b-vision-instruct | `oah/meta/llama-3.2-11b-vision` | — | 16K | No | — | — |
| nim/meta/llama-3.2-90b-vision-instruct | `oah/meta/llama-3.2-90b-vision` | — | 16K | No | — | — |
| nim/meta/llama-3.3-70b-instruct | `oah/meta/llama-3.3` | — | 16K | No | — | — |
| nim/nvidia/llama-3.1-nemotron-70b-instruct | `oah/nvidia/llama-3.1-nemotron` | — | 16K | No | — | — |
| nim/nvidia/llama-3.3-nemotron-super-49b-v1 | `oah/nvidia/llama-3.3-nemotron-super-49b` | — | 16K | No | — | — |
| Llama 3.1 Nemotron 70B Instruct HF | `oah/llama-3.1-nemotron-70b-instruct-hf` | — | 33K | No | $0.88/$0.88 | — |
| Meta Llama 3.1 8B Instruct Awq Int4 | `oah/meta-llama-3.1-8b-instruct-awq-int4` | — | 131K | No | — | — |
| NousResearch/Hermes-3-Llama-3.1-405B | `oah/llama-3.1-nemotron` | — | — | No | — | $1.00/$1.00 |
| NousResearch/Hermes-3-Llama-3.1-70B | `oah/hermes-3-llama-3.1` | — | — | No | — | $0.30/$0.30 |
| meta-llama/Llama-3.2-11B-Vision-Instruct | `oah/llama-3.2-11b-vision` | — | — | No | — | $0.05/$0.05 |
| meta-llama/Llama-Guard-4-12B | `oah/llama-guard-4` | — | — | No | — | $0.18/$0.18 |
| nvidia/Llama-3.3-Nemotron-Super-49B-v1.5 | `oah/llama-3.3-nemotron-super-49b` | — | — | No | — | $0.10/$0.40 |
What you get at each pricing tier. Hub adds security, governance, and multi-provider routing on top of raw API access.
| Mode | What You Pay | PII Redaction | Budget Caps | Routing | Audit Trail |
|---|---|---|---|---|---|
| Direct to Meta | Provider pricing only | None | None | Manual | None |
| Hub — Managed Mode | Provider + 25% markup | 28+ PII types | Per-key hard caps | Smart Router | Full compliance log |
| Hub — Pro BYOK ($29/mo) | Direct to provider (0% markup) | 28+ PII types | Per-key hard caps | Smart Router | Full compliance log |
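Which tier is cheaper depends on monthly provider spend. The sketch below compares the two Hub modes under stated assumptions: provider spend is identical in both modes, the $29/month Pro fee is the only extra BYOK cost, and free credits are ignored. Under those assumptions the break-even point is 29 / 0.25 = $116 of provider spend per month.

```python
# Rough sketch comparing the two Hub tiers for one month of usage.
# Assumptions: identical provider spend in both modes, the $29/month Pro fee is the
# only extra BYOK cost, and free credits are ignored.
MANAGED_MARKUP = 0.25      # Managed Mode: provider price + 25%
PRO_BYOK_FEE_USD = 29.0    # Pro BYOK: flat monthly fee, 0% markup


def monthly_totals(provider_spend_usd: float) -> dict[str, float]:
    """Total monthly cost in USD under each Hub mode."""
    return {
        "managed": provider_spend_usd * (1 + MANAGED_MARKUP),
        "pro_byok": provider_spend_usd + PRO_BYOK_FEE_USD,
    }


# Managed costs more once 0.25 * spend exceeds the flat fee: spend > 29 / 0.25.
BREAK_EVEN_USD = PRO_BYOK_FEE_USD / MANAGED_MARKUP

for spend in (50, 116, 500):
    print(spend, monthly_totals(spend))
```

At $50/month, Managed Mode totals $62.50 versus $79 on Pro BYOK; at $500/month it is $625 versus $529.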
Privacy-sensitive deployments requiring model auditability
Cost-optimized chatbots and customer support agents
Long-document summarization and analysis (Llama 4 Scout, 262K context)
Multi-provider redundancy with automatic failover
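Automatic failover amounts to trying providers in order and moving to the next one when a call fails. This is a hypothetical sketch, not the Hub's implementation: `ProviderError` and the `send` callable are placeholders for whatever actually issues a request to one provider.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class ProviderError(Exception):
    """Hypothetical error type for a failed or unavailable provider call."""


def call_with_failover(providers: Sequence[str], send: Callable[[str], T]) -> tuple[str, T]:
    """Try providers in order (for example cheapest first); return the first success.

    `send` is a placeholder for the function that sends the request to one provider.
    """
    failures: dict[str, ProviderError] = {}
    for provider in providers:
        try:
            return provider, send(provider)
        except ProviderError as exc:
            failures[provider] = exc  # record the failure and try the next provider
    raise RuntimeError(f"all providers failed: {', '.join(failures)}")
```

Ordering the list cheapest-first combines failover with cost routing: the cheaper provider serves traffic while it is healthy, and the next one absorbs requests during an outage.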
```python
# pip install aisg
from aisg import AISG

# Authenticate with your Hub API key
client = AISG(api_key="your_hub_api_key")

# Send a chat request to a Hub virtual model (oah/ prefix)
response = client.chat.create(
    model="oah/llama-3.3-70b-instruct-turbo-test",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.content)                     # model reply
print(response.aisg_metadata.pii_detected)  # PII scan result for this request
print(response.aisg_metadata.cost_usd)      # cost of this request in USD
```

Use any virtual model name from the pricing table above (prefixed with `oah/`). The standard OpenAI SDK also works if you change its `base_url`. Every request is PII-scanned before it reaches any provider.
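Switching the OpenAI SDK over means pointing the standard Chat Completions request at a different host. In the OpenAI Python SDK that corresponds to passing `api_key` and `base_url` to the `OpenAI(...)` constructor. The stdlib-only sketch below builds that wire-level request without sending it. The Hub's base URL is not given on this page, so `https://hub.example.com/v1` is a placeholder, and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

# Placeholder: substitute the Hub's OpenAI-compatible base URL.
HUB_BASE_URL = "https://hub.example.com/v1"


def build_chat_request(api_key: str, model: str, messages: list[dict],
                       base_url: str = HUB_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style POST /chat/completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("your_hub_api_key", "oah/llama-3.3",
                         [{"role": "user", "content": "Hello!"}])
# Sending it with urllib.request.urlopen(req) should return JSON whose reply text
# sits at choices[0].message.content, as in any OpenAI-compatible API.
```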
These models have been retired by the provider. Migrate to a current variant above.
Get started with 1,000,000 free credits. Every Llama request is PII-scanned, cost-optimized, and fully logged — zero configuration.
Model registry last updated: . Pricing shown is the lowest available rate across providers (per 1M tokens, USD). Actual pricing depends on provider and plan.