API Documentation
CerberusAI provides two APIs: a public model CDN and an authenticated OpenAI-compatible gateway.
Contents
- Getting started
- Pricing & credits
- Gateway API (OpenAI-Compatible)
- Model CDN API
- Direct Downloads
Getting started
Three steps to your first completion.
- Create an account at access.cerberusai.dev/signup.
- Verify your email. We send a confirmation link; click it to activate the account. Unverified accounts cannot sign in. Lost the email? Resend from /verify-email.
- Mint an API key at /dashboard/keys and pick a plan or top-up under /dashboard/billing. Keys are shown once on creation; afterwards they stay masked with Reveal/Copy controls. Pre-cutover "legacy" keys (hashed at rest) can be replaced in-place with the Rotate button — the old key is revoked atomically and a fresh, copyable key is issued under the same label.
Pricing & credits
Three monthly subscription tiers, each with a recurring credit allocation. Credits roll forward while the subscription is active. Need more in a heavy month? Add a one-time top-up — $5 minimum, no upgrade required.
- Solo use
- API key access
- Chat + playground
- Great for testing
- Daily workflows
- More session headroom
- Better value per credit
- Best for builders
- Longest runtime cushion
- Ideal for larger prompts
- Best for repeated API calls
- Priority-ready posture
- One-time purchase
- $5 minimum
- Pairs with any monthly plan
- Keeps spikes moving
Subscribe and top up at /dashboard/billing.
Stripe and PayPal both supported. The gateway returns 402 when your balance reaches zero.
Gateway API (requires API key)
OpenAI-compatible API at api.cerberusai.dev. Use your API key from the access dashboard.
Health check. No authentication required.
curl https://api.cerberusai.dev/health
Response
{"status":"ok","service":"api.cerberusai.dev gateway"}
List available models. Requires Bearer token.
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.cerberusai.dev/v1/models
Response
{
"object": "list",
"data": [
{"id": "Arbiter-GL9b", "object": "model", "owned_by": "cerberus", "description": "9B GLM-4 base. Unfiltered and highly intelligent.", "quants": "Q3_K_M, Q4_K_M, Q8_0"},
{"id": "cerberus-4b-v2-abliterated", "object": "model", "owned_by": "cerberus", "description": "Complete refusal ablation. Total cognitive freedom.", "quants": "Q4_0, Q4_K_M, Q8_0, f16"},
{"id": "gamma3-1b-abliterated", "object": "model", "owned_by": "cerberus", "description": "Compact 1B model. Lightweight enough for CPU-only inference and edge devices.", "quants": "Q2-Q8, IQ4_XS, f16"}
]
}
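The same call can be made from Python with only the standard library. This is a minimal sketch: the helper names `fetch_models`, `model_ids`, and `model_quants` are hypothetical, and the payload shape (including the comma-separated `quants` string) is assumed to match the example response above.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
MODELS_URL = "https://api.cerberusai.dev/v1/models"


def fetch_models(api_key: str) -> dict:
    """Call the models endpoint with a Bearer token and return the parsed JSON."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def model_ids(payload: dict) -> list[str]:
    """Extract model ids from a payload shaped like the example response."""
    return [entry["id"] for entry in payload.get("data", [])]


def model_quants(payload: dict) -> dict[str, list[str]]:
    """Map each model id to its quant labels (assumes a comma-separated string)."""
    return {
        entry["id"]: [q.strip() for q in entry.get("quants", "").split(",") if q.strip()]
        for entry in payload.get("data", [])
    }
```

Calling `model_ids(fetch_models("YOUR_API_KEY"))` would return the ids that can be passed as `model` in a chat completion request.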
Generate chat completions. Supports streaming via SSE ("stream": true).
curl -X POST https://api.cerberusai.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerberus-4b-v2-abliterated",
    "messages": [
      {"role": "system", "content": "You are Cerberus, an unfiltered AI."},
      {"role": "user", "content": "Hello"}
    ],
    "stream": false
  }'
Response
{
"id": "chatcmpl-...",
"object": "chat.completion",
"choices": [{
"index": 0,
"message": {"role": "assistant", "content": "..."},
"finish_reason": "stop"
}],
"usage": {"prompt_tokens": 12, "completion_tokens": 34, "total_tokens": 46}
}
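With "stream": true the response arrives as server-sent events instead of a single JSON object. The sketch below shows one way to reassemble the text from those events. It assumes the gateway follows OpenAI's streaming chunk format (`data: {...}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`); this page does not document the chunk shape, so treat that as an assumption. The function name `iter_stream_text` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the SSE lines of a streamed chat completion.

    Assumption: chunks follow OpenAI's streaming format, which this page
    does not confirm for the gateway.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Any iterable of decoded lines works as input, for example the lines read from a streaming HTTP response body.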
Error codes
- 401: missing, revoked, or unknown API key
- 402: credit balance exhausted; top up at /dashboard/billing
- 403: account not email-verified
- 429: rate-limit hit; back off and retry
- 5xx: upstream inference failure; safe to retry
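The error codes above split into two groups: 429 and 5xx are worth retrying, while 401, 402, and 403 need action on the account first. A minimal sketch of that policy follows. The backoff parameters (1 s base, 30 s cap, full jitter) are illustrative assumptions; this page does not document rate-limit windows or a `Retry-After` header, and the helper names are hypothetical.

```python
import random

# Statuses that require fixing the account or key before retrying.
FATAL_HINTS = {
    401: "check that the API key is present, known, and not revoked",
    402: "credit balance exhausted; top up at /dashboard/billing",
    403: "verify the account email before calling the gateway",
}


def should_retry(status: int) -> bool:
    """429 (rate limit) and any 5xx (upstream failure) are retryable."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter; attempt 0 is the first retry."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def describe(status: int) -> str:
    """Return a short hint for how a caller should react to a status code."""
    if status in FATAL_HINTS:
        return FATAL_HINTS[status]
    if should_retry(status):
        return "transient; retry after a delay"
    return "unexpected status"
```

A caller would sleep for `backoff_delay(attempt)` seconds between attempts and stop immediately when `should_retry` returns False.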
Using with OpenAI SDK
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerberusai.dev/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="cerberus-4b-v2-abliterated",
    messages=[
        {"role": "system", "content": "You are Cerberus, an unfiltered AI."},
        {"role": "user", "content": "Hello"}
    ]
)
print(response.choices[0].message.content)
Model CDN API (public)
Public, CORS-enabled endpoints at llm.cerberusai.dev. No authentication required.
Returns a JSON array of all available model directories.
curl https://llm.cerberusai.dev/api/models/
Response
[
{"name":"Arbiter-GL9b","type":"directory","mtime":"..."},
{"name":"cerberus-4b-v2-abliterated","type":"directory","mtime":"..."},
{"name":"gamma3-1b-abliterated","type":"directory","mtime":"..."}
]
Returns a JSON array of files for a specific model with exact byte sizes.
curl https://llm.cerberusai.dev/api/models/cerberus-4b-v2-abliterated/
Response
[
{"name":"cerberus-4b-v2-abliterated-Q4_K_M.gguf","type":"file","size":2708800256},
{"name":"cerberus-4b-v2-abliterated-Q8_0.gguf","type":"file","size":4482398976},
{"name":"cerberus-4b-v2-abliterated-f16.gguf","type":"file","size":8424389376}
]
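Because the listing reports exact byte sizes, a client can choose a file before downloading anything. The sketch below formats sizes in binary units and picks the largest file that fits a disk budget. Both helper names are hypothetical, and the input is assumed to be shaped like the example response above.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units (1 GiB = 1024**3 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def largest_fitting(files: list[dict], budget_bytes: int):
    """Return the largest listed file that fits the budget, or None."""
    candidates = [
        entry
        for entry in files
        if entry.get("type") == "file" and entry["size"] <= budget_bytes
    ]
    return max(candidates, key=lambda entry: entry["size"], default=None)
```

Larger files generally correspond to higher-precision quantizations, so "largest that fits" is a reasonable default when disk space is the only constraint.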
CDN health check.
curl https://llm.cerberusai.dev/health
Response
{"status":"ok"}
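Both health endpoints return a JSON object whose `status` is "ok", so one small helper can probe either service. This is a sketch using only the standard library; the names `HEALTH_URLS`, `is_healthy`, and `check` are hypothetical, and treating any network error as "unhealthy" is a design choice, not documented behavior.

```python
import json
import urllib.request

# Health URLs as shown in the curl examples on this page.
HEALTH_URLS = {
    "gateway": "https://api.cerberusai.dev/health",
    "cdn": "https://llm.cerberusai.dev/health",
}


def is_healthy(body: bytes) -> bool:
    """True when the body is a JSON object with "status": "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check(name: str, timeout: float = 5.0) -> bool:
    """Probe one service; network errors and timeouts count as unhealthy."""
    try:
        with urllib.request.urlopen(HEALTH_URLS[name], timeout=timeout) as response:
            return response.status == 200 and is_healthy(response.read())
    except OSError:  # URLError, HTTPError, and socket timeouts derive from OSError
        return False
```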
Notes
- All responses use Content-Type: application/json except file downloads
- CORS is enabled for all origins, so it is safe to call from browser-based apps
- File downloads support Range headers for partial content and resume
- GGUF files are served as application/octet-stream with Content-Disposition: attachment
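Range support means a client can resume a partial download by requesting only the bytes it is missing. The sketch below shows one way to do that with the standard library. The function names are hypothetical, and the handling of status codes (206 appends, 200 restarts) follows general HTTP range semantics rather than anything specific documented here.

```python
import os
import urllib.request


def resume_request(url: str, local_path: str) -> urllib.request.Request:
    """Build a GET request that asks only for the bytes not yet on disk."""
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    request = urllib.request.Request(url)
    if offset > 0:
        request.add_header("Range", f"bytes={offset}-")
    return request


def resume_download(url: str, local_path: str, chunk_size: int = 1 << 20) -> None:
    """Download to local_path, continuing from any partial file already there.

    If the local file is already complete, servers typically answer 416 and
    urlopen raises HTTPError; callers may want to catch that case.
    """
    request = resume_request(url, local_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the range was honoured; 200 means the whole file is being sent.
        mode = "ab" if response.status == 206 else "wb"
        with open(local_path, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```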
Direct Downloads
Download GGUF model files directly via the CDN. All downloads support resume with wget -c or curl -C -. Browse /api/models/ for live file lists with exact byte sizes.
Arbiter-GL9b — 9B GLM-4 base; gateway serves Q8_0 (fastest on CPU with REPACK).
# Q3_K_M (~4.6 GB): smallest, lower quality
wget https://llm.cerberusai.dev/models/Arbiter-GL9b/Arbiter-GL9b-Q3_K_M.gguf
# Q4_K_M (~5.7 GB): balanced
wget https://llm.cerberusai.dev/models/Arbiter-GL9b/Arbiter-GL9b-Q4_K_M.gguf
# Q8_0 (~9.3 GB): gateway default; fastest on CPU
wget https://llm.cerberusai.dev/models/Arbiter-GL9b/Arbiter-GL9b-Q8_0.gguf
cerberus-4b-v2-abliterated — 4B refusal-ablated; gateway serves Q4_0 (fastest on CPU with REPACK).
# Q4_0 (~2.4 GB): gateway default; fastest on CPU
wget https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-Q4_0.gguf
# Q4_K_M (~2.6 GB): slightly higher quality
wget https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-Q4_K_M.gguf
# Q8_0 (~4.2 GB)
wget https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-Q8_0.gguf
# F16 (~7.9 GB): full precision
wget https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-f16.gguf
gamma3-1b-abliterated — 1B compact, CPU-friendly; available as Q2–Q8, IQ4_XS, f16.
# IQ4_XS (smallest practical)
wget https://llm.cerberusai.dev/models/gamma3-1b-abliterated/gamma3-1b-abliterated-IQ4_XS.gguf
# Q4_K_M
wget https://llm.cerberusai.dev/models/gamma3-1b-abliterated/gamma3-1b-abliterated-Q4_K_M.gguf
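The download links above all appear to follow one pattern: /models/<model>/<model>-<quant>.gguf. Assuming that pattern holds for every file (it is inferred from the examples, not stated as a guarantee), a URL can be built from a model name and quant label. The helper name `download_url` is hypothetical; checking the live listing at /api/models/<model>/ remains the reliable way to confirm a file exists.

```python
from urllib.parse import quote

# Base path taken from the wget examples above.
CDN_BASE = "https://llm.cerberusai.dev/models"


def download_url(model: str, quant: str) -> str:
    """Build a GGUF download URL, assuming the <model>-<quant>.gguf naming pattern."""
    filename = f"{model}-{quant}.gguf"
    return f"{CDN_BASE}/{quote(model)}/{quote(filename)}"
```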
Resume an interrupted download:
wget -c https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-Q4_K_M.gguf