API Documentation

CerberusAI provides two APIs: a public model CDN and an authenticated OpenAI-compatible gateway.

Contents

  • Getting started
  • Pricing & credits
  • Gateway API (OpenAI-Compatible)
  • Model CDN API
  • Direct Downloads

Getting started

Three steps to your first completion.

  1. Create an account at access.cerberusai.dev/signup.
  2. Verify your email. We send a confirmation link; click it to activate the account. Unverified accounts cannot sign in. Lost the email? Resend it from /verify-email.
  3. Mint an API key at /dashboard/keys and pick a plan or top up under /dashboard/billing. Keys are shown once, at creation; afterwards they stay masked behind Reveal/Copy controls. Pre-cutover "legacy" keys (hashed at rest) can be replaced in place with the Rotate button: the old key is revoked atomically and a fresh, copyable key is issued under the same label.

Pricing & credits

Two monthly subscription tiers, each with a recurring credit allocation, plus on-demand top-ups. Credits roll forward while the subscription is active. Need more in a heavy month? Add a one-time top-up ($5 minimum, no upgrade required).

Lite ($8/mo, 80,000 credits / month)
  • Solo use
  • API key access
  • Chat + playground
  • Great for testing

Exp ($22/mo, 320,000 credits / month)
  • Longest runtime cushion
  • Ideal for larger prompts
  • Best for repeated API calls
  • Priority-ready posture

On-demand top-up (from $5, add credits any time)
  • One-time purchase
  • $5 minimum
  • Pairs with any monthly plan
  • Keeps spikes moving

Subscribe and top up at /dashboard/billing. Both Stripe and PayPal are supported. The gateway returns 402 once your credit balance reaches zero.

Gateway API (requires API key)

OpenAI-compatible API at api.cerberusai.dev. Use your API key from the access dashboard.

GET /health

Health check. No authentication required.

curl https://api.cerberusai.dev/health

Response

{"status":"ok","service":"api.cerberusai.dev gateway"}
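A minimal pre-flight probe in Python (stdlib only; function names are illustrative, and the parse step is split out so it can be checked against the documented body without a network call):

```python
import json
import urllib.request

def parse_health(body: bytes) -> bool:
    """True when the gateway reports status 'ok'."""
    return json.loads(body).get("status") == "ok"

def check_gateway(url: str = "https://api.cerberusai.dev/health") -> bool:
    """Live probe; unauthenticated per the docs above."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_health(resp.read())

# Offline sanity check against the documented response body:
sample = b'{"status":"ok","service":"api.cerberusai.dev gateway"}'
print(parse_health(sample))  # True
```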

GET /v1/models

List available models. Requires Bearer token.

curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.cerberusai.dev/v1/models

Response

{
  "object": "list",
  "data": [
    {"id": "Arbiter-GL9b",                "object": "model", "owned_by": "cerberus", "description": "9B GLM-4 base. Unfiltered and highly intelligent.", "quants": "Q3_K_M, Q4_K_M, Q8_0"},
    {"id": "cerberus-4b-v2-abliterated",  "object": "model", "owned_by": "cerberus", "description": "Complete refusal ablation. Total cognitive freedom.", "quants": "Q4_0, Q4_K_M, Q8_0, f16"},
    {"id": "gamma3-1b-abliterated",       "object": "model", "owned_by": "cerberus", "description": "Compact 1B model. Lightweight enough for CPU-only inference and edge devices.", "quants": "Q2-Q8, IQ4_XS, f16"}
  ]
}
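The `quants` field is a comma-separated string, so a small client-side helper can pick models by quantization. A sketch (helper name is hypothetical; note it treats range shorthand like "Q2-Q8" as a literal token rather than expanding it):

```python
# Shape matches the documented /v1/models response (trimmed to relevant keys).
MODELS = {
    "object": "list",
    "data": [
        {"id": "Arbiter-GL9b", "quants": "Q3_K_M, Q4_K_M, Q8_0"},
        {"id": "cerberus-4b-v2-abliterated", "quants": "Q4_0, Q4_K_M, Q8_0, f16"},
        {"id": "gamma3-1b-abliterated", "quants": "Q2-Q8, IQ4_XS, f16"},
    ],
}

def models_with_quant(payload: dict, quant: str) -> list[str]:
    """Return model ids whose comma-separated `quants` field lists `quant` exactly."""
    out = []
    for m in payload["data"]:
        quants = [q.strip() for q in m.get("quants", "").split(",")]
        if quant in quants:
            out.append(m["id"])
    return out

print(models_with_quant(MODELS, "f16"))  # ['cerberus-4b-v2-abliterated', 'gamma3-1b-abliterated']
```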

POST /v1/chat/completions

Generate chat completions. Supports streaming via SSE ("stream": true).

curl -X POST https://api.cerberusai.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerberus-4b-v2-abliterated",
    "messages": [
      {"role": "system", "content": "You are Cerberus, an unfiltered AI."},
      {"role": "user", "content": "Hello"}
    ],
    "stream": false
  }'

Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 12, "completion_tokens": 34, "total_tokens": 46}
}
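With "stream": true the gateway emits server-sent events. Assuming the stream follows the OpenAI chat.completion.chunk convention (data: lines carrying JSON deltas, terminated by data: [DONE]), the client-side parse can be sketched as:

```python
import json

def extract_deltas(sse_lines):
    """Yield content fragments from OpenAI-style SSE lines until [DONE]."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

# Offline demo with a captured-style stream:
stream = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(extract_deltas(stream)))  # Hello
```

In practice you would feed this the response body line by line; the OpenAI SDK shown below does the same parsing for you when you pass stream=True.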

Error codes

  • 401 — missing, revoked, or unknown API key
  • 402 — credit balance exhausted; top up at /dashboard/billing
  • 403 — account not email-verified
  • 429 — rate-limit hit; back off and retry
  • 5xx — upstream inference failure; safe to retry
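The table implies a simple client-side policy: retry 429 and 5xx with backoff, surface 401/402/403 immediately. A sketch (function names are illustrative, not part of the API):

```python
import random

RETRYABLE = {429} | set(range(500, 600))

def should_retry(status: int) -> bool:
    """Per the error table: 429 and 5xx are safe to retry; auth/billing 4xx are not."""
    return status in RETRYABLE

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter, capped at `cap` seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

print(should_retry(429), should_retry(402))  # True False
```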

Using with OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerberusai.dev/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="cerberus-4b-v2-abliterated",
    messages=[
        {"role": "system", "content": "You are Cerberus, an unfiltered AI."},
        {"role": "user", "content": "Hello"}
    ]
)

print(response.choices[0].message.content)

Model CDN API (public)

Public, CORS-enabled endpoints at llm.cerberusai.dev. No authentication required.

GET /api/models/

Returns a JSON array of all available model directories.

curl https://llm.cerberusai.dev/api/models/

Response

[
  {"name":"Arbiter-GL9b","type":"directory","mtime":"..."},
  {"name":"cerberus-4b-v2-abliterated","type":"directory","mtime":"..."},
  {"name":"gamma3-1b-abliterated","type":"directory","mtime":"..."}
]
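Since the endpoint returns directory entries, a client typically filters on type to get model names. A stdlib-only sketch (the file entry in the sample is invented to show the filtering; function names are illustrative):

```python
import json
import urllib.request

def directory_names(entries: list[dict]) -> list[str]:
    """Names of entries whose type is 'directory' (the model folders)."""
    return [e["name"] for e in entries if e.get("type") == "directory"]

def list_models(base: str = "https://llm.cerberusai.dev") -> list[str]:
    """Live fetch; no auth needed per the docs above."""
    with urllib.request.urlopen(f"{base}/api/models/", timeout=10) as resp:
        return directory_names(json.loads(resp.read()))

# Offline check against the documented shape:
sample = [
    {"name": "Arbiter-GL9b", "type": "directory", "mtime": "..."},
    {"name": "README.txt", "type": "file", "mtime": "..."},
]
print(directory_names(sample))  # ['Arbiter-GL9b']
```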

GET /api/models/{model_name}/

Returns a JSON array of files for a specific model with exact byte sizes.

curl https://llm.cerberusai.dev/api/models/cerberus-4b-v2-abliterated/

Response

[
  {"name":"cerberus-4b-v2-abliterated-Q4_K_M.gguf","type":"file","size":2708800256},
  {"name":"cerberus-4b-v2-abliterated-Q8_0.gguf","type":"file","size":4482398976},
  {"name":"cerberus-4b-v2-abliterated-f16.gguf","type":"file","size":8424389376}
]
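The size field is an exact byte count; a small formatter (illustrative, binary prefixes) makes the listing human-readable:

```python
def human_size(n: int) -> str:
    """Exact byte count -> binary-prefixed string, e.g. 2708800256 -> '2.52 GiB'."""
    size = float(n)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TiB"

print(human_size(2708800256))  # 2.52 GiB
```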

GET /health

CDN health check.

curl https://llm.cerberusai.dev/health

Response

{"status":"ok"}

Notes

  • All responses use Content-Type: application/json except file downloads
  • CORS is enabled for all origins — safe to call from browser-based apps
  • File downloads support Range headers for partial content and resume
  • GGUF files are served as application/octet-stream with Content-Disposition: attachment
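Because the CDN honors Range headers, a resumable download needs only the byte count already on disk. A stdlib sketch under that assumption (helper names are hypothetical; if a server ignored Range and returned the full file, the append would duplicate data):

```python
import os
import urllib.request

def resume_headers(path: str) -> dict:
    """Range header resuming from the bytes already on disk; empty for a fresh start."""
    have = os.path.getsize(path) if os.path.exists(path) else 0
    return {"Range": f"bytes={have}-"} if have else {}

def download(url: str, path: str, chunk: int = 1 << 20) -> None:
    """Stream to disk in 1 MiB chunks, appending past any partial file."""
    req = urllib.request.Request(url, headers=resume_headers(path))
    with urllib.request.urlopen(req) as resp, open(path, "ab") as f:
        while block := resp.read(chunk):
            f.write(block)

print(resume_headers("no-such-file.gguf"))  # {}
```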

Direct Downloads

Download GGUF model files directly via the CDN. All downloads support resume with wget -c or curl -C -. Browse /api/models/ for live file lists with exact byte sizes.

GET /models/{model}/{file}.gguf

Arbiter-GL9b — 9B GLM-4 base; gateway serves Q8_0 (fastest on CPU with REPACK).

# Q3_K_M (~4.6 GB) — smallest, lower quality
wget https://llm.cerberusai.dev/models/Arbiter-GL9b/Arbiter-GL9b-Q3_K_M.gguf

# Q4_K_M (~5.7 GB) — balanced
wget https://llm.cerberusai.dev/models/Arbiter-GL9b/Arbiter-GL9b-Q4_K_M.gguf

# Q8_0 (~9.3 GB) — gateway default; fastest on CPU
wget https://llm.cerberusai.dev/models/Arbiter-GL9b/Arbiter-GL9b-Q8_0.gguf

cerberus-4b-v2-abliterated — 4B refusal-ablated; gateway serves Q4_0 (fastest on CPU with REPACK).

# Q4_0 (~2.4 GB) — gateway default; fastest on CPU
wget https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-Q4_0.gguf

# Q4_K_M (~2.6 GB) — slightly higher quality
wget https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-Q4_K_M.gguf

# Q8_0 (~4.2 GB)
wget https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-Q8_0.gguf

# F16 (~7.9 GB) — full precision
wget https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-f16.gguf

gamma3-1b-abliterated — 1B compact, CPU-friendly; available as Q2–Q8, IQ4_XS, f16.

# IQ4_XS (smallest practical)
wget https://llm.cerberusai.dev/models/gamma3-1b-abliterated/gamma3-1b-abliterated-IQ4_XS.gguf

# Q4_K_M
wget https://llm.cerberusai.dev/models/gamma3-1b-abliterated/gamma3-1b-abliterated-Q4_K_M.gguf

Resume an interrupted download:

wget -c https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-Q4_K_M.gguf