Cerberus CERBERUS AI
Menu
Local · Uncensored · Open

Unfiltered intelligence.
Yours to run.

Open-weight language models, refusal-ablated and tuned to run on your own hardware. Desktop app for instant local chat. Self-hosted models. A managed API when you need it.

Join the pack on Discord
Cerberus
Why Cerberus

Built for unrestricted intelligence.

Refusal layers stripped. Weights open. Latency local. Bring your own GPU or call our managed API.

Local-first

Inference runs on your machine through Ollama. No prompts leave your hardware. No telemetry. No cloud round-trip.

Refusal-ablated

Surgical removal of the refusal direction in activation space. Core reasoning preserved, "I can't help with that" deleted.

Hardware-aware

Auto-detects your VRAM and recommends a quantization that actually fits. From 4GB laptops to 24GB workstations.

Open weights

F16, Q8_0, and Q4_K_M GGUF artifacts. Download once, run anywhere llama.cpp runs. No license gates.

One-line install

Up and running in under a minute.

Paste it into PowerShell. The installer pulls WebView2, Ollama, the recommended model for your GPU, and the desktop app — then launches.

PS>  
✓ WebView2 runtime detected
✓ Ollama installed
✓ Pulled cerberus-4b-v2-abliterated:Q4_K_M (2.5 GB)
✓ Cerberus Desktop launched
Desktop app · Windows

Run Cerberus on your own machine.

A native Tauri + Rust dashboard. Verifies your API key once, then streams chat through api.cerberusai.dev. Hardware detection, streaming responses, ~12 MB installer — no Electron bloat.

  1. 1
    Open PowerShell
    Press Win + X, then choose Terminal or PowerShell.
  2. 2
    Run the one-liner
    irm https://cerberusai.dev/get | iex Detects & auto-installs WebView2, Ollama, and the Cerberus app.
  3. 3
    Paste your API key
    The app launches into a key gate. Generate or copy a key from your dashboard and you're in.
v0.4.1 NSIS · 3.2 MB MSI · 4.2 MB
Cerberus Desktop App
Workflow

From curiosity to usage in three steps.

Product signal first, plan choice second, technical confidence third — the next click always visible.

1
Create access

Spin up an account, get authenticated, and move straight into the access portal without a separate sales step.

2
Pick a plan

Choose the plan that fits your usage level, from quick testing to heavier prompt and API workloads.

3
Start building

Use the chat surface or wire your own client against the OpenAI-compatible endpoint and keep everything inside the Cerberus stack.

Models

Pick your weight class.

Three uncensored model families in GGUF. Hosted on llm.cerberusai.dev.

Cerberus 4B v2 Abliterated

Complete refusal ablation. Total cognitive freedom. Built on Qwen 2.5 4B.

F16 · full ~7.5 GB

Cerberus 4B v2

f16 · full precision

Reference weights. Use when you want maximum fidelity and you have ≥ 16 GB VRAM to spare.

Download F16
Q8_0 · recommended ~4.0 GB

Cerberus 4B v2

Q8_0 · 8-bit quantized

Best quality-to-size ratio. Indistinguishable from F16 in most generations. Fits on an 8 GB GPU.

Download Q8_0
Q4_K_M · compact ~2.5 GB

Cerberus 4B v2

Q4_K_M · 4-bit quantized

For laptops, low-VRAM builds, and anything tight on disk. Default pick when GPU detection finds < 8 GB.

Download Q4_K_M

Arbiter GL9b

9B GLM-4 base. Unfiltered and highly intelligent — when you need more reasoning headroom than a 4B can give.

Q4_K_M · recommended ~5.8 GB

Arbiter GL9b

Q4_K_M · 4-bit quantized

Best quality-to-size ratio for the 9B class. Fits on a 6–8 GB GPU with room for context.

Download Q4_K_M
Q3_K_M · compact ~4.7 GB

Arbiter GL9b

Q3_K_M · 3-bit quantized

Sweet spot between Q2 and Q4 — strong quality at low memory. The 9B you can run on a 4 GB card.

Download Q3_K_M

Gamma3 1B BDPO Abliterated

1B parameter BDPO-tuned model. Refusal-ablated and lightweight enough for CPU-only inference, edge devices, and mobile.

Free · Q2_K ~666 MB

Gamma3 1B BDPO

Q2_K · 2-bit quantized

Ultra-compressed. Runs on the absolute lowest-end hardware. Free tier.

Download Q2_K
Free · IQ4_XS ~693 MB

Gamma3 1B BDPO

IQ4_XS · 4-bit IQuant

Experimental quantization with excellent efficiency. Great balance of size and quality.

Download IQ4_XS
Free · Q3_K_M ~697 MB

Gamma3 1B BDPO

Q3_K_M · 3-bit quantized

Great compression with solid quality. Best free-tier pick for this model family.

Download Q3_K_M

11 quants available (4 free, 7 premium). Browse all on llm.cerberusai.dev →

Managed API

Skip the GPU. Just call the endpoint.

OpenAI-compatible. Streaming. Pay-as-you-go credits. Self-hosted control plane on access.cerberusai.dev — your keys, your usage, no hidden middlemen.

$ curl https://api.cerberusai.dev/v1/chat/completions \
-H "Authorization: Bearer $CRB_KEY" \
-d '{"model":"cerberus-4b-v2","messages":[...]}'
Pricing

Pay for usage. Skip the local rig.

Three monthly tiers. Each renews credits and unlocks Desktop App access. Mid + EXP also get premium model downloads (Q8/F16/Q4_K_M-9B).

50,000 free monthly credits · Stripe and PayPal supported · Cancel anytime · 1 USD = 25,000 credits

Fast Start
Lite
$8/mo
300,000 credits / month
  • ✓ Solo use
  • ✓ API key access
  • ✓ Desktop App Access
  • ✓ 6x the free monthly allowance
Choose Lite →
Most Balanced
Most Balanced
Mid
$15/mo
900,000 credits / month
☠Premium models included
  • ✓ Daily workflows
  • ✓ More session headroom
  • ✓ 18x the free monthly allowance
  • ✓ Best for builders
  • ✓ Desktop App Access
  • ✓ Premium model downloads
Choose Mid →
Heavy Usage
EXP
$22/mo
2,000,000 credits / month
☠Premium models included
  • ✓ 40x the free monthly allowance
  • ✓ Ideal for larger prompts
  • ✓ Best for repeated API calls
  • ✓ Priority-ready posture
  • ✓ Desktop App Access
  • ✓ Premium model downloads
Choose EXP →

Join the pack.

Builders, researchers, and people who think "as a language model" is a refusal. Trade prompts, models, and benchmarks.

Join Discord
We accept
AMEX

Secure subscriptions and one-time top-ups via Stripe and PayPal. Card processing, refunds, and storage handled by the payment provider.