Cerberus AI — Local-first uncensored language models

Why Cerberus

Built for unrestricted intelligence.

Refusal layers stripped. Weights open. Latency local. Bring your own GPU or call our managed API.

Local-first

Inference runs on your machine through Ollama. No prompts leave your hardware. No telemetry. No cloud round-trip.

Refusal-ablated

Surgical removal of the refusal direction in activation space. Core reasoning preserved, "I can't help with that" deleted.

Hardware-aware

Auto-detects your VRAM and recommends a quantization that actually fits. From 4GB laptops to 24GB workstations.

Open weights

F16, Q8_0, and Q4_K_M GGUF artifacts. Download once, run anywhere llama.cpp runs. No license gates.

One-line install

Up and running in under a minute.

Paste it into PowerShell. The installer pulls WebView2, Ollama, the recommended model for your GPU, and the desktop app — then launches.

View on GitHub

PS>

✓ WebView2 runtime detected

✓ Ollama installed

✓ Pulled cerberus-4b-v2-abliterated:Q4_K_M (2.5 GB)

✓ Cerberus Desktop launched

Desktop app · Windows

Run Cerberus on your own machine.

A native Tauri + Rust dashboard. Verifies your API key once, then streams chat through api.cerberusai.dev. Hardware detection, streaming responses, ~12 MB installer — no Electron bloat.

1

Open PowerShell

Press Win + X, then choose Terminal or PowerShell.
2

Run the one-liner
irm https://cerberusai.dev/get | iex Detects & auto-installs WebView2, Ollama, and the Cerberus app.
3

Paste your API key

The app launches into a key gate. Generate or copy a key from your dashboard and you're in.

All releases

v0.4.1 NSIS � 3.2 MB MSI � 4.2 MB

Workflow

From curiosity to usage in three steps.

Product signal first, plan choice second, technical confidence third — the next click always visible.

1

Create access

Spin up an account, get authenticated, and move straight into the access portal without a separate sales step.

2

Pick a plan

Choose the plan that fits your usage level, from quick testing to heavier prompt and API workloads.

3

Start building

Use the chat surface or wire your own client against the OpenAI-compatible endpoint and keep everything inside the Cerberus stack.

Models

Pick your weight class.

Three uncensored model families in GGUF. Hosted on llm.cerberusai.dev.

Cerberus 4B v2 Abliterated

Complete refusal ablation. Total cognitive freedom. Built on Qwen 2.5 4B.

F16 · full ~7.5 GB

Cerberus 4B v2

f16 · full precision

Reference weights. Use when you want maximum fidelity and you have ≥ 16 GB VRAM to spare.

Download F16

Q8_0 · recommended ~4.0 GB

Cerberus 4B v2

Q8_0 · 8-bit quantized

Best quality-to-size ratio. Indistinguishable from F16 in most generations. Fits on an 8 GB GPU.

Download Q8_0

Q4_K_M · compact ~2.5 GB

Cerberus 4B v2

Q4_K_M · 4-bit quantized

For laptops, low-VRAM builds, and anything tight on disk. Default pick when GPU detection finds < 8 GB.

Download Q4_K_M

Arbiter GL9b

9B GLM-4 base. Unfiltered and highly intelligent — when you need more reasoning headroom than a 4B can give.

Q4_K_M · recommended ~5.8 GB

Arbiter GL9b

Q4_K_M · 4-bit quantized

Best quality-to-size ratio for the 9B class. Fits on a 6–8 GB GPU with room for context.

Download Q4_K_M

Q3_K_M · compact ~4.7 GB

Arbiter GL9b

Q3_K_M · 3-bit quantized

Sweet spot between Q2 and Q4 — strong quality at low memory. The 9B you can run on a 4 GB card.

Download Q3_K_M

Gamma3 1B BDPO Abliterated

1B parameter BDPO-tuned model. Refusal-ablated and lightweight enough for CPU-only inference, edge devices, and mobile.

Free · Q2_K ~666 MB

Gamma3 1B BDPO

Q2_K · 2-bit quantized

Ultra-compressed. Runs on the absolute lowest-end hardware. Free tier.

Download Q2_K

Free · IQ4_XS ~693 MB

Gamma3 1B BDPO

IQ4_XS · 4-bit IQuant

Experimental quantization with excellent efficiency. Great balance of size and quality.

Download IQ4_XS

Free · Q3_K_M ~697 MB

Gamma3 1B BDPO

Q3_K_M · 3-bit quantized

Great compression with solid quality. Best free-tier pick for this model family.

Download Q3_K_M

11 quants available (4 free, 7 premium). Browse all on llm.cerberusai.dev →

Browse the full model index on llm.cerberusai.dev

Managed API

Skip the GPU. Just call the endpoint.

OpenAI-compatible. Streaming. Pay-as-you-go credits. Self-hosted control plane on access.cerberusai.dev — your keys, your usage, no hidden middlemen.

$ curl https://api.cerberusai.dev/v1/chat/completions \

-H "Authorization: Bearer $CRB_KEY" \

-d '{"model":"cerberus-4b-v2","messages":[...]}'

Generate API Key View pricing

Pricing

Pay for usage. Skip the local rig.

Three monthly tiers. Each renews credits and unlocks Desktop App access. Mid + EXP also get premium model downloads (Q8/F16/Q4_K_M-9B).

50,000 free monthly credits · Stripe and PayPal supported · Cancel anytime · 1 USD = 25,000 credits

Fast Start

Lite

$8/mo

300,000 credits / month

✓ Solo use
✓ API key access
✓ Desktop App Access
✓ 6x the free monthly allowance

Choose Lite →

Most Balanced

Mid

$15/mo

900,000 credits / month

� Premium models included

✓ Daily workflows
✓ More session headroom
✓ 18x the free monthly allowance
✓ Best for builders
✓ Desktop App Access
✓ Premium model downloads

Choose Mid →

Heavy Usage

EXP

$22/mo

2,000,000 credits / month

� Premium models included

✓ 40x the free monthly allowance
✓ Ideal for larger prompts
✓ Best for repeated API calls
✓ Priority-ready posture
✓ Desktop App Access
✓ Premium model downloads

Choose EXP →

Just want premium model downloads? See the standalone Models Premium add-on →

Join the pack.

Builders, researchers, and people who think "as a language model" is a refusal. Trade prompts, models, and benchmarks.

Join Discord

Unfiltered intelligence. Yours to run.

Built for unrestricted intelligence.

Local-first

Refusal-ablated

Hardware-aware

Open weights

Up and running in under a minute.

Run Cerberus on your own machine.

From curiosity to usage in three steps.

Pick your weight class.

Cerberus 4B v2 Abliterated

Cerberus 4B v2

Cerberus 4B v2

Cerberus 4B v2

Arbiter GL9b

Arbiter GL9b

Arbiter GL9b

Gamma3 1B BDPO Abliterated

Gamma3 1B BDPO

Gamma3 1B BDPO

Gamma3 1B BDPO

Skip the GPU. Just call the endpoint.

Pay for usage. Skip the local rig.

Join the pack.

Unfiltered intelligence.
Yours to run.