Cleonlab

On-device AI coding.
No cloud. No limits.

A complete coding agent that executes entirely on your machine. No API calls. No usage caps.

Your hardware, your rules · No tokens, no limits · Specialized SLM · Unbounded context · Zero telemetry · Native inference · 100% offline
The Problem

You don't own your AI.
And you're being watched.

Monitoring active

Data Extraction

001

They train on your code.

Every prompt. Every file. Every fix. It flows through infrastructure you don't control, improving systems they want to use to replace you.

Artificial Scarcity

002

They meter your ambition.

Slowdowns, overages, caps. Right when you're deep in a sprint, the meter decides you've had enough.

Silent Downgrades

003

They change the model.

They silently downgrade to cheaper models during peak load. Full price, degraded experience.

Cloud Dependency

004

They control your flow.

Every completion makes a round trip across the internet. Thousands of tiny interruptions, every single day.

Introducing RIG

Everything local.
Own your AI.

A complete AI coding agent running entirely on your own hardware. No usage limits. No cloud dependency.

[Diagram: cloud vs. your machine. Your code and keystrokes feed RIG's local inference (GPU · Index · Model) and get a response in <300ms, on device. The telemetry link to cloud servers is severed: nothing leaves your machine.]
Offline

Work offline

Flights. Spotty Wi-Fi. Network outages. Nothing stops your flow.

Unlimited

Remove the meter

Refactor the whole codebase. Riff on an idea all day. Run agent loops without thinking about cost.

Privacy

Sever the connection

Your code, keystrokes, and files never leave your machine. Not anonymized. Not aggregated. Not sent.

Latency

Stop waiting

No round-trip to a data center. Inference happens on your machine, in single-digit milliseconds.

Our Approach

Purpose beats scale.

Rig is a closed system: model, context, tools, and inference, engineered together for one job: real coding work.

Step 01

A focused model, trained specifically for coding.

Step 02

Full intelligence, compressed to fit your machine.

The model is compressed to run efficiently on consumer machines, carefully preserving the reasoning patterns that matter most.

The result is an 8 GB model that fits comfortably in memory on a MacBook. Full reasoning. Local execution. Zero cost per token.

Step 03

A custom runtime, engineered for Apple Silicon.

Model Size

Memory required:

Cloud models: 200+ GB
Open source: 28–140 GB
Rig: 8 GB

Fits in 16 GB unified memory.

Accuracy loss: <0.3%
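The arithmetic behind these numbers can be sketched. RIG's parameter count isn't published, so the 14B figure below is purely an assumption for illustration; the point is how 4-bit quantization brings a model under the 8 GB mark.

```python
# Back-of-envelope model memory math. The 14B parameter count is an
# assumption for illustration, not a published RIG spec.
GIB = 2**30

def model_memory_gib(params: float, bits_per_weight: float) -> float:
    """Approximate weight memory only, ignoring activations and KV cache."""
    return params * bits_per_weight / 8 / GIB

params = 14e9                        # hypothetical parameter count
fp16 = model_memory_gib(params, 16)  # full-precision baseline
int4 = model_memory_gib(params, 4)   # 4-bit quantized

print(f"fp16: {fp16:.1f} GiB")  # fp16: 26.1 GiB
print(f"int4: {int4:.1f} GiB")  # int4: 6.5 GiB
```

Under those assumptions, the 4-bit weights leave roughly half of a 16 GB unified-memory machine free for the OS, editor, and KV cache.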

Capabilities

Your machine, unleashed.

[ 01 ]

Understands your architecture.

Builds a connected model of modules, dependencies, and relationships so reasoning happens across files and aligns with your architecture.
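The idea of a connected model of modules can be sketched in a few lines. This is a minimal illustration using Python's `ast` module, not RIG's actual indexer: map each module to the local modules it imports, then answer "what depends on X?" by walking reverse edges.

```python
# Minimal sketch of a repo-wide import graph (illustrative, not RIG's
# indexer): module name -> set of local modules it imports.
import ast

def import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """sources maps module name -> source text."""
    graph: dict[str, set[str]] = {name: set() for name in sources}
    for name, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.Import):
                deps = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps = [node.module]
            else:
                continue
            # Keep only edges to modules inside this repo.
            graph[name].update(d for d in deps if d in sources)
    return graph

def dependents(graph: dict[str, set[str]], target: str) -> set[str]:
    """Modules that directly import `target` (reverse edges)."""
    return {m for m, deps in graph.items() if target in deps}

repo = {
    "db": "import os",
    "models": "import db",
    "api": "import models\nimport db",
}
g = import_graph(repo)
print(dependents(g, "db"))  # {'models', 'api'} (set order may vary)
```

A real index would also track symbols, types, and call sites, but the reverse-edge query is the core of cross-file reasoning: an edit to `db` flags `models` and `api` for review.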

[ 02 ]

Tracks relationships, prevents breakage.

Edits that respect function contracts, type boundaries, and dependency graphs, reducing bugs and regressions.

[ 03 ]

Strategizes before acting.

Explore → Plan → Execute workflows reason through every step before any change lands.
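The three phases can be sketched as a minimal loop. The function names and matching logic here are illustrative stand-ins, not RIG's API; the point is the phase separation: a read-only exploration step, a committed plan, and edits that happen only in the final phase.

```python
# Sketch of an Explore -> Plan -> Execute loop (illustrative, not RIG's API).

def explore(task: str, repo: dict[str, str]) -> list[str]:
    """Read-only phase: collect files relevant to the task."""
    return [path for path, text in repo.items() if task.lower() in text.lower()]

def plan(task: str, context: list[str]) -> list[str]:
    """Produce an ordered list of intended edits before touching anything."""
    return [f"update {path} for: {task}" for path in context]

def execute(steps: list[str], apply_edit) -> list[str]:
    """Apply the plan step by step; edits only happen here."""
    return [apply_edit(step) for step in steps]

repo = {"auth.py": "def login(): ...", "billing.py": "def charge(): ..."}
steps = plan("login", explore("login", repo))
done = execute(steps, lambda s: f"done: {s}")
print(done)  # ['done: update auth.py for: login']
```

Keeping exploration side-effect-free means a bad plan costs nothing: the agent can discard it and re-explore without having touched the tree.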

[ 04 ]

Executes complex coding workflows.

From refactors to test generation to feature builds, RIG coordinates tools, code edits, web search, and shell commands as needed.

[ 05 ]

Isolates agent sandboxes.

Each agent runs in its own workspace so experiments are safe, parallel workflows don't clash, and code changes stay isolated until you merge them.
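The isolation model can be sketched with a copy-on-start workspace. RIG's actual sandboxing mechanism isn't documented here; this illustration just shows the contract: each agent edits a private copy of the repo, and changes reach the real tree only on an explicit merge.

```python
# Sketch of per-agent workspace isolation (illustrative; not RIG's
# implementation): agents write to a private copy, merge is explicit.
import shutil
import tempfile
from pathlib import Path

class Sandbox:
    def __init__(self, repo: Path):
        self.repo = repo
        self.workdir = Path(tempfile.mkdtemp(prefix="agent-")) / repo.name
        shutil.copytree(repo, self.workdir)    # agent sees an isolated copy

    def write(self, rel: str, text: str) -> None:
        (self.workdir / rel).write_text(text)  # never touches the real repo

    def merge(self) -> None:
        # Only here do changes land in the real tree.
        shutil.copytree(self.workdir, self.repo, dirs_exist_ok=True)

repo = Path(tempfile.mkdtemp()) / "proj"
repo.mkdir()
(repo / "main.py").write_text("print('v1')")

sb = Sandbox(repo)
sb.write("main.py", "print('v2')")
assert (repo / "main.py").read_text() == "print('v1')"  # repo untouched
sb.merge()
assert (repo / "main.py").read_text() == "print('v2')"  # merged on approval
```

Because every agent holds its own copy, parallel agents cannot clobber each other's edits, and abandoning an experiment is just deleting its workspace.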

[ 06 ]

Runs at full speed.

Custom Rust inference engine optimized for CUDA and Metal, delivering up to 144 tokens per second on consumer hardware.
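The throughput figure maps directly onto the latency claims elsewhere on the page: at a steady 144 tokens per second, each token arrives in single-digit milliseconds.

```python
# Converting the quoted throughput into per-token latency.
tok_per_sec = 144
ms_per_token = 1000 / tok_per_sec
print(f"{ms_per_token:.2f} ms/token")                 # 6.94 ms/token
print(f"500-token answer: {500 / tok_per_sec:.1f} s")  # 500-token answer: 3.5 s
```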

Latency

0ms

No round-trip required

Privacy

100%

Air-gapped by design

Cost / Token

$0

Your GPU, your tokens

Uptime

Local

No dependency on cloud

Engineered Intelligence

Built for control freaks

rig://localhost · offline

λ rig init

RIG

> Scanning hardware...

> Found M4 · 16GB RAM

> Loading RIG Model OK

> Indexing 2,418 files · 87,102 symbols

Ready. Network: OFF · Telemetry: OFF

λ find the memory

Neural Engine · RG-800 · Local Ops

Custom Model

Optimized for consumer hardware

Inference

Cross-platform engine, written in Rust

Context Graph

Repo-wide code understanding

Terminal UI

Built in Rust and blazing fast

Heavily Tuned

Consistent tool calls and plan use

Opinionated

Focused on code correctness

Early Access

Rig is almost ready.

We're inviting engineers to run it on real code and help shape what ships.