Research

Intelligence density, not parameter count.

Frontier reasoning doesn't require trillion-parameter models. It requires extracting more reasoning from every active parameter, then compressing the result to run anywhere.

Industry validation

NVIDIA Nemotron Cascade 2

A 30B model that activates just 3B parameters per token, yet beats NVIDIA's own 120B model on code and math while running on a single RTX 4090.

Its post-training recipe, Cascade RL plus on-policy distillation, comes from the same family of techniques that powers our pipeline. As frontier labs publish these techniques, we adopt them and compress further, targeting hardware where the cloud isn't an option.

AIME 2025 (math reasoning): 92.4
LiveCodeBench v6 (code generation): 87.2
IMO 2025 (competition math): Gold, 35 points
Activation ratio: 3B / 30B, 10% of parameters active per token
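The page doesn't describe the model's architecture. One common way to keep only a fraction of parameters active per token is mixture-of-experts routing, so the sketch below assumes that design. It shows top-k routing and the resulting active fraction, using a made-up budget (2B shared parameters, 112 experts of 0.25B each, top-4 routing) chosen only to reproduce the 3B / 30B ratio; it is not Nemotron's actual configuration.

```python
import math
from typing import List, Tuple

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and renormalize their gate weights.

    Only the selected experts run for this token; the others stay idle.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]

def active_fraction(shared: float, per_expert: float, num_experts: int, k: int) -> float:
    """Fraction of all parameters that a single token touches."""
    total = shared + num_experts * per_expert
    active = shared + k * per_expert
    return active / total

# Hypothetical whole-model budget in billions, picked only to hit 3B / 30B.
print(active_fraction(shared=2.0, per_expert=0.25, num_experts=112, k=4))  # 0.1
print(route_top_k([2.0, 0.5, 1.0, -1.0], k=2))  # selects experts 0 and 2
```

Total capacity grows with the number of experts, while per-token compute grows only with k, which is why a sparse model can hold far more parameters than it uses on any one token.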

Directions

How we compress without losing the reasoning.

Knowledge Distillation

Transfer behavior from 70B+ teacher models into sub-3B students via on-policy distillation.
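A minimal sketch of the on-policy idea, assuming a per-token reverse-KL objective (a common choice, not confirmed by this page): the student generates the sequence itself, and the teacher grades every prefix the student actually visits. The `Model` interface and toy distributions are hypothetical; a real trainer would backpropagate through the student's log-probabilities rather than just computing the value.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical toy interface: a model maps a token prefix to a
# next-token probability distribution over a small vocabulary.
Model = Callable[[Sequence[int]], List[float]]

def reverse_kl(p_student: List[float], p_teacher: List[float], eps: float = 1e-12) -> float:
    """KL(student || teacher) at a single position."""
    return sum(ps * (math.log(ps + eps) - math.log(pt + eps))
               for ps, pt in zip(p_student, p_teacher) if ps > 0)

def sample(probs: List[float], rng: random.Random) -> int:
    """Draw one token index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def on_policy_distill_loss(student: Model, teacher: Model, prompt: List[int],
                           steps: int, rng: random.Random) -> float:
    """Roll out the student, scoring each visited prefix against the teacher."""
    prefix = list(prompt)
    total = 0.0
    for _ in range(steps):
        p_s = student(prefix)
        p_t = teacher(prefix)
        total += reverse_kl(p_s, p_t)
        prefix.append(sample(p_s, rng))  # next token comes from the student (on-policy)
    return total / steps

uniform = lambda prefix: [0.25, 0.25, 0.25, 0.25]
peaked = lambda prefix: [0.7, 0.1, 0.1, 0.1]
print(on_policy_distill_loss(uniform, peaked, prompt=[0], steps=8, rng=random.Random(0)))
# ≈ 0.43 (constant per step because the toy models ignore the prefix)
```

Sampling from the student rather than the teacher means the student is corrected on the mistakes it actually makes, instead of only imitating teacher-written text.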

Quantization

INT8/INT4 quantization-aware training: a smaller memory footprint and faster inference on edge hardware, with accuracy preserved.
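A minimal sketch of the quantize-dequantize ("fake quant") step that a QAT forward pass simulates, so the network learns to tolerate INT8/INT4 rounding. It assumes symmetric per-tensor scaling; production QAT typically uses per-channel scales, calibrated or learned ranges, and framework fake-quant ops. Storage drops from 2 bytes per weight (FP16) to 1 byte (INT8) or half a byte (INT4).

```python
from typing import List

def fake_quantize(weights: List[float], bits: int) -> List[float]:
    """Symmetric per-tensor quantize -> dequantize.

    The forward pass sees these rounded values; during training the backward
    pass usually treats rounding as identity (straight-through estimator).
    """
    qmax = 2 ** (bits - 1) - 1          # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax              # one float scale for the whole tensor
    out = []
    for w in weights:
        q = round(w / scale)            # nearest integer level
        q = max(-qmax, min(qmax, q))    # clamp to the signed integer range
        out.append(q * scale)           # map back to float for the forward pass
    return out

w = [0.5, -1.0, 0.25, 0.8]
print(fake_quantize(w, bits=8))
print(fake_quantize(w, bits=4))
```

Each weight moves by at most half a quantization step, so INT4's coarser grid (15 levels here) produces visibly larger rounding error than INT8's 255.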

Weight Pruning

Remove non-critical weights to induce sparsity, which sparse kernels can exploit to accelerate matrix multiplications.
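Magnitude pruning is one common way to decide which weights are "non-critical"; the page doesn't name its criterion, so treat this as an assumption. Scattered zeros rarely speed up dense matrix multiplication by themselves, so the sketch also shows 2:4 structured sparsity (at most two nonzeros in every group of four), the pattern NVIDIA's sparse Tensor Cores accelerate.

```python
from typing import List

def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Unstructured: zero the `sparsity` fraction of weights with smallest |w|."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]

def prune_2_of_4(weights: List[float]) -> List[float]:
    """Structured: keep the 2 largest-magnitude weights in each group of 4."""
    if len(weights) % 4:
        raise ValueError("length must be a multiple of 4")
    out: List[float] = []
    for g in range(0, len(weights), 4):
        group = weights[g:g + 4]
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        out.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return out

w = [0.1, -2.0, 0.05, 1.5, 3.0, 0.2, -0.1, 0.4]
print(magnitude_prune(w, sparsity=0.5))
print(prune_2_of_4(w))
```

Both variants reach 50% sparsity on this input, but only the 2:4 layout guarantees the fixed per-group pattern that hardware can compress and skip.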

Reinforcement Learning

Reward modeling and policy optimization to teach spatial and persona reasoning.
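The page doesn't name its policy-optimization algorithm. As an assumption, here is a sketch of two widely used pieces: group-normalized advantages (GRPO-style), where several responses to the same prompt are scored by a reward model and compared against each other, and a PPO-style clipped surrogate that limits how far one update can move the policy. Rewards here are hypothetical scalars standing in for reward-model outputs.

```python
import math
from typing import List

def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Score each sampled response relative to its group's mean and std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

def clipped_surrogate(logp_new: float, logp_old: float, advantage: float,
                      clip: float = 0.2) -> float:
    """PPO-style clipped objective for one response (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)              # pi_new / pi_old
    clipped = max(1 - clip, min(1 + clip, ratio))      # keep ratio in [0.8, 1.2]
    return min(ratio * advantage, clipped * advantage)  # pessimistic bound

# Hypothetical reward-model scores for four responses to one prompt.
adv = group_advantages([1.0, 0.0, 1.0, 0.0])
print(adv)
print(clipped_surrogate(logp_new=math.log(2.0), logp_old=0.0, advantage=1.0))
```

Normalizing within the group removes the need for a separate value network: a response is pushed up only if it scored better than its siblings.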