DeepSeek-R1
DeepSeek-R1 was the first open-weight model family to convincingly demonstrate that chain-of-thought reasoning can be elicited through pure reinforcement learning: its precursor, R1-Zero, learned to reason from RL alone, with no supervised reasoning traces. R1 itself starts from the DeepSeek-V3 base, adds a small supervised "cold start" stage, then runs a multi-stage RL pipeline that rewards correct final answers regardless of how the model gets there.
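The outcome-based reward can be sketched as a rule-based answer check. This is a minimal illustration of the idea, not DeepSeek's actual reward implementation; the `outcome_reward` name and the normalization are our own simplification:

```python
def _normalize(s: str) -> str:
    """Crude normalization so trivially different answer strings compare equal."""
    return s.strip().lower().replace(" ", "")

def outcome_reward(model_answer: str, reference: str) -> float:
    """Outcome-only reward: 1.0 if the final answer matches the reference,
    0.0 otherwise. Note it never inspects the reasoning that produced the
    answer -- that is the sense in which the reward is 'regardless of how
    the model gets there'."""
    return 1.0 if _normalize(model_answer) == _normalize(reference) else 0.0

print(outcome_reward("  42 ", "42"))  # -> 1.0
print(outcome_reward("41", "42"))     # -> 0.0
```

Because the signal is verifiable (exact-match on math or test-passing on code), it scales without a learned reward model, which is what makes the pure-RL recipe tractable.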
How it differs from V3
Where DeepSeek-V3 responds directly, R1 emits a visible <think>…</think> reasoning block before its final answer. Token counts per response can balloon — a single hard math problem may produce 5,000 tokens of internal scratchpad — but the answers are substantially more reliable on problems that require planning.
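Applications usually separate the scratchpad from the final answer before displaying or logging it. A minimal sketch, assuming a single well-formed `<think>…</think>` block as described above (the `split_reasoning` helper is ours, not part of any official SDK):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split an R1-style response into (reasoning, final_answer).

    Assumes at most one <think>...</think> block at the start of the
    response; if none is found, the whole text is treated as the answer.
    """
    m = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if not m:
        return "", response.strip()
    reasoning = m.group(1).strip()
    answer = response[m.end():].strip()
    return reasoning, answer

demo = "<think>2 + 2: add the units digits.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(demo)
print(answer)  # -> The answer is 4.
```

Keeping the split explicit also makes it easy to meter the scratchpad separately, since those 5,000-token traces dominate billing and latency on hard problems.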
What it's good at
Mathematics (AIME, MATH benchmarks), competitive programming, and any task where the model benefits from working things out before committing. It's also a useful teaching tool because the reasoning is fully visible.
The distilled variants
DeepSeek released six smaller distilled checkpoints (Qwen-1.5B/7B/14B/32B, Llama-8B/70B) fine-tuned on R1's reasoning traces. These run on consumer hardware and inherit much of the reasoning behavior — they're often the more practical entry point.
License
MIT. Distilled variants inherit their respective base licenses (Qwen, Llama).