DeepSeek-R1
DeepSeek-R1 was the first open-weight model family to convincingly demonstrate that chain-of-thought reasoning can be elicited through pure reinforcement learning: its precursor, R1-Zero, learned to reason from RL alone, with no supervised reasoning traces. R1 itself starts from the DeepSeek-V3 base, adds a small supervised "cold start" stage, then runs a multi-stage RL pipeline that rewards correct final answers regardless of how the model gets there.
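The outcome-based reward can be sketched as a rule-based answer check. This is a minimal illustration of the idea, not DeepSeek's actual reward implementation; the `outcome_reward` name and the normalization are our own simplification:

```python
def _normalize(s: str) -> str:
    """Crude normalization so trivially different answer strings compare equal."""
    return s.strip().lower().replace(" ", "")

def outcome_reward(model_answer: str, reference: str) -> float:
    """Outcome-only reward: 1.0 if the final answer matches the reference,
    0.0 otherwise. Note it never inspects the reasoning that produced the
    answer -- that is the sense in which the reward is 'regardless of how
    the model gets there'."""
    return 1.0 if _normalize(model_answer) == _normalize(reference) else 0.0

print(outcome_reward("  42 ", "42"))  # -> 1.0
print(outcome_reward("41", "42"))     # -> 0.0
```

Because the signal is verifiable (exact-match on math or test-passing on code), it scales without a learned reward model, which is what makes the pure-RL recipe tractable.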
How it differs from V3
Where DeepSeek-V3 responds directly, R1 emits a visible <think>…</think> reasoning block before its final answer. Token counts per response can balloon — a single hard math problem may produce 5,000 tokens of internal scratchpad — but the answers are substantially more reliable on problems that require planning.
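Applications usually separate the scratchpad from the final answer before displaying or logging it. A minimal sketch, assuming a single well-formed `<think>…</think>` block as described above (the `split_reasoning` helper is ours, not part of any official SDK):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split an R1-style response into (reasoning, final_answer).

    Assumes at most one <think>...</think> block at the start of the
    response; if none is found, the whole text is treated as the answer.
    """
    m = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if not m:
        return "", response.strip()
    reasoning = m.group(1).strip()
    answer = response[m.end():].strip()
    return reasoning, answer

demo = "<think>2 + 2: add the units digits.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(demo)
print(answer)  # -> The answer is 4.
```

Keeping the split explicit also makes it easy to meter the scratchpad separately, since those 5,000-token traces dominate billing and latency on hard problems.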
What it's good at
Mathematics (AIME, MATH benchmarks), competitive programming, and any task where the model benefits from working things out before committing. It's also a useful teaching tool because the reasoning is fully visible.
The distilled variants
DeepSeek released six smaller distilled checkpoints (Qwen-1.5B/7B/14B/32B, Llama-8B/70B) fine-tuned on R1's reasoning traces. These run on consumer hardware and inherit much of the reasoning behavior — they're often the more practical entry point.
License
MIT. Distilled variants inherit their respective base licenses (Qwen, Llama).