Allen Institute for AI (AI2) · Released November 26, 2024

OLMo 2 13B Instruct

OLMo (Open Language Model) is AI2's effort to produce models that are genuinely open by every measure: weights, training data, training code, intermediate checkpoints, and evaluation harness are all published. OLMo 2 is the second generation, with 7B and 13B variants.
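
For everyday use, the instruct variant loads like any other Hugging Face causal LM. Below is a minimal generation sketch, assuming the published model ID allenai/OLMo-2-1124-13B-Instruct and a recent transformers release:

    # Minimal chat-generation sketch. Assumes the Hugging Face model ID
    # "allenai/OLMo-2-1124-13B-Instruct" and a recent transformers release.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "allenai/OLMo-2-1124-13B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # The instruct variant expects chat-formatted input; the tokenizer's
    # chat template inserts the appropriate special tokens.
    messages = [{"role": "user",
                 "content": "What does 'fully open' mean for a language model?"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=256)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:],
                           skip_special_tokens=True))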

Why "fully open" matters

Most "open-weight" models are open in name only — the weights are released but the training data, the data filtering scripts, and the post-training mixes are proprietary. OLMo publishes all of these. If you care about model provenance, reproducibility, or being able to audit what your model was trained on, OLMo is the only serious option at this scale.

What it's good at

OLMo 2 13B is competitive with Llama 3.1 8B on most benchmarks. The point is less to beat the closed-data competition than to demonstrate that fully open development can produce competitive models, and to provide a research artifact others can build on.

The Dolma and Dolmino training corpora

OLMo 2's pretraining mix builds on AI2's original Dolma corpus; a later mid-training stage uses Dolmino, a curated successor mix. Both are filterable, reproducible, and explicitly licensed. If you're doing data-attribution research, this is invaluable.
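
For corpus inspection, the data can be streamed rather than downloaded in full. A sketch, assuming the mid-training mix is published as a Hugging Face dataset under allenai/dolmino-mix-1124 and follows Dolma's document schema; adjust the ID, config, and field names to the actual corpus release:

    # Sketch: streaming a few documents from the corpus for inspection.
    # The dataset ID and field names ("id", "text") are assumptions based
    # on Dolma's document schema; adjust to the actual corpus release.
    from datasets import load_dataset

    ds = load_dataset("allenai/dolmino-mix-1124", split="train", streaming=True)
    for i, doc in enumerate(ds):
        print(doc.get("id"), str(doc.get("text", ""))[:80])  # id + first 80 chars
        if i >= 4:
            break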

License

Apache 2.0 for weights and code. Training data licensing is handled per-source and documented in the corpus release.