Mistral AI · Released April 17, 2024

Mixtral 8x22B Instruct v0.1

Mixtral 8x22B is Mistral AI's larger Sparse Mixture-of-Experts model: eight experts of roughly 22B parameters each, with two routed per token. Because the experts share the attention layers, the total to load is 141B rather than 8 × 22B, while only ~39B parameters are active per token: a useful ratio for serving cost.
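The top-2 routing described above can be sketched in a few lines. This is an illustrative toy, not Mixtral's actual implementation: the experts here are single linear layers, and the router is a plain learned projection scored per token.

```python
# Minimal sketch of top-2 sparse MoE routing (illustrative only): a router
# scores 8 experts per token, the 2 best are run, and their outputs are
# combined with softmax-renormalized weights. All sizes are toy values.
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, DIM = 8, 2, 16

# Toy "experts": each is a single linear layer here.
experts = [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS)) / np.sqrt(DIM)

def moe_layer(x):
    """x: (tokens, DIM) -> (tokens, DIM), running only TOP_K experts per token."""
    logits = x @ router                            # (tokens, 8) router scores
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]  # indices of the 2 best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        w = np.exp(sel - sel.max()); w /= w.sum()  # softmax over the chosen 2
        for weight, e in zip(w, top[t]):
            out[t] += weight * (x[t] @ experts[e])
    return out

y = moe_layer(rng.standard_normal((4, DIM)))
print(y.shape)  # (4, 16)
```

Note that compute per token scales with TOP_K experts, while memory must hold all NUM_EXPERTS: exactly the 39B-active / 141B-total split the model card quotes.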

Why MoE

At a given inference compute budget, a sparse model can match a much larger dense model on knowledge-recall tasks, at the price of needing more VRAM to hold all of the experts in memory. Mixtral popularized this tradeoff for the open-weight community.

What it's good at

Strong multilingual performance (especially French, Italian, German, and Spanish), good code generation, and native function calling via the v3 tokenizer's dedicated tool-call tokens. It predates the reasoning-RL trend, so chain-of-thought is something you have to prompt for rather than something the model produces unprompted.
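In practice, most users exercise function calling through an OpenAI-compatible provider endpoint rather than by emitting tokenizer control tokens directly. A hedged sketch of the request body follows; the model ID is illustrative (providers use their own naming), and `get_weather` is a hypothetical tool defined only for this example.

```python
# Sketch of a function-calling request in the OpenAI-style "tools" schema
# that providers such as Together and Fireworks accept. The model ID is
# illustrative; check your provider's catalog. No network call is made here.
import json

request_body = {
    "model": "mistralai/Mixtral-8x22B-Instruct-v0.1",  # provider-specific ID
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

payload = json.dumps(request_body)  # body you would POST to /chat/completions
```

When the model decides to call the tool, the response carries a structured tool-call object instead of plain text; your code executes the function and sends the result back in a follow-up message.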

Running it locally

Full weights at FP16 are ~280 GB. A Q4_K_M GGUF is around 86 GB and fits on a quad-RTX-3090 or dual-A6000 setup (96 GB VRAM total), with little headroom left for context. Most users either rent a GPU or use one of the open-weight inference providers (Together, Fireworks, DeepInfra).
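The figures above fall out of simple arithmetic. The only assumption is the average bits-per-weight for Q4_K_M, which mixes 4-bit and 6-bit blocks; ~4.85 bits/weight is a rough average, not an exact spec.

```python
# Back-of-envelope weight-memory estimates for a 141B-parameter model.
# Assumption: ~4.85 bits/weight as an approximate Q4_K_M average.
TOTAL_PARAMS = 141e9

fp16_gb = TOTAL_PARAMS * 2 / 1e9          # 2 bytes per weight at FP16
q4_km_gb = TOTAL_PARAMS * 4.85 / 8 / 1e9  # mixed 4/6-bit quantization

print(f"FP16:   ~{fp16_gb:.0f} GB")   # ~282 GB
print(f"Q4_K_M: ~{q4_km_gb:.0f} GB")  # ~85 GB
```

Actual usage is higher than the weight footprint alone: the KV cache and activation buffers grow with context length, which is why a 96 GB rig is tight for an 86 GB quant.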

License

Apache 2.0 — fully permissive. This is one of the largest models in the catalog with no service-scale or competitive-use restrictions.