Mixtral 8x22B Instruct v0.1
Mixtral 8x22B is Mistral AI's larger Sparse Mixture-of-Experts model: each MoE layer holds eight experts, and the router sends every token through two of them. Roughly 39B parameters are active per token out of 141B total weights that must be loaded, so per-token compute is close to that of a 39B dense model while the memory footprint is that of a 141B one, which is what makes the ratio attractive for serving cost.
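For intuition, here is a minimal top-2 routing sketch in PyTorch. The module layout, expert MLP shape, and dimensions are illustrative only, not Mixtral's actual implementation.

```python
# Minimal sketch of a top-2 sparse MoE feed-forward layer, for intuition only.
# Expert structure and dimensions are illustrative, not Mixtral's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)   # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router picks 2 of 8 experts per token,
        # so only a fraction of the FFN weights do work for any given token.
        scores = self.gate(x)                                   # (tokens, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)   # top-2 experts per token
        weights = F.softmax(weights, dim=-1)                    # renormalise over the 2 chosen
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out
```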
Why MoE
For a given per-token compute budget, a sparse model can match a much larger dense model, especially on knowledge-recall tasks, at the price of needing enough VRAM to keep every expert resident even though only two fire per token. Mixtral popularized this tradeoff for the open-weight community.
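Some rule-of-thumb arithmetic makes the split concrete: compute tracks the active parameters, memory tracks the total. These are rough estimates, not measured figures, and the bits-per-weight value for Q4_K_M is an approximation.

```python
# Rule-of-thumb serving arithmetic for Mixtral 8x22B; rough estimates, not benchmarks.
active_params = 39e9    # parameters touched per token (2 of 8 experts + shared layers)
total_params  = 141e9   # parameters that must stay resident in memory

# Decode compute scales with *active* params: roughly 2 FLOPs per param per token.
flops_per_token = 2 * active_params           # ~7.8e10, comparable to a 39B dense model

# Memory scales with *total* params: bytes per weight depends on the format.
fp16_gib   = total_params * 2.0 / 2**30       # ~263 GiB (~282 GB) at 16-bit
q4_k_m_gib = total_params * 4.85 / 8 / 2**30  # ~80 GiB assuming ~4.85 bits/weight

print(f"{flops_per_token:.2e} FLOPs/token, {fp16_gib:.0f} GiB FP16, {q4_k_m_gib:.0f} GiB Q4_K_M")
```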
What it's good at
Strong multilingual performance (especially French, Italian, German, and Spanish), good code generation, and fluent function calling via the tool-call tokens added in the v3 tokenizer. It predates the reasoning-RL trend, so chain-of-thought is something you have to prompt for rather than something the model produces on its own.
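A sketch of function calling through an OpenAI-compatible endpoint. The base URL and model ID shown are assumptions for Together and will differ by provider; the weather tool is purely hypothetical.

```python
# Sketch of tool calling against an OpenAI-compatible endpoint.
# base_url and model ID are assumptions (Together-style); check your provider's docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # any OpenAI-compatible provider works
    api_key="...",                           # provider API key
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",               # hypothetical tool, for illustration only
        "description": "Current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="mistralai/Mixtral-8x22B-Instruct-v0.1",  # provider-specific model ID
    messages=[{"role": "user", "content": "What's the weather in Lyon?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)    # expect a get_weather call with {"city": "Lyon"}
```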
Running it locally
Full weights at FP16 are ~280 GB. A Q4_K_M GGUF is around 86 GB and fits, tightly, in roughly 96 GB of VRAM (e.g. four 3090s or two A6000s), leaving limited headroom for KV cache. Most users either rent GPUs or use one of the open-weight inference providers (Together, Fireworks, DeepInfra).
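For the GGUF route, a minimal llama-cpp-python sketch is below. The file name, tensor split ratios, and context size are placeholders to tune for your hardware, not tested settings.

```python
# Minimal local-inference sketch with llama-cpp-python; paths and splits are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="Mixtral-8x22B-Instruct-v0.1.Q4_K_M.gguf",  # ~86 GB of weights on disk
    n_gpu_layers=-1,             # offload every layer to GPU if it fits
    tensor_split=[1, 1, 1, 1],   # spread layers evenly across four cards (e.g. 4x3090)
    n_ctx=8192,                  # KV cache competes with the weights for remaining VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarise the Mixture-of-Experts idea in two sentences."}]
)
print(out["choices"][0]["message"]["content"])
```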
License
Apache 2.0 — fully permissive. This is one of the largest models in the catalog with no service-scale or competitive-use restrictions.