Llama 3.3 70B Instruct
Llama 3.3 is Meta's iterative refresh of the Llama 3 family. It keeps the same 70-billion-parameter dense transformer architecture as Llama 3.1 70B but is trained with a refined post-training recipe that closes most of the gap to the original Llama 3.1 405B at roughly one-sixth the inference cost.
What it's good at
The model is strong on general instruction-following, multilingual tasks (eight officially supported languages), and tool-call formatting. It handles long-context retrieval reliably up to its 128K-token window, although, as with most models in this class, effective recall degrades beyond roughly 64K tokens.
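As a concrete illustration of tool-call formatting, here is a minimal sketch of a request payload, assuming the model is served behind an OpenAI-compatible chat-completions endpoint (which servers such as vLLM and Ollama expose); the get_weather function and its parameters are made up for the example:

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" schema.
# The get_weather function is illustrative only, not a real API.
payload = {
    "model": "llama3.3:70b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

# POST this as JSON to the server's /v1/chat/completions endpoint;
# the model should answer with a tool_calls entry naming get_weather
# and a JSON arguments string such as {"city": "Paris"}.
print(json.dumps(payload, indent=2))
```

The payload itself is server-agnostic; only the model name and endpoint path change between serving stacks.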
What it's not
This is not a reasoning model: there is no dedicated chain-of-thought training stage, and it will happily commit to wrong answers on multi-step math problems. For that workload, look at DeepSeek-R1 or a reasoning-focused fine-tune such as Llama-Nemotron.
Running it locally
Full-precision weights occupy 141 GB and won't fit on a single consumer card. The community 4-bit GGUF quants from Bartowski and Unsloth are around 40 GB and run on a pair of 24 GB cards or a single 48 GB workstation card at usable speed. Ollama users can pull llama3.3:70b.
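The sizes above follow directly from parameter count times bytes per weight. A quick back-of-the-envelope check (the 70.6B parameter count is approximate, and 4.5 bits per weight is a rough average for Q4_K_M-style GGUF quants, which store some tensors at higher precision):

```python
# Rough weight footprint of Llama 3.3 70B at different precisions.
# Excludes KV cache and activation overhead, which add several GB more.

PARAMS = 70.6e9  # approximate parameter count

def footprint_gb(bits_per_weight: float) -> float:
    """Weight storage in GB for a given average bits per weight."""
    return PARAMS * bits_per_weight / 8 / 1e9

print(f"fp16 : {footprint_gb(16):.0f} GB")   # ~141 GB, the full-precision figure
print(f"4-bit: {footprint_gb(4.5):.0f} GB")  # ~40 GB, a Q4_K_M-style quant
```

The same arithmetic explains why the 4-bit quant just fits across two 24 GB cards: ~40 GB of weights plus KV cache and runtime overhead lands near the 48 GB total.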
License notes
Llama 3.3 ships under Meta's Llama 3.3 Community License, which is permissive for most uses but requires companies whose products exceeded 700 million monthly active users at the time of the model's release to request a separate license from Meta. Commercial use is allowed for everyone else without a separate agreement.