deepseek: DeepSeek R1 Distill Llama 70B

deepseek-r1-distill-llama-70b
Context: 128k
Max Output: 4k
Input: $0.030/1M
Output: $0.130/1M
DeepSeek-R1-Distill-Llama-70B is a 70-billion-parameter model created by distilling the reasoning capabilities of DeepSeek's flagship R1 model (671B parameters) into Meta's Llama-3.3-70B-Instruct base. It achieves exceptional performance on mathematical reasoning and coding benchmarks (94.5% on MATH-500, a 1633 CodeForces rating), rivaling OpenAI's o1-mini while being fully open source under the MIT license. The model demonstrates that advanced reasoning patterns from larger models can be effectively transferred to smaller, more deployable architectures through knowledge distillation.
Input: Text
Output: Text

Providers

chutes
Context: 128k
Max Output: 4k
Input: $0.030/1M
Output: $0.130/1M

groq
Context: 128k
Max Output: 4k
Input: $0.750/1M
Output: $0.990/1M

deepinfra
Context: 128k
Max Output: 4k
Input: $0.600/1M
Output: $1.20/1M

openrouter
Context: 128k
Max Output: 4k
Input (Max): $2.11/1M
Output (Max): $2.11/1M
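As a worked example at the chutes rates, a request with 10,000 input tokens and 2,000 output tokens costs (10,000 / 1,000,000) × $0.030 + (2,000 / 1,000,000) × $0.130 = $0.0003 + $0.00026 = $0.00056.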

Quick Start

Use DeepSeek R1 Distill Llama 70B through Helicone's AI Gateway with automatic logging and monitoring.
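Below is a minimal sketch of one way to call the model, using the OpenAI Python SDK pointed at the gateway. The model ID comes from this page, but the base URL and the HELICONE_API_KEY environment variable are assumptions about a typical Helicone setup; confirm both against the gateway docs.

```python
import os

from openai import OpenAI

# Assumed gateway endpoint and auth scheme; verify against Helicone's docs.
client = OpenAI(
    base_url="https://ai-gateway.helicone.ai",  # assumed Helicone AI Gateway URL
    api_key=os.environ["HELICONE_API_KEY"],     # assumed env var holding your Helicone key
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",  # model ID listed on this page
    messages=[
        {"role": "user", "content": "What is the derivative of x^3 + 2x?"},
    ],
)

print(response.choices[0].message.content)
```

Requests routed this way are forwarded to one of the providers listed above and appear in your Helicone dashboard with the usual logging and cost tracking.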