An optimized version of Llama 3.1 8B Instruct with a 128K context window, tuned for high-throughput, low-latency inference in multilingual chat and dialogue use cases.
Input: Text
Output: Text
Providers
deepinfra
  Context: 128K
  Max Output: 128K
  Input: $0.020 / 1M tokens
  Output: $0.030 / 1M tokens
  Cache Read: —
  Cache Write: —
nebius
  Context: 128K
  Max Output: 128K
  Input: $0.030 / 1M tokens
  Output: $0.090 / 1M tokens
  Cache Read: —
  Cache Write: —
Quick Start
Use Meta Llama 3.1 8B Instruct Turbo through Helicone's AI Gateway with automatic logging and monitoring.
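As a minimal sketch of calling this model through the gateway: the example below assumes Helicone's AI Gateway exposes an OpenAI-compatible `/v1/chat/completions` endpoint and authenticates with a Helicone API key. The gateway URL, model identifier, and `HELICONE_API_KEY` environment variable name are illustrative assumptions; confirm the exact values in Helicone's documentation.

```python
import json
import os
import urllib.request

# Assumed values -- verify against Helicone's docs before use.
GATEWAY_URL = "https://ai-gateway.helicone.ai/v1/chat/completions"
MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"


def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload for the gateway."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }


if __name__ == "__main__":
    payload = build_request("Say hello in three languages.")
    req = urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Hypothetical env var name; the gateway expects your Helicone key.
            "Authorization": f"Bearer {os.environ['HELICONE_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
```

Because the gateway speaks the OpenAI chat format, the same payload works with the official OpenAI SDK by pointing its `base_url` at the gateway; requests are then logged and monitored in Helicone automatically.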