Coartintate: LLM Performance Tiers in 2024

X user @virattt evaluated the performance and cost of various language models on a financial metrics calculation task. The models fall into four tiers:

Groq Tier: Extremely fast, very low-priced open-source models served by Groq Inc., such as Llama 3 (8B) and Llama 3 (70B).
Throughput Tier: Competitively priced models suited to quick, non-critical tasks, such as Haiku, Command R, Command Light, Gemini 1.0 Pro, and GPT-3.5 Turbo.
Workhorse Tier: Mid-priced models that handle complexity better and suit most tasks, including DBRX Instruct, Mistral Large, Command R+, and Sonnet.
Intelligence Tier: Premium, higher-priced models that handle complex, critical tasks best, such as Gemini 1.5 Pro, Opus, and GPT-4 Turbo.
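
For readers who want to use the tiers programmatically, the grouping can be written down as a small lookup table. This is only a sketch: the tier names and model lists come from the list above, while the dictionary layout and the tier_of helper are hypothetical names introduced here for illustration and are not part of @virattt's work.

```python
from typing import Optional

# Tier names and model lists are taken from the tier list above.
# The dictionary layout and the helper below are illustrative assumptions.
TIERS = {
    "Groq": ["Llama 3 (8B)", "Llama 3 (70B)"],
    "Throughput": [
        "Haiku",
        "Command R",
        "Command Light",
        "Gemini 1.0 Pro",
        "GPT-3.5 Turbo",
    ],
    "Workhorse": ["DBRX Instruct", "Mistral Large", "Command R+", "Sonnet"],
    "Intelligence": ["Gemini 1.5 Pro", "Opus", "GPT-4 Turbo"],
}


def tier_of(model: str) -> Optional[str]:
    """Return the tier a model is listed under, or None if it is not listed."""
    for tier, models in TIERS.items():
        if model in models:
            return tier
    return None


if __name__ == "__main__":
    print(tier_of("Sonnet"))
```

A table like this could back a simple routing rule, for example sending quick, non-critical requests to a Throughput model and reserving Intelligence models for critical tasks.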
