X user @virattt evaluated the performance and cost of various language models on a financial metrics calculation task.

Based on the results, @virattt proposed categorizing the language models into four tiers:

  1. Groq Tier: Extremely fast, excellently priced open-source models served by Groq Inc., such as Llama 3 (8B) and Llama 3 (70B).
  2. Throughput Tier: Competitively priced models, such as Haiku, Command R, Command Light, Gemini 1.0 Pro, and GPT-3.5 Turbo, suited to quick, non-critical tasks.
  3. Workhorse Tier: Mid-priced models that handle complexity better and are a good fit for most tasks, including DBRX Instruct, Mistral Large, Command R+, and Sonnet.
  4. Intelligence Tier: Premium, higher-priced models with the strongest performance on complex, critical tasks, such as Gemini 1.5 Pro, Opus, and GPT-4 Turbo.
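
The four tiers above can be written down as a simple lookup table, which may be handy when routing a task to a model class. The following is a minimal sketch, not anything @virattt published: the tier names and model names come from the list above, while the dictionary layout and the `tier_of` helper are hypothetical and exist only for illustration.

```python
# Tier and model names are taken from the list above.
# MODEL_TIERS and tier_of are illustrative assumptions, not part of the original research.
MODEL_TIERS = {
    "Groq": ["Llama 3 (8B)", "Llama 3 (70B)"],
    "Throughput": [
        "Haiku",
        "Command R",
        "Command Light",
        "Gemini 1.0 Pro",
        "GPT-3.5 Turbo",
    ],
    "Workhorse": ["DBRX Instruct", "Mistral Large", "Command R+", "Sonnet"],
    "Intelligence": ["Gemini 1.5 Pro", "Opus", "GPT-4 Turbo"],
}


def tier_of(model_name):
    """Return the tier a model is listed under, or None if it is not listed.

    Matching is exact apart from letter case and surrounding whitespace.
    """
    wanted = model_name.strip().lower()
    for tier, models in MODEL_TIERS.items():
        if any(wanted == model.lower() for model in models):
            return tier
    return None
```

For example, `tier_of("Command R+")` looks up the Workhorse entry, and a model that is not in the list yields `None` rather than a guess.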