Large Language Models (LLMs) are often evaluated on data similar to what they saw during training. Sometimes the test data even ends up in the training set without the knowledge of the people training the model (a problem known as data contamination). This can lead to inflated or biased results.
The Elo score, borrowed from chess rankings, compares two models head to head. This approach is fairer because it doesn't rely on a fixed test set that the models may have already seen. Models are ranked by direct pairwise comparisons, typically judged by humans, which gives a clearer picture of how they perform against each other than raw benchmark scores do.
In areas like programming, Elo scores can also help us understand which models are more effective.
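For intuition, here is a minimal sketch of the standard Elo update rule applied to a single model-vs-model comparison. The function names, starting ratings, and K-factor are illustrative assumptions, not the exact parameters used by any particular leaderboard.

```python
# Minimal Elo update sketch (illustrative; the K-factor of 32 and the starting
# rating of 1000 are assumptions, not values from any specific leaderboard).

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float,
               score_a: float, k: float = 32) -> tuple[float, float]:
    """Update both ratings after one head-to-head comparison.

    score_a is 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Example: two models start at 1000; model A wins one comparison.
a, b = update_elo(1000, 1000, score_a=1.0)
print(round(a), round(b))  # 1016 984
```

Repeated over many crowd-sourced matchups, these small updates converge to a relative ranking without needing a static benchmark dataset.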
Links to leaderboards
- LMSys Chatbot Arena Leaderboard - note that you can choose different categories (Overall, Coding, English, etc.)
- EvalPlus Leaderboard - evaluates LLMs on code generation
- TheFastest.ai - latency benchmarks
- Berkeley Function Calling Leaderboard (aka Berkeley Tool Calling Leaderboard)