
AI | Monetary Policy & Inflation
AI Reflections: Which LLM Performs Best? Our Benchmark Says Fine-Tuning Wins
Dalvir Mandara, Eric Wang, Bilal Hafeez
Not a day goes by without a new LLM hitting the market, each claiming to outperform the rest on some benchmark leaderboard. But these self-reported scores deserve scepticism: much like marking your own homework, such evaluations can be misleading. For starters, many benchmark datasets – or even their answers – might appear in […]