Interesting feature of the Apple LLM reasoning paper.

Dan Goodman

Interesting feature of the Apple LLM reasoning paper. I always tell my students that exams include no irrelevant information, which gives you a clue as to the answer. LLMs have learnt this too well, and can't ignore irrelevant info (performance drops 17-70%).

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models (arXiv:2410.05229, arxiv.org)

In other words: exam-style questions have massive information leakage that many students don't pick up on but that LLMs find impossible to ignore. I suspect this tells us more about the way we write exam questions than anything else. They're not a good measure of LLM performance, nor of human performance!