Interesting feature of the Apple LLM reasoning paper.
Interesting feature of the Apple LLM reasoning paper. I always tell my students that exam questions contain no irrelevant information, so every detail given is a clue to the answer. LLMs have learnt this too well: they can't ignore irrelevant info (performance drops by 17-70%).
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models (arXiv:2410.05229, arxiv.org)
In other words, exam-style questions have massive leakage that many students don't pick up on, but that LLMs find impossible to ignore. I suspect this tells us more about the way we write exam questions than anything else. They're not a good measure of LLM performance, nor of human performance!