But that's precisely the kind of rigor I don't see out of research into what LLMs are capable of. Why should we expect that adding "let's think step-by-step" to a prompt would materially change the results? Why is that the right thing to test, *and what do we learn about LLMs* by having done so?
No single paper needs to answer all of that; that would be an unreasonable bar. But without those answers, a strong claim about the underlying truth of how LLMs work is unjustified.
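To make that concrete, here's a rough sketch (purely illustrative; `query_model`, the task set, and the assumed effect sizes are placeholders, not from any actual paper) of the kind of controlled comparison I'd want to see before accepting a claim about prompting:

```python
# A minimal sketch of a controlled prompt ablation: same tasks, same model,
# repeated samples, with and without the magic phrase, and an honest look at
# the size of the difference rather than a single cherry-picked run.
import random
from statistics import mean

def query_model(prompt: str) -> bool:
    """Hypothetical stand-in: returns True if the model answered correctly.
    Replace with a real model call plus a real grading function."""
    base = 0.55 if "step-by-step" in prompt else 0.50  # assumed effect size
    return random.random() < base

def accuracy(tasks: list[str], suffix: str, n_samples: int = 20) -> float:
    """Mean accuracy over all tasks, sampling each task repeatedly."""
    results = [
        query_model(task + suffix)
        for task in tasks
        for _ in range(n_samples)
    ]
    return mean(results)

tasks = [f"Question {i}: ..." for i in range(50)]  # held-out evaluation set
plain = accuracy(tasks, suffix="")
cot = accuracy(tasks, suffix="\nLet's think step-by-step.")
print(f"plain: {plain:.3f}  step-by-step: {cot:.3f}  delta: {cot - plain:+.3f}")
# A real analysis would also report uncertainty on the delta (e.g. a bootstrap
# over tasks), not just a point estimate from one configuration.
```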
-
As a tangent, I blame a lot of the poor epistemology currently undermining our understanding of the capabilities and limitations of LLMs on decades of systemic problems in academic publishing that are now being callously exploited by marketing departments.
For ages, papers have needed to make bombastic positive claims to earn prestige, there has been an ongoing reproducibility crisis, and peer reviewers have had very little incentive to adequately test methodologies and find their flaws.
-
In many fields, these effects, combined with shoddy statistics that do not properly incorporate background information, give you stuff like the infamous "clowns help with IVF" paper¹. Those effects can be dangerous as hell, as they can corrupt the medical knowledge needed to offer safe therapeutic prescriptions.
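To put rough numbers on what ignoring background information costs (a toy calculation; the base rate, power, and alpha below are assumed for illustration, not taken from any real study):

```python
# Toy Bayes calculation: if only 10% of tested hypotheses are actually true,
# a "significant" result at alpha = 0.05 with 80% power is wrong far more
# often than the 5% figure suggests.
prior_true = 0.10   # assumed fraction of tested hypotheses that are true
power = 0.80        # P(significant | hypothesis true)
alpha = 0.05        # P(significant | hypothesis false)

p_significant = power * prior_true + alpha * (1 - prior_true)
p_true_given_significant = power * prior_true / p_significant

print(f"P(hypothesis true | significant result) = {p_true_given_significant:.2f}")
# ~0.64 under these assumptions, i.e. roughly a third of "significant"
# findings are false positives, before any p-hacking or publication bias.
```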
What's happening in "AI," though, appears to be yet another step beyond that: exploiting those systemic effects to turn academic publishing into PR.