Update on my ongoing struggle to bring some semblance of reality to the prompt engineering article on Wikipedia:
I finally found a peer-reviewed article supporting the wild claims made on that page! But it was peer-reviewed at a conference on prompt engineering with a program committee composed almost entirely of AI vendors.
So yeah.
-
Cassandra Granade 🏳️⚧️ replied to Cassandra Granade 🏳️⚧️
That's not automatically invalid, of course. Specialist fields can and should have their own venues, but it definitely raises a lot of flags in such a new field when a conference is mostly run by vendors.
The knowledge base in this case is mostly written by people with a financial stake in positive results and peer-reviewed by colleagues with similar incentives.
That's not wonderful.
-
Cassandra Granade 🏳️⚧️ replied to Cassandra Granade 🏳️⚧️
I'll make a comparison here to how quantum computing has evolved in academic publishing. At the outset, it seems fairly similar: a lot of preprints in a specialized field, with what peer-reviewed work there is largely being reviewed by others in the field, often by people at companies with a financial stake in the outcome.
But. There are some differences that point to why research into what LLMs can do once trained is meaningfully different.
-
Cassandra Granade 🏳️⚧️ replied to Cassandra Granade 🏳️⚧️
First, quantum computing has been around as an academic discipline since the mid-to-late '80s. That gave a lot of time to build up evidentiary standards, basic results, theoretical frameworks, and so forth well before commercialization. Many of those results did come from corporate research groups, but before there was any specific product or specific conflict of interest.
-
Cassandra Granade 🏳️⚧️ replied to Cassandra Granade 🏳️⚧️
Second, to a large degree, the peer-reviewed foundational research in quantum computing is published in journals that already existed and had their own evidentiary standards. Those standards are wildly imperfect! But they exist, at least.
Specialized journals can then build off that foundational basis and provide concrete provenance back to venues not primarily driven by financial interest in positive results.
-
Cassandra Granade 🏳️⚧️ replied to Cassandra Granade 🏳️⚧️
Third, the evidentiary standards in quantum computing have weakened as more and more publishing venues are effectively controlled by groups with specific and concrete conflicts of interest — this is especially prominent when considering "NISQ" and "VQE" research, which is too often driven by companies trying to find ways to sell non–fault-tolerant devices. While some useful research exists in those areas, standards have been damaged, and it's instructive to consider how that happened.
-
Cassandra Granade 🏳️⚧️ replied to Cassandra Granade 🏳️⚧️
Fourth, there exist a number of prominent research groups, mostly in universities and government labs, that have either no specific financial interest in positive results, or that actively have a significant financial interest in *negative* results. That provides some balance that can help moderate the worst conflicts of interest.
In LLM research, by contrast, I'm not aware of more than a few specific grants and programs that are funded to be critical of AI hype.
-
Cassandra Granade 🏳️⚧️ replied to Cassandra Granade 🏳️⚧️
Science is imperfect at the best of times, and even more so when you factor in academic publishing companies and their incentives. Treating peer review as the sole arbiter of truth is dangerous as hell.
At the same time, that's not justification for adding *more* conflicts of interest, reducing evidentiary standards even further, and institutionalizing shoddy empirical methods.
-
Cassandra Granade 🏳️⚧️ replied to Cassandra Granade 🏳️⚧️
If I had time and support and didn't have better things to do, I'd definitely want to go actually do some of the "green cheese" disproofs and help put that whole nonsense on a firmer epistemological basis.
As it is, though, I think intense skepticism and incredulity are warranted for *any* paper in the LLM space, given the problems mentioned above.