New blog post on whether LLMs really reason, think, summarise, etc.
-
@UlrikeHahn @cogsci @philosophy
NO!
For scientific questions they are just auto-complete on steroids. Zero understanding. Zero critical thinking. The claim that they can summarise papers is fraudulent. -
@david_colquhoun @cogsci @philosophy could you maybe relate that to what I actually wrote in the blog post, David?
-
@david_colquhoun I’m not yet seeing how this relates to my post, but could you elaborate more on your statement? Is it an empirical claim based on testing summarisation capabilities in some systematic way or is it a conjecture about those abilities based on an insight about “understanding” - and how do you take the issue of their “understanding” to have been settled?
-
@UlrikeHahn I asked two of the #AI services that are advertised on Twitter to summarise one of my papers https://onemol.org.uk/burzomato-2004.pdf
The results were very poor. So I asked them to discuss "cooperativity" in general. The results were like a poor undergrad who strings together bits he's found on the web, with no understanding of them (which is what AI does). -
@david_colquhoun so, if I understand, your answer to the title question "Can LLMs REALLY..summarise?" is an emphatic no because the two services failed to summarise one of your research papers well. Given the good chance that the majority of the world's population also "can't really summarise" by that criterion, it feels a bit outside the normal meaning of the word "summarise" to me. And how we use words in discourse about LLMs is what my blog post is about.
-
Christian Luhmann replied to Ulrike Hahn
I don't have anything useful to say, but I appreciate the measured, thoughtful approach you take in this post. In a world where everyone seems to have taken (extreme) sides, it's refreshing to see.
@UlrikeHahn @cogsci @philosophy -
Christian Luhmann replied to Christian Luhmann
Ok, I semi-lied. I don't have anything useful to say, but I do have something to say. First: "you might think that psychologists sort out ‘construct validity’ before they conduct actual experiments" is hilarious in its snark (or perceived snark). I fully approve of such snark.
@UlrikeHahn @cogsci @philosophy -
Christian Luhmann replied to Christian Luhmann
Second: "construct validity is indeed an important concern in some areas" is either a further, slyer bit of snark (i.e., it's important but no one has yet established it) or exceedingly generous. Given my jaded nature, I choose to interpret it as the former.
@UlrikeHahn @cogsci @philosophy -
@UlrikeHahn As you'd expect from the way it works, it just strings together sciencey-sounding words that it's found on the web. The fact that it does so in reasonably good English lends it a plausibility that it doesn't deserve.
-
Ulrike Hahn replied to Christian Luhmann
a mixture of snark and generosity seems befitting of most human endeavours where a group of people collectively all give something a try....
'yes, we could absolutely do this better' at the same time as 'we've been trying for a while and it's turned out a lot harder than you might think, so there's reasons why we are where we are'
-
"with no understanding of them (which is what AI does)" - the word 'does' there contains an ambiguity that I think makes discourse about AI confused and complicated in a very specific way.
That is what my blog post is about.
-
@UlrikeHahn
I guess the problem could be put this way. If I asked a scientist in a different field to summarise that paper, they would probably say "sorry, I don't understand it well enough to do that". But AI just goes ahead and produces an ill-understood summary - like a 1st-year undergraduate with Google. It's dangerous and it's fraudulent. -
David, my post isn't about whether AI is dangerous or whether its marketing is fraudulent. I think both of those things are really important, and they both hinge on what capabilities AI systems *actually have* - which is an empirical question. The point of my blog post is to try to aid discussion of that empirical question, and to do so precisely because it matters.
1/2
-
@david_colquhoun 2/2 I think it matters not just because a lot depends on the answers, but also because I seem to find coming to an answer a bit harder than you seem to do.
You may well be right, but nothing you have said in our exchange will help me actually come to that conclusion (and the reasons for that, I think, should also become more apparent if you read the blog post).
-
@UlrikeHahn
Sure. I appreciate the depth of your discussion. I take a more experimental approach, triggered by ads in X for software that claims to summarise scientific papers. That's easily tested by giving it a paper about which you have detailed technical knowledge. I expected a poor result and that's what I got. Perhaps someone can test it more thoroughly. For me, that would be a waste of time -
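A more systematic version of that spot-check is possible in principle. As an illustration only, a generated summary can be compared against a reference text (say, the paper's own abstract) with a crude unigram-overlap score, a simplified form of the ROUGE-1 recall metric. The sentences below are hypothetical stand-ins, not quotes from the paper, and a single overlap number is of course nowhere near a full evaluation of "summarisation ability":

```python
# Crude comparison of a machine-generated summary against a reference
# text, using unigram-overlap recall (a simplified ROUGE-1).
# Illustrative sketch only; real evaluations use multiple references,
# several metrics, and human judgement.
from collections import Counter

def unigram_recall(reference: str, candidate: str) -> float:
    """Fraction of reference words that also appear in the candidate."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / max(sum(ref.values()), 1)

# Hypothetical reference and two hypothetical candidate summaries:
reference = "single channel recordings show cooperativity in receptor activation"
good_summary = "recordings of single channels reveal cooperativity in receptor activation"
vague_summary = "the paper discusses some experiments on proteins"

assert unigram_recall(reference, good_summary) > unigram_recall(reference, vague_summary)
```

Even this toy score makes the underlying point: "tested it once and it looked poor" and "measured it against references across many papers" are different epistemic situations.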
@david_colquhoun David, from my perspective this conflates two very different things: 1) the attempt to build particular types of computational systems and their abilities, and 2) current attempts to market products based on such things.
(2) is a recent phenomenon that has been brought about by advances in (1). (1) is part of a much longer project (going back 60 or so years) that has involved tens of thousands of researchers from multiple disciplines, with goals quite different 1/4
-
@david_colquhoun 2/4 from flogging a particular commercial product on X. It seems really important to me not to conflate one with the other.
The scientific value of LLMs, for example, is a separate question from whether I want Apple to provide LLM features on my mobile phone, as is the extent to which LLMs might transform jobs or pose future risks. All of those questions, though, involve understanding what LLMs do and don't do. For that, we need to be able to talk about them in useful ways.
-
@david_colquhoun 3/4 and, with all due respect, I think it's just not intellectually serious to base one's evaluation of anything as complex and multifaceted as both the long-standing project (1) and its current implications (of which (2) itself is just one aspect) on as slender a peg as 'I saw an advert and tried it out on something and didn't think it performed well'. Trying it out is a great, useful thing to do; arguing that it alone answers all questions, I think, is not.
-
@david_colquhoun 4/4 Even just for your use case of summarising scientific papers, there's more there than a commercial product popping up on X. People have been working on machine summarisation of academic articles for decades, and doing so in detailed and thoughtful ways. I could find any number of unserious, shoddy data-science companies and use them to claim that "statistics sucks". Your first response would, I expect, be to say that equating such a company with 'statistics' is way too simplistic.