I finally turned off GitHub Copilot yesterday.
-
David Chisnall (*Now with 50% more sarcasm!*) replied to Mx Autumn :blobcatpumpkin: last edited by
@carbontwelve I used machine learning in my PhD. The use case there was data prefetching. This was an ideal task for ML, because the benefit of a correct answer was high and the cost of an incorrect answer was low. In the worst case, your prefetching evicts something from cache that you need later, but a 60% accuracy in predictions is a big overall improvement.
Programming is the opposite. The benefits of being able to generate correct code faster 80% of the time are small but the costs of generating incorrect code even 1% of the time are high. The entire shift-left movement is about finding and preventing bugs earlier.
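The asymmetry described above can be put into a rough expected-value sketch. The 60% accuracy and 1% failure rate come from the post; the benefit and cost weights are purely hypothetical units chosen for illustration, not measurements, and `expected_gain` is an illustrative helper rather than anything from the original discussion.

```python
def expected_gain(accuracy, benefit_if_right, cost_if_wrong):
    """Average payoff per prediction, in arbitrary (hypothetical) units."""
    return accuracy * benefit_if_right - (1 - accuracy) * cost_if_wrong

# Prefetching: assumed large win on a hit, small penalty on a miss.
prefetch_gain = expected_gain(0.60, benefit_if_right=100, cost_if_wrong=10)

# Code generation: assumed small win when right, large penalty when wrong.
codegen_gain = expected_gain(0.99, benefit_if_right=1, cost_if_wrong=500)

print(f"prefetching: {prefetch_gain:+.2f}")
print(f"code generation: {codegen_gain:+.2f}")
```

With these assumed weights, the low-accuracy predictor still comes out ahead while the high-accuracy one comes out behind; different weights would of course give different answers.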
-
Alaric Snell-Pym replied to Mx Autumn :blobcatpumpkin: last edited by
@carbontwelve @david_chisnall this also matches my expectations, and I've seen people mention studies in teams showing no productivity gain, too.
So I'm intrigued by the few people who DO report that LLMs help them code, though (e.g. @simon). Is there something different about how their brains work so LLMs help? Or (cynically) are they jumping on the bandwagon and trying hard to show the world they've cracked how to use them well, to sell themselves as consultants or something?
-
@kitten_tech @carbontwelve @david_chisnall I'm actually getting more coding work done directly in the Claude and ChatGPT web interfaces and apps vs using Copilot in my editor
The real magic for me at the moment is Claude Artifacts and ChatGPT Code Interpreter - I wrote a bunch about Artifacts here: https://simonwillison.net/tags/claude-artifacts/
Here are all of my general notes on AI-assisted programming: https://simonwillison.net/tags/ai-assisted-programming/
-
David Clarke :tinoflag: replied to David Chisnall (*Now with 50% more sarcasm!*) last edited by
@david_chisnall @carbontwelve this is what has been gnawing at the back of my brain. The purveyors of LLMs have been talking up the latest improvements in reasoning. A calculator that isn't 100% accurate at returning correct answers to inputs is 100% useless. We're being asked to conflate the utility of LLMs with the same kind of utility as a calculator. Would we choose to drive over a bridge designed using AI? How will we know?
-
David Chisnall (*Now with 50% more sarcasm!*) replied to David Clarke :tinoflag: last edited by
@zebratale @carbontwelve Calculators do make mistakes. Most pocket calculators do arithmetic in binary and so propagate errors converting decimal to binary floating point, for example not being able to represent 0.1 accurately. They use floating point to approximate rationals, so collect rounding errors for things like 1/3.
The difference is that you can create a mental model of how they fail and make sure that the inaccuracies are acceptable within your problem domain. You cannot do this with LLMs. They will fail in exciting and surprising ways. And those failure modes will change significantly across minor revisions.
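A small sketch of the predictable failure mode described above. It uses Python floats (IEEE 754 binary64) as a stand-in for binary floating point in general; that is an assumption for illustration, not a claim about how any particular calculator is built.

```python
from decimal import Decimal
from fractions import Fraction

# 0.1 has no exact binary representation. Decimal(float) reveals the
# value that is actually stored.
stored_tenth = Decimal(0.1)
print(stored_tenth)

# The error is deterministic: this comparison fails the same way every time.
sum_is_exact = (0.1 + 0.2 == 0.3)
print(sum_is_exact)

# 1/3 is a rational with no finite binary expansion, so it is rounded.
# Fraction(float) is exact, so the difference is the true rounding error.
third_error = abs(Fraction(1, 3) - Fraction(1 / 3))
print(float(third_error))
```

The point is that the size of the error is bounded and can be reasoned about in advance, which is what makes a mental model of the failure possible.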
-
Stephen J. Anderson replied to Simon Willison last edited by
@simon @kitten_tech @carbontwelve @david_chisnall How would you avoid or deal with the issues that David encountered? Specifically, subtle bugs where the debugging makes the whole process less efficient than writing the code yourself. Is there one of your notes that deals with that already?
-
Glitzersachen.de replied to David Chisnall (*Now with 50% more sarcasm!*) last edited by
@david_chisnall @zebratale @carbontwelve
"do make mistakes" I wouldn't call that a mistake. The calculator does what it should do according to the spec for approximating real numbers with a finite number of bits.
It's (as you explain) a rounding error. A "mistake" is what Pentiums with the famous Pentium bug made.
But maybe it's my understanding of English (as a second language) that is at fault here.
-
@glitzersachen @david_chisnall @zebratale @carbontwelve the calculator /is/ doing exactly what it's been programmed to... and it is programmed to make specific and defined "mistakes" or errors in predictable and clear cut ways in order to make the pocket calculator run on as little power as possible.
An LLM, likewise, is also doing exactly what it was programmed to do... and that is to spew regurgitated nonsense it read off the internet.
-
Simon Willison replied to Stephen J. Anderson last edited by
@utterfiction @kitten_tech @carbontwelve @david_chisnall you have to assume that the LLM will make weird mistakes all the time, so your job is all about code review and meticulous testing
I still find that a whole lot faster than writing all the code myself
Here's just one of many examples where I missed something important: https://simonwillison.net/2023/Apr/12/code-interpreter/#something-i-missed
-
@utterfiction @kitten_tech @carbontwelve @david_chisnall but honestly, the disappointing answer is that most of this comes down to practice and building intuition for tasks the models are likely to do well vs mess up
Manipulating some elements in the HTML DOM with JavaScript? They'll nail that every time
Implementing something involving MDIO registers? My guess is there are FAR fewer examples relating to that in the (undocumented, unlicensed) training data, so they're much more likely to make mistakes
-
Martijn Faassen replied to Alaric Snell-Pym last edited by
@carbontwelve @david_chisnall .
Note that how @simon reports using this to generate little projects is an entirely different mode of working with them. I have used copilot for a few years now and like it myself, which is mostly context sensitive autocomplete.
A Q&A session to create code for a CLI tool or web app is a very different way of working I started exploring more recently. It's surprisingly capable for little projects and requires a different approach.
-
@faassen @kitten_tech @carbontwelve @david_chisnall Steve Yegge calls it CHOP, for Chat Oriented Programming https://simonwillison.net/2024/Jul/12/the-death-of-the-junior-developer/
-
pasta la vida replied to Pendell last edited by
@pendell @glitzersachen @david_chisnall @zebratale @carbontwelve using floating point for finance calculations is a common mistake...
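A minimal illustration of why this bites, assuming Python and its standard `decimal` module. The three 10-cent amounts are made up for the example.

```python
from decimal import Decimal

# Three hypothetical 10-cent charges, summed as binary floats.
float_total = sum([0.10] * 3)
print(float_total == 0.30)  # False: binary rounding error accumulates

# The same amounts as decimal values built from strings, so nothing
# is rounded on the way in.
decimal_total = sum([Decimal("0.10")] * 3, Decimal("0"))
print(decimal_total == Decimal("0.30"))  # True
```

Storing amounts as integer cents is another common way to sidestep the problem.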
-
@pendell @glitzersachen @david_chisnall @zebratale @carbontwelve programmers and CPU designers are just a tad sensitive and insecure when someone points out the calculator makes a mistake and isn't mathematically perfect