-
Miguel Afonso Caetano wrote, last edited by [email protected]
These studies are important, but they don't tell the whole story, because the outcome will always depend on the model that is used. If you only test one model, of course the result cannot be generalized to all software development tasks. Ultimately, I'm afraid these studies will only contribute to turning public perception against AI:
#AI #GenerativeAI #CoPilot #LLMs #SoftwareDevelopment #Programming: "Many developers say AI coding assistants make them more productive, but a recent study set forth to measure their output and found no significant gains. Use of GitHub Copilot also introduced 41% more bugs, according to the study from Uplevel, a company providing insights from coding and collaboration data.
The study measured pull request (PR) cycle time, or the time to merge code into a repository, and PR throughput, the number of pull requests merged. It found no significant improvements for developers using Copilot.
Uplevel, using data generated by its customers, compared the output of about 800 developers using GitHub Copilot over a three-month period to their output in a three-month period before adoption.
(...)
There’s a difference between writing a few lines of code and full-fledged software development, Gekht adds. Coding is like writing a sentence, while development is like writing a novel, he suggests. “Software development is 90% brain function — understanding the requirements, designing the system, and considering limitations and restrictions,” he adds. “Converting all this knowledge and understanding into actual code is a simpler part of the job.”"
https://www.cio.com/article/3540579/devs-gaining-little-if-anything-from-ai-coding-assistants.html
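The two metrics the study tracked, PR cycle time (time from opening a pull request to merging it) and PR throughput (number of PRs merged in a window), can be illustrated with a small sketch. The PR records and field layout below are hypothetical, not Uplevel's actual data or methodology:

```python
from datetime import datetime, timedelta

# Hypothetical PR records as (opened_at, merged_at) pairs; not real study data.
prs = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 2, 15)),
    (datetime(2024, 1, 3, 10), datetime(2024, 1, 3, 18)),
    (datetime(2024, 1, 5, 8), datetime(2024, 1, 9, 12)),
]

# PR cycle time: elapsed time from opening to merge, averaged over all PRs.
cycle_times = [merged - opened for opened, merged in prs]
avg_cycle = sum(cycle_times, timedelta()) / len(cycle_times)

# PR throughput: number of pull requests merged within a given window.
window_start, window_end = datetime(2024, 1, 1), datetime(2024, 1, 8)
throughput = sum(1 for _, merged in prs if window_start <= merged < window_end)

print(f"average cycle time: {avg_cycle}")   # 1 day, 22:00:00
print(f"throughput in window: {throughput}")  # 2
```

Comparing these two numbers for the same developers before and after Copilot adoption is, as described, the shape of the study's before/after comparison.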
-
Yumechi | ゆめち | :ferris: :haskell: Wheel Inventor replied to Miguel Afonso Caetano, last edited by
@[email protected] I think one reason may be that they only surveyed commercial software (?), and commercial development has many more "just because" requirements which AI cannot comprehend. For open source (big 10k+ star projects or personal ones) and academic work, Copilot has definitely increased my code efficiency and quality: not because Copilot can do my job, but because I can worry less about mechanical work, with non-trivial refactoring and the gluing together of algorithms, APIs, and pipelines I already wrote handled automatically. I have less psychological burden when making core changes, and I feel I'm more likely to do a better job as a result.
Also, statistically, PR cycle time and throughput over a time window don't feel like good metrics: most PR time is spent waiting for review based on priority, not actually working on the code. Maybe the number of revisions/rounds of review would be better? The "41% more bugs" figure, from an unreleased methodology, also feels like either the developers were overusing it or just a statistical issue...
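The alternative metric suggested here, rounds of review per PR, could be computed along these lines (the PR names and verdict labels are hypothetical, loosely modeled on GitHub-style review states):

```python
# Hypothetical review verdicts per PR, in order; each "changes_requested"
# verdict triggers another revision round before merge.
reviews = {
    "pr-101": ["changes_requested", "changes_requested", "approved"],
    "pr-102": ["approved"],
}

# Rounds of review: one initial round plus one per requested change.
rounds = {pr: 1 + verdicts.count("changes_requested")
          for pr, verdicts in reviews.items()}

print(rounds)  # {'pr-101': 3, 'pr-102': 1}
```

Unlike raw cycle time, this count is not inflated by time spent idle in a review queue.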
It's true that it rots your memory sometimes, so I occasionally turn it off just to keep my brain's muscle memory for trivial things. -
Miguel Afonso Caetano replied to Yumechi | ゆめち | :ferris: :haskell: Wheel Inventor, last edited by
@yume Yes, I tend to agree with you. I'm not an AI zealot, but considering my personal projects, my experience with LLMs so far has been very good. Ideally, you should always use at least two models for the same task/project, because you cannot trust just one.