AI generation when writing software is a false economy. You are replacing writing code with code review. Code review is harder and requires you to already have an understanding of the domain, which often means you would have been able to write it yourself to begin with. If you code gen something because you don't know how to write it yourself, you by definition cannot review it without going through an effort equivalent to writing it yourself in the first place.
Unless of course you don’t care about code review and so doom yourself into treating software like magical incantations that break randomly for no perceivable reason; but no good mage would do that, surely.
-
datarama replied to Mary :icosahedron:
@mary It seems to me like the entirety of generative AI is about replacing creative work with managerial work.
This, perhaps, is why it is so polarizing: This appeals to a certain kind of personality, and deeply repels another.
-
chuckadeus kummerer replied to datarama
-
datarama replied to chuckadeus kummerer
@xkummerer @mary Because it's managers who have the power to make that sort of decision, obviously.
But in code, you *could* invert the relationship: Rather than having the AI write code and the human review it (before sending it off to another human for review), you could have the AI review the human's code (also before sending it off to another human for review). This lessens the reviewing-human's (managerial/supervisory) workload, and is less likely to have a destructive effect on quality.
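(To make the shape of that concrete: a minimal sketch of what such a pre-review step could look like, assuming a locally hosted model served behind an OpenAI-compatible endpoint. The URL, the model name, and the prompt below are all invented for illustration; the model only comments, and a human still decides what, if anything, to change before sending the code on to a human reviewer.)

```python
# Sketch of an LLM "pre-review" pass run before sending a change for human
# review. Assumed setup: a local model behind an OpenAI-compatible API at
# localhost:8080 and a model named "local-code-model" (both invented).
import json
import subprocess
import urllib.request

API_URL = "http://localhost:8080/v1/chat/completions"  # assumed local server
MODEL = "local-code-model"                              # assumed model name

def staged_diff() -> str:
    """The change that is about to be sent for review."""
    return subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True, check=True
    ).stdout

def pre_review(diff: str) -> str:
    """Ask the model to point at possible problems; it changes nothing itself."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": "List possible problems in this diff (naming, "
                       "comment/code mismatches, obvious slips). "
                       "Do not rewrite the code.\n\n" + diff,
        }],
    }).encode()
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(pre_review(staged_diff()))  # read it yourself, then send for human review
```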
-
@xkummerer @mary (As a footnote, managers are kinda like systems administrators in that when they do their job well, you're barely aware of the work they do, but when they do their job poorly, you're *very* aware of it, because you have to try to route around the fallout to get your own work done.
I'm very happy my manager isn't an LLM.)
-
@datarama @xkummerer @mary Back when I was a software developer, someone in my group identified a subtle bug in money-processing code that in some circumstances would have resulted in a small rounding error. We pored over this a while and convinced ourselves that we could have siphoned money out of it and, since we controlled all these systems, it could probably have gone undetected for quite some time. This company had revenues in the many hundreds of millions, so it would have added up to a tidy sum over enough time.
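(The post doesn't say what the bug actually was, so here is purely a hypothetical sketch of the classic shape of this mistake: per-transaction fees get truncated instead of rounded, and the dropped fractions of a cent quietly accumulate somewhere nobody is accounting for. The fee rate and amounts are invented.)

```python
from decimal import Decimal, ROUND_DOWN

FEE_RATE = Decimal("0.015")  # invented 1.5% per-transaction fee

def fee_truncated(amount_cents: int) -> int:
    # Buggy version: truncates, so the dropped fraction of a cent never
    # shows up in any ledger entry.
    fee = Decimal(amount_cents) * FEE_RATE
    return int(fee.quantize(Decimal("1"), rounding=ROUND_DOWN))

def fee_rounded(amount_cents: int) -> int:
    # Correct version: banker's rounding (quantize's default).
    fee = Decimal(amount_cents) * FEE_RATE
    return int(fee.quantize(Decimal("1")))

# Per transaction the discrepancy is at most one cent; across a million
# payments it is real money that whoever controls the systems could redirect.
payments = [1337, 2499, 999, 10001] * 250_000
lost_cents = sum(fee_rounded(p) - fee_truncated(p) for p in payments)
print(f"unaccounted-for cents: {lost_cents}")  # 250000, i.e. $2,500
```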
We alerted the management and it was fixed. A few things occur to me:
1. An LLM would never find a bug like that. In fact, it's fairly likely to generate them
2. Software developers who are not deeply attuned to their codebase (say because LLMs generated substantial portions of it) would be unlikely to find such bugs
3. If I were a software developer today and I were required to use LLMs in my work, I would not tell management about bugs like this if I found them. Because they've signalled to me they don't respect what I'm capable of enough to support me in it, so why should I? I'd save the good energy for my hobbies and phone it in at work as much as I could get away with; it won't matter to the higher-ups.
-
@abucci @xkummerer @mary *Replacing* human code review with an LLM would be a horrifically bad idea, and anybody who does that deserves what they're going to get. But I can see some value in an LLM "pre-reviewer" that can potentially catch some smaller, dumb mistakes before a human being looks for the more serious ones.
Critically, you'd have some human insight while writing the code *and* in the actual review, whereas LLM code synthesis gives you two passes of human code review instead.
-
Keithulhu Fhtammann replied to datarama
@datarama @xkummerer @mary Grammarly for coding, in other words? Except I can tell you, as someone who has both worked professionally as an editor and done a bit of purely amateur, hobbyist-level programming (back when it was called "programming" and not "coding"), that Grammarly is often wrong, and reviewing code for its functionality is a way more difficult task than proofreading text for its adherence to conventions.
-
Mary :icosahedron: replied to Keithulhu Fhtammann
@KeithAmmann @datarama @xkummerer I don't really see what significant benefits an LLM would necessarily have over existing rule-based code linters.
-
Alexander The 1st replied to datarama
@datarama @abucci @xkummerer @mary ...This still sounds like using an LLM to do what a linter and a compiler checker do with *way* less energy consumption.
-
datarama replied to Keithulhu Fhtammann
@KeithAmmann @xkummerer @mary I develop software for a living (and have also done it recreationally since I was a child).
What I'm saying is *exactly* that a not-too-irresponsible role for a tool that is going to get things wrong is more like Grammarly and less like any work an actual human does. You don't *expect* Grammarly to do the work of a competent editor, much like you shouldn't expect an LLM to do the work of a competent reviewer (or programmer).
-
Alexander The 1st replied to Alexander The 1st
@datarama @abucci @xkummerer @mary Additionally, after a while, the human reviewer is likely to just ignore anything the LLM generated, since there's no guarantee it's even valid.
(But the part about the LLM performing linting and compiler error reporting steps reminds me of the hype around NFTs in games...where a friend of mine pointed out that even if the economics and transferring part made sense...you could do essentially the same thing with a SQL database table.)
-
Keithulhu Fhtammann replied to datarama
@datarama @xkummerer @mary Gotcha.
(Although a lot of people don't know the difference between editing and using Grammarly.)
-
datarama replied to Alexander The 1st
@AT1ST @abucci @xkummerer @mary I'm not talking about replacing a linter or compiler error report with an LLM. I'm talking about having an LLM run through some code *before sending it for review*, which I'm assuming people don't do without having compiled and linted it first anyway.
The kinds of fuckups I've seen LLMs catch that linters don't are things like poor variable naming and inconsistencies in the relationship between comments and code. ...
-
@AT1ST @abucci @xkummerer @mary Things where you *absolutely* don't want the LLM to make any modifications itself, but where you might want it to tell you that there's something you might want to stop and think about an extra time.
For this kind of thing, the small locally-hosted ones aren't much worse than the datacenter-scale ones, so you don't even have to boil an ocean to get your pre-review.
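(A made-up illustration of the kind of thing meant here: nothing below trips an ordinary linter, the code compiles and runs, yet the comment and one of the names have drifted away from what the code actually does, which is exactly what you'd want a pre-reviewer to flag for a second look.)

```python
def get_active_users(users: list[dict]) -> list[dict]:
    # Returns users who logged in within the last 30 days.
    cutoff_days = 90  # the comment says 30; the code says 90
    return [u for u in users if u["days_since_login"] <= cutoff_days]

def check2(user: dict) -> bool:  # name says nothing about what is being checked
    return user["days_since_login"] <= 30
```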
-
@AT1ST @abucci @xkummerer @mary Full disclosure: I don't like LLMs (or image diffusion models, for that matter) at all, as I hope my first comment in this thread shows. They're unreliable and resource-hungry, and I *really* don't like the politics and ethics of the people building and promoting them. I prefer to not use them if I have any choice in the matter.
I'm trying to figure out the least terrible ways to use them if we end up *not* having a choice in the matter.
-
datarama replied to Keithulhu Fhtammann
@KeithAmmann @xkummerer @mary I don't even use Grammarly.
-
Bas Schouten replied to Mary :icosahedron:
@mary You are assuming all software is written for production purposes. Which is false.
Many scripts and snippets are written for testing, evaluation, or one-off tasks whose outcome is trivially verifiable.
In these cases, neither a full understanding nor a thorough review is required for them to verifiably serve their purpose.
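(For instance, something like this hypothetical throwaway, where the directory and the task are invented: you don't review it so much as glance at the output and see whether it looks right.)

```python
# One-off: print the ten largest files under data/ (path invented).
from pathlib import Path

files = [(p.stat().st_size, p) for p in Path("data").rglob("*") if p.is_file()]
for size, path in sorted(files, reverse=True)[:10]:
    print(f"{size:>12}  {path}")
```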
-
datarama replied to Mary :icosahedron:
@mary @KeithAmmann @xkummerer A linter isn't going to notice if I've just introduced an inconsistency between some code and a comment, or if I've come up with bad names for my variables / function names /etc. Conversely, an LLM isn't going to notice very much of the stuff that takes serious static analysis to find (so replacing linters with LLMs would be a bad idea).
And none of them reliably catch subtle logic errors or discrepancies between code and domain.
-
Jan :rust: :ferris: replied to Bas Schouten
@Schouten_B @mary ...until later on:
- that "prototype" ends up in production and needs to be maintained for 2 years
- the script for that one-off task looks like it does the right thing during testing, but a corner case that wasn't considered leads to errors in production, and now the script needs adjusting (in other words: _maintenance_)