AI code generation when writing software is a false economy. You're replacing writing code with code review. Code review is harder, and it requires you to already have an understanding of the domain, which often means you would've been able to write it yourself...
-
datarama replied to Keithulhu Fhtammann
@KeithAmmann @xkummerer @mary I develop software for a living (and have also done it recreationally since I was a child).
What I'm saying is *exactly* that a not-too-irresponsible role of a tool that is going to get things wrong is more like Grammarly and less like any work an actual human does. You don't *expect* Grammarly to do the work of a competent editor, much like you shouldn't expect an LLM to do the work of a competent reviewer (or programmer).
-
Alexander The 1st replied to Alexander The 1st
@datarama @abucci @xkummerer @mary Additionally, after a while, the human reviewer is likely to just ignore anything the LLM generated, since there's no guarantee it's even valid.
(But the part about the LLM performing linting and compiler error reporting steps reminds me of the hype around NFTs in games...where a friend of mine pointed out that even if the economics and transferring part made sense...you could do essentially the same thing with a SQL database table.)
-
Keithulhu Fhtammann replied to datarama
@datarama @xkummerer @mary Gotcha.
(Although a lot of people don't know the difference between editing and using Grammarly.)
-
datarama replied to Alexander The 1st
@AT1ST @abucci @xkummerer @mary I'm not talking about replacing a linter or compiler error report with an LLM. I'm talking about having an LLM run through some code *before sending it for review*, which I'm assuming people don't do before having compiled and linted it first anyway.
The kinds of fuckups I've seen LLMs catch that linters don't are things like poor variable naming and inconsistencies in the relationship between comments and code. ...
-
@AT1ST @abucci @xkummerer @mary Things where you *absolutely* don't want the LLM to make any modifications itself, but where you might want it to tell you that there's something you might want to stop and think about an extra time.
For this kind of thing, the small locally-hosted ones aren't much worse than the datacenter-scale ones, so you don't even have to boil an ocean to get your pre-review.
-
@AT1ST @abucci @xkummerer @mary Full disclosure: I don't like LLMs (or image diffusion models, for that matter) at all, as I hope my first comment in this thread shows. They're unreliable and resource-hungry, and I *really* don't like the politics and ethics of the people building and promoting them. I prefer to not use them if I have any choice in the matter.
I'm trying to figure out the least terrible ways to use them if we end up *not* having a choice in the matter.
-
datarama replied to Keithulhu Fhtammann
@KeithAmmann @xkummerer @mary I don't even use Grammarly.
-
Bas Schouten replied to Mary :icosahedron:
@mary You are assuming all software is written for production purposes. Which is false.
Many scripts and snippets are written for testing, evaluation, or one-off tasks whose outcome is trivially verifiable.
In these cases, neither a full understanding nor a thorough review is required for them to verifiably serve their purpose.
-
datarama replied to Mary :icosahedron:
@mary @KeithAmmann @xkummerer A linter isn't going to notice if I've just introduced an inconsistency between some code and a comment, or if I've come up with bad names for my variables / function names /etc. Conversely, an LLM isn't going to notice very much of the stuff that takes serious static analysis to find (so replacing linters with LLMs would be a bad idea).
And none of them reliably catch subtle logic errors or discrepancies between code and domain.
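(A contrived sketch of the kind of thing I mean, with made-up paths, numbers and names: the snippet below is perfectly linter-clean, but the comment contradicts the code and the variable name tells a reviewer nothing.)
    #!/usr/bin/env bash
    # Keep the last 30 days of backups.
    retention=7        # comment says 30, code says 7; no linter objects
    d="/backups"       # "d" says nothing about what it holds
    find "$d" -name '*.tar.gz' -mtime +"$retention" -delete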
-
Jan :rust: :ferris: replied to Bas Schouten
@Schouten_B @mary ...until later on:
- that "prototype" ends up in production and needs to be maintained for 2 years
- the script for that one-off task looks like it does the right thing during testing, but a corner case hasn't been considered that leads to errors in production, and now the script needs adjusting (in other words: _maintenance_)
-
@mary @KeithAmmann @xkummerer To see the difference, try feeding some awful Bash to Shellcheck on one hand, and to an LLM on the other.
The LLM isn't (usually) going to find even half of the *really* dangerous crap that Shellcheck does. But it can (sometimes) tell you if you've screwed up your documentation, or written code that's likely to be harder for a human to understand than it has to be. They're not *reliable* in the same way a linter is, and probably can't ever be.
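(A made-up two-liner to show the Shellcheck side of that comparison; Shellcheck immediately flags the unquoted expansion and the unchecked cd, something like SC2086 and SC2164, while an LLM may or may not mention either.)
    #!/bin/bash
    # Clean up old logs in the directory passed as the first argument.
    cd $1           # unquoted $1, and no check that cd actually succeeded
    rm -rf *.log    # if cd failed, this deletes *.log wherever you happened to be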
-
datarama replied to Jan :rust: :ferris:
I've said elsewhere that I think a lot of this particular discussion is actually two discussions hiding inside the contours of each other. One is "are LLMs useful for programming?", and the other is "will LLMs replace programmers?". I fall into this trap myself, probably because this is a very bad time to be a programmer who's dealing with anxiety.
At any rate: I make lots of little scripts that are literally one-off, as in I don't bother keeping them after use.
-
@janriemer @Schouten_B @mary (Quick data conversions and migrations of various kinds, mostly - where there's nothing left for the script to do after I've run it.
But yes, "load-bearing prototypes" are definitely something that unfortunately regularly happens, and something our profession ought to be a lot more mindful about.)
-
The few times I asked it to review code, it did so very poorly, suggesting changes that made no sense and broke the code.
It's useless. I still don't get how people talk about it seriously after trying it for ten minutes.
-
@bloodykneelers @xkummerer @mary I've gotten both good and bad results out of several of them, and I think the most frustrating aspect is that it is hard to develop an intuition of what they're good at and what they're bad at.
-
Pangolin Gerasim replied to datarama
@datarama @mary @KeithAmmann @xkummerer so, at best, marginal utility from adding an LLM into the dev workflow?
You've now got two things to review: the correctness of the code, and the assertions made about that code by a code-reviewing LLM.
Add to that the horrendous costs of running an LLM (and we don't yet reliably know what those will be once the VCs stop pumping in vast amounts of cash to pay for all that compute) and we're well into negative territory.
-
datarama replied to Pangolin Gerasim
@fluidlogic @mary @KeithAmmann @xkummerer I'd say that having an LLM pre-reviewer is far *less* likely to land you in negative territory than LLM code generation is. Assuming you've just written the code, it's still fresh in your brain, and you can quickly see if its suggestions actually make a valid point.
It's also the sort of thing you can do using a smaller, locally-hosted LLM, so the compute cost at inference time is "two seconds of your GPU's time, or five of your CPU's."
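(As a rough sketch of what that can look like in practice, assuming something like ollama and a small local model are installed; the model name and the prompt wording here are made up:)
    # Hypothetical pre-review pass over the staged changes, before asking a human.
    diff_text=$(git diff --cached)
    ollama run llama3.2 "Point out misleading comments, confusing names, and anything a human reviewer should look at twice. Do not rewrite the code.
    $diff_text"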
-
vriesk (Jan Srz) replied to datarama
@datarama @abucci @xkummerer @mary Yeah, if done well, I wouldn't mind this kind of AI-driven linter (_in addition_ to static linter rules).