really the thing for me about these huge ML models is that they don't break down into usable parts at all, so if there's something wrong with the model you just... throw it away and train a new one
or i guess i should say, there's a hard limit to how much they break down. some architectures are modular at very large scales. but in a normal computer program you can break down the execution flow into distinct parts and test them, so you can make at least some guarantees about what the program will and won't do, and you can explain why it does that (including, usually, what chain of humans made the decision to make it do that)
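(toy python sketch of the kind of guarantee i mean; normalize_username is a made-up example function, not anything real: you can pull one piece out, test it on its own, and say exactly what it does and doesn't promise)

# a distinct, testable part of a normal program: its contract fits in one line
def normalize_username(raw: str) -> str:
    """lowercase, strip, and collapse runs of whitespace."""
    return " ".join(raw.strip().lower().split())

# and you can actually check that contract in isolation
def test_normalize_username():
    assert normalize_username("  Asta  AMP ") == "asta amp"
    for raw in ["  x ", "x   y", "\tX\n"]:
        out = normalize_username(raw)
        # guarantee: never leading/trailing/doubled whitespace in the output
        assert out == " ".join(out.split())

test_normalize_username()
# there's no equivalent piece you can carve out of a giant blob of weights
# and make this kind of assertion about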
-
@[email protected] more fucking companies should have been working on breaking them down into re-usable parts! And yet... those companies aren't the ones selling computing time, so why bother? If people have to pay you to retrain, well......
-
@aud i would be extremely surprised if the technique is amenable to that, at a pretty fundamental level, tbh
-
Asta [AMP] replied to No Fellow Historians, last edited by [email protected]
@[email protected] For... I think in general, yes (as in, could one construct an ML model with re-usable parts?), there's work that can be (and has been) done. But I also think it's something you'd have to specifically focus on during model construction and training; I suspect it's pretty unlikely you'd get it "for free" (rough sketch of what I mean at the end of this post).
But for the large models, specifically LLMs... yeah, that's a whole ass barrel of fish, isn't it. I do wish they had focused on seeing if it was possible, but then again, if they were the type of people to care, we wouldn't be inundated with these models anyway...
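(and a toy pytorch sketch of the "you'd have to design for it" point; ModularModel and its shapes are invented for illustration, not anyone's real architecture: if the seams are explicit, you can freeze the encoder and retrain just the head instead of throwing the whole thing away)

import torch
from torch import nn

# modularity built in up front: an explicit shared encoder plus a swappable head
class ModularModel(nn.Module):
    def __init__(self, vocab=1000, dim=64, n_classes=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Embedding(vocab, dim),
            nn.Linear(dim, dim),
            nn.ReLU(),
        )
        self.head = nn.Linear(dim, n_classes)  # the re-usable boundary is explicit

    def forward(self, tokens):
        return self.head(self.encoder(tokens).mean(dim=1))

model = ModularModel()

# because the boundary exists, you can keep the encoder and retrain only a new head
for p in model.encoder.parameters():
    p.requires_grad = False
model.head = nn.Linear(64, 7)  # swap in a head for a different task
opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)

# none of that falls out of a monolithic pile of weights after the fact;
# the seams have to be there from the start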