About as open source as a binary blob without the training data

[email protected]

There were efforts. Facebook didn't like those.

billwashere

That sounds like a segment on “My 600lb Life”

[email protected]

I reckon C++ > Delphi

[email protected]

You can do sneaky things with weights that are virtually undetectable.

magic_lobster_party

Ok I understand now why people are upset. There’s a disagreement with terminology.

The source code for the model is open source. It’s defined in PyTorch. The source code for it is available with the MIT license. Anyone can download it and do whatever they want with it.

The weights for the model are open, but they’re not open source, as they’re not source code (or an executable binary for that matter). No one is arguing that the model weights are open source, but there seems to be an argument against the claim that the model is open source.

And even if they provided the source code for the training script (and all its data), it’s unlikely anyone would reproduce the same model weights due to the randomness involved. Training model weights is not like compiling an executable, because you’ll get different results every time.
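To make the randomness point concrete, here's a minimal sketch in plain Python. It is not the training script of any real model: the toy model, the dataset, and the `train` function are all made up for illustration. The only thing that changes between runs is the seed, which controls the initial weights and the order in which samples are drawn.

```python
import math
import random


def train(seed, steps=200, lr=0.1):
    """Train a toy model y = w2 * tanh(w1 * x) with SGD; return (w1, w2)."""
    rng = random.Random(seed)
    # Random initialisation: the first source of run-to-run variation.
    w1 = rng.uniform(-1.0, 1.0)
    w2 = rng.uniform(-1.0, 1.0)
    # Hypothetical toy dataset: targets follow 0.5 * tanh(2x).
    data = [(x / 10.0, 0.5 * math.tanh(2.0 * x / 10.0)) for x in range(-10, 11)]
    for _ in range(steps):
        # Random sample order: the second source of variation.
        x, y = rng.choice(data)
        h = math.tanh(w1 * x)
        err = w2 * h - y
        # Gradients of 0.5 * err**2 with respect to each weight.
        grad_w2 = err * h
        grad_w1 = err * w2 * (1.0 - h * h) * x
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2


run_a = train(seed=0)
run_b = train(seed=1)
run_a_again = train(seed=0)

print("seed 0:", run_a)
print("seed 1:", run_b)
print("same seed reproduces:", run_a == run_a_again)
print("different seed matches:", run_a == run_b)
```

Same code, same data, different seed, different weights. Real training runs add further sources of variation (hardware, parallelism, library versions), so even fixing the seed often isn't enough to get matching weights in practice.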

[email protected]

I may misunderstand, but are the weights typically several hundred gigabytes large?

[email protected]

Even worse is calling a proprietary, absolutely closed source, closed data and closed weight company "OpenAI"

[email protected]

Is it even really software, or just a datablob with a dedicated interpreter?

[email protected]

That's fat shaming

[email protected]

For neural nets the method matters more. Data would be useful, but at the scale these things get trained on, the specific data matters little.

They can be trained on anything, and a diverse enough data set would end up making it function more or less the same as a different but equally diverse set. Assuming publicly available data is in the set, there would also be overlap.

The training data is also by necessity going to be orders of magnitude larger than the model itself. Sharing becomes impractical at a certain point before you even factor in other issues.

[email protected]

I mean that's all a model is so....
Once again, someone who doesn't understand anything about training or models is posting borderline misinformation about AI.

Shocker

[email protected]

Hell, for all we know it could be full of classified data. I guess depending on what country you're in it definitely is full of classified data...

[email protected]

Yet another so-called AI evangelist accusing others of not understanding computer science if they don't want to worship their machine god.

[email protected]

You don't download the training data when running locally. You are downloading the sheet baked model.

[email protected]

Praise the Omnissiah! ... I'll see myself out.

[email protected]

While I completely agree with 90% of your comment, that first sentence is gross hyperbole. I have used a number of open source options that are clearly better. 7-Zip is a perfect example. For over a decade it was vastly superior to anything else, open or closed. Even now it may be showing its age a bit, but it is still one of the best options.
But for the rest of your statement, I completely agree. And yes, CAD is a perfect example of the problems faced by open source. I made the mistake of thinking that I should start learning CAD with open source and then I wouldn't have to worry about getting locked into any of the closed source solutions. But FreeCAD is such a mess. I admit it has gotten drastically better over the last few years, but it still has serious issues. Don't get me wrong, I still 100% recommend that people learn it, but I push them towards a number of closed source options to start with. FreeCAD is for advanced users only.

[email protected]

Do you think your comments here are implying an understanding of the tech?

[email protected]

Judging by OP’s salt in the comments, I’m guessing they might be an Nvidia investor. My condolences.

[email protected]

Especially after it was founded as a nonprofit with the mission to push open source AI as far and wide as possible to ensure a multipolar AI ecosystem, in turn ensuring that AIs keep each other in check so that AI would be respectful and prosocial.

[email protected]

A model can be represented only by its weights in the same way that a codebase can be represented only by its binary.

Training data is a closer analogue of source code than weights.