About as open source as a binary blob without the training data
-
[email protected] replied to [email protected]
Because over-hyped nonsense is what the stock market craves... That's how this works. That's how all of this works.
-
[email protected] replied to [email protected]
Would you accept a Smalltalk image as Open Source?
-
[email protected] replied to [email protected]
Ok. How does that apply to DeepSeek?
Your anti-AI talking points are so entangled with anti-Big Tech arguments that now you can't pivot when it's a publicly available, communist developed, energy efficient AI.
-
[email protected] replied to [email protected]
That... doesn't align with years of research. Data is king. As someone who specifically studies long-tail distributions and few-shot learning (before succumbing to long COVID, so sorry if my response is a bit scattered), throwing more data at a problem always improves it more than the method does, and the method can only be simplified with more data. Outside of some neat tricks that modern deep learning has decided are hogwash and "classical", at least — and most of those don't scale to what's being attempted now.
Also, datasets inherently impose bias upon networks, and it's easier to create adversarial examples that fool two different networks trained on the same data than ones that fool the same network freshly trained twice on different data.
Sharing metadata and acquisition methods is important and should be the gold standard. Sharing network methods is also important, but that's more of a silver standard, just because most modern state-of-the-art models differ so minutely from each other in performance nowadays.
Open source as a term should require both. This was the standard in the academic community before tech bros started running their mouths, and should be the standard once they leave us alone.
-
... Statistical engines are older than personal computers, with the first statistical package developed in 1957. And AI professionals would have called them trained models. The interpreter is code, the weights are not. We have had terms for these things for ages.
-
[email protected] replied to [email protected]
Or as open as a human, built on all the previous people's work we learned from without paying them, aka normal life.
-
[email protected] replied to [email protected]
Actually no. As someone who prefers academic work, I very heavily prefer DeepSeek to OpenAI. But neither is open. They have open weights and open-source interpreters, but datasets need to be documented. If it's not reproducible, it's not open source, at least in my eyes. And without the training data, or details on how to collect it, it isn't reproducible.
You're right. I don't like big tech. I want to do research without being accused of trying to destroy the world again.
And how is DeepSeek over-hyped? It's an LLM. LLMs cannot reason, but they're very good at statistically likely language generation, which can sound enough like their training data to gaslight, but not to actually develop anything. They're great tools, but the application is wrong. Multi-domain systems that use expert systems with LLM front ends to provide easy-to-interpret results are a much better way to do things, and DeepSeek may help people creating expert systems (whether AI or not) make better front ends. That is in fact huge. But it's not the silver bullet tech bros and pop-sci mags think it is.
-
[email protected] replied to [email protected]
But also, you were talking about Nvidia in the comment I responded to, not DeepSeek, so your rebuttal is a non sequitur...
-
[email protected] replied to [email protected]
Yes please, let's use this term, and reserve Open Source for its existing definition in the academic ML setting: weights, methods, and training data. These models don't readily fit into existing terminology for structural and logistical reasons, but when someone says "it's got open weights" I know exactly what set of licenses and implications it may have without further explanation.
-
[email protected] replied to [email protected]
LoL. Love when bots can’t follow the conversation, and accidentally out themselves.
-
[email protected] replied to [email protected]
Weights available?
-
[email protected] replied to [email protected]
China's new and cheaper magic beans shock America's unprepared magic bean salesmen
American magic bean companies like Beanco, The Boston Bean Company, and Nvidia have already shed hundreds of billions of dollars in stock value.
The Beaverton (www.thebeaverton.com)
-
[email protected] replied to [email protected]
I like how when America does it we call it AI, and when China does it it's just an LLM!
-
[email protected] replied to [email protected]
Hm. I speak like a bot, do I? Maybe I am autistic after all.
I am aware, my boyfriend and I have already had this conversation, but I guess he's not on Lemmy, so you can't ask him.
Yes, DeepSeek caused a drop in the stock price, but you were saying that believing LLMs are over-hyped would amount to insider knowledge that could give us an advantage in the stock market, particularly with their already-tanked stock. However, the stock market fluctuates based on hype, not value, and will do whatever the fuck it pleases, so the only way to have insider knowledge is to be on a board that controls the price or to manage to dump hype into the system. That is not something most people have the power to do individually.
But since you think I'm a bot and I have no way to disprove that thanks to what the world is now, I bid you adieu. I hope you're having a good one. And stop antagonizing people for talking differently, please.
-
[email protected] replied to [email protected]
Yeah, let's all base our decisions and definitions on what the stock market dictates. What could possibly go wrong?
/s
-
[email protected] replied to [email protected]
communist developed, energy efficient AI.
lol
-
[email protected] replied to [email protected]
When your narcissism has reached the point of "I know better than every hedge fund manager and technical expert on the subject", it's time to get evaluated for a personality disorder.
-
[email protected] replied to [email protected]
What’s inaccurate about it?
-
[email protected] replied to [email protected]
I'm including Facebook's LLM in my critique. And I dislike the current hype on LLMs, no matter where they're developed.
-
[email protected] replied to [email protected]
Ok, then the definition I gave was too narrow when I said "reproducible binaries". If data claims to be "open source", then it needs to supply information on how to reproduce it.
Open data has other criteria, I'm sure.