Good gods the logical leap from “we got a diffusion model to hallucinate a video of doom” to “no more video game programmers”. https://mastodon.social/@arstechnica/113040739034038174
-
@datarama that is kinda a video though; there are no deterministic game mechanics it's interacting with, it's just generating the next most likely frame. Maybe "interactive video"?
-
@cthos Or, well, a really shitty game.
A couple of years ago, Nvidia trained a GAN to implement Pac-Man in much the same way as this. It had some of the same weird glitchy behaviour (obviously) ... but, as one of my friends remarked, the only reason it worked at all was that *someone had already made Pac-Man*.
-
@datarama yuuup. Not arguing, just processing “aloud”.
-
@cthos Also: I can't even begin to imagine how you would use anything even resembling this to re-implement, say, Stellaris. Or Stardew Valley, for that matter.
-
@datarama @cthos Among other things, "thinking every pixel in real time" is insanely inefficient. The whole point of doing, say, 3D modeling is to compress an otherwise enormous amount of information into a comparatively small representation that's also naturally aligned with the intended future transformations of it. AI definitely does not solve the general problem of determining the best representation for arbitrary situations of this nature.
There's an asymptotic argument that this cannot be done in general: if someone gave you the position and momenta of every subatomic particle making up a rabbit, you wouldn't even be able to tell that you were looking at a rabbit, let alone determine whether it was about to eat a carrot. And even if you somehow could, it'd take you an astronomical number of orders of magnitude more work than just looking at the dang rabbit.
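To put rough, made-up numbers on the pixel point (my own back-of-the-envelope, not anything from the paper):

```python
# Back-of-the-envelope: "thinking every pixel" vs. a compact engine-side
# representation. All numbers are illustrative assumptions, not measurements.

frame_values = 1920 * 1080 * 3        # RGB values in one 1080p frame
fps = 60
per_second = frame_values * fps       # what a frame-predicting model must emit

# A rigid object in a conventional engine is roughly a mesh reference plus
# a 4x4 transform (16 floats) plus a handful of material parameters:
object_state = 16 + 8                 # assumed sizes, for illustration

print(f"{per_second:,} pixel values per second")   # 373,248,000
print(f"{object_state} floats of per-object state")
```

The engine re-derives all those pixels from the tiny state on every frame; a model that predicts frames directly has to carry the equivalent of that state implicitly inside every frame it generates.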
-
@abucci @cthos Although I doubt this will amount to more than gimmicks in the immediate future (and, in keeping with GenAI, to stoking fear and misery in exploited people), in a sense it would just continue along the trajectory the software industry has regrettably already established: we gleefully accept and even celebrate insane inefficiency so we can de-skill developers a little bit more.
-sigh-
2024 tech news reliably makes me want to move into a log cabin somewhere in a forest.
-
@cthos @abucci Star Trek imagined a future that treasured human effort. Silicon Valley imagines a future that devalues and destroys it.
The idle tech bro speculation I've seen: "We train it on ALL THE GAMES, and then it can interpolate new games out of its latent space!". I'm not sure this will work, but I am sure that the concept of fully synthetic games feels terribly depressing. I *like* interacting with worlds someone else made up, and seeing authorial intent and decisions shine through.
-
@datarama @abucci I mean, same. I want a creative vision, not regurgitated soup.
But I don’t think they’re right. It’s magical thinking to believe that just feeding a diffusion model more and more data will eventually make it spit out new games. They keep tripping over themselves making the image models “better”, but that progress seems to have already plateaued.
I’m not a data scientist though so
-
@cthos @datarama I'm not a data scientist, but I am a computer scientist, and there are lots of good reasons to believe that LLMs/genAI will not be able to deliver on the promises their advocates are making.
Prompt engineering is another "basis set" for describing (a subset of) what you are already able to express with other methods. Current techniques like game engines and modeling tools and whatnot give you a certain grasp on the space of possible games; they make certain games easier to make than others. Whatever kind of genAI gobbledygook ends up being applied to game creation is simply another way of grasping the space of possible games, and it too will make some set of games (possibly a different one) easier to make than others. That's a basic observation from algorithmic information theory. There's no magic here.
Since most of the genAI models I'm aware of ultimately ground out in "small world" representations, there's a strong case to be made that they will always fall well short of what human beings are capable of. By "small world", I mean they tend to have finite-width bottlenecks. For instance, according to Stephen Wolfram's account, GPT's core LLM outputs a probability distribution over approximately 50,000 tokens. While an impressive amount of English text can be built from combinations of these 50,000 units, not all of it can. Further, humans readily invent new words and patterns and give them meaning; often, whenever we see a constraint (like 50,000 tokens), one of the first things we do is blast through it and create patterns outside the constraint. Arguably, this latter process is closer to what language actually IS than the token-emission processes boosters tend to fixate on. Similar comments apply to image-generating AI.
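To make the "finite-width bottleneck" concrete, here's a toy sketch of that final step - tiny made-up sizes, plain numpy, nothing like GPT's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 50_000    # fixed when the model is built: the "small world"
HIDDEN = 64            # assumed width, far smaller than a real model's

hidden_state = rng.normal(size=HIDDEN)               # model state at this step
W_out = rng.normal(size=(HIDDEN, VOCAB_SIZE)) * 0.02 # output projection

logits = hidden_state @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()   # softmax: a distribution over exactly VOCAB_SIZE options

next_token = rng.choice(VOCAB_SIZE, p=probs)
# Everything the model will ever emit is assembled from these 50,000
# predefined pieces; there is no way to output piece number 50,001.
```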
My belief is that if "game"-generating AI ever becomes common, then as soon as its limits are perceived, people will start building tools that transcend those limits, to make games that can't be made with the generative AI. That is what matters, and that is something generative AI may never be able to do.
-
@abucci @cthos I'm not sure I entirely understand the point about the tokenization. You can express every English text using just 128 tokens if you tokenize on individual characters and stick to ASCII (though that's going to be very expensive - but I'm sure that's something some of all those data centers will be put to work on).
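Concretely (toy example, nothing like a production system):

```python
text = "The quick brown fox jumps over the lazy dog."

# Character-level "tokenizer": every ASCII character is its own token,
# so a 128-entry vocabulary covers all English text.
char_ids = [ord(c) for c in text]
assert all(i < 128 for i in char_ids)

# The expense: sequence length explodes relative to word-ish tokens.
# (Crude whitespace split standing in for a real subword tokenizer.)
print(len(char_ids), "character tokens vs roughly", len(text.split()), "word tokens")
# 44 vs 9 here - and attention cost grows about quadratically with length.
```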
-
@datarama @cthos Think "word" for "token" (*). There are more than 50,000 words in English. But beyond that, no matter how many tokens it has, if the number is fixed at N, it will never suffice to capture how human beings use language. We invent new words, symbols, etc. fluently as we communicate, which GPT cannot do and may never be able to do.
(*) Really it's more like words, word fragments, and punctuation, but that doesn't matter for the point.
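A toy illustration, with a made-up mini-vocabulary and naive greedy matching (real tokenizers are learned from data, but the fixed-vocabulary limit is the same):

```python
# A tiny fixed vocabulary standing in for GPT's ~50,000 tokens (invented here):
VOCAB = {"en", "shit", "ti", "fic", "ation",
         "a", "c", "e", "f", "h", "i", "n", "o", "s", "t"}

def tokenize(word: str) -> list[str]:
    """Greedy longest-match over the fixed vocabulary."""
    out, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                out.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"nothing in the vocabulary covers {word[i]!r}")
    return out

# A coined word the vocabulary has never seen as a unit:
print(tokenize("enshittification"))   # ['en', 'shit', 'ti', 'fic', 'ation']
# The coinage only ever exists as a recombination of old pieces; the
# vocabulary itself is frozen at N and never grows.
```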
-
@abucci @cthos I know what a token is (I've implemented tons of lexers *and* lexer generators). But I think it was Andrej Karpathy who, sometime in 2023, pointed out that training LLMs on Unicode code point sequences *without* a tokenizer would alleviate some of their problems - but that this would also make training much more expensive.
(I completely agree that predicting the next token or character is not at all what humans do when we communicate. It's not even what parrots do.)
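Concretely, "no tokenizer" just means the code points themselves become the IDs - a rough sketch of the idea (mine, not Karpathy's code):

```python
text = "Smörgåsbord 🎮"

# Tokenizer-free: each Unicode code point is its own token ID.
ids = [ord(c) for c in text]
print(ids)   # ends in 127918: the 🎮 emoji, U+1F3AE; IDs span ~1.1M code points

# The expense: a learned subword tokenizer covers typical text in far fewer
# tokens, and sequence length dominates training cost.
print(len(ids), "code-point tokens for", len(text.split()), "words")
```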
-
@datarama Training on Unicode code point sequences would not address the problem I raised. It's still a "small world", not a "large world" / open-ended possibility space. That's the key point. @cthos
-