What the hell is "superalignment"?
-
What the hell is "superalignment"? When the "safety" people buy the BS of AGI, it's hard to know whether to take them seriously. We need safety without the superlatives, grounded in present-tense reality, not macho computer dreams and nightmares.
OpenAI putting ‘shiny products’ above safety, says departing researcher
https://www.theguardian.com/technology/article/2024/may/18/openai-putting-shiny-products-above-safety-says-departing-researcher
-
I don't understand this critique. If you are trying to build something that is intended to transformatively go beyond what we have ever had before, *shouldn't* you think about the safety of that *before* you put it into the world?
-
@UlrikeHahn @jeffjarvis well, yes, but with this sort of thing you can’t just do it all beforehand - you need to be monitoring as well. Honestly I don’t think they do either particularly well but this mindset doesn’t help!
-
@CatherineFlick @jeffjarvis Catherine, what is the evidence that they believe they can do it all beforehand? And of course you need to be monitoring, but I take the superalignment brief to be specifically about something we patently don't have yet: superhuman intelligence. So its impact isn't something that could currently be monitored.
-
@UlrikeHahn @jeffjarvis well I’m perhaps giving them a bit more of a benefit of the doubt here than I should that they might be thinking at all about this, but I agree, it’s def not something they can monitor right now!
-
@CatherineFlick @jeffjarvis "were thinking about"… sounds like the team has been disbanded…
-
@UlrikeHahn @CatherineFlick
The critique is (1) AGI is bullshit born of their macho technological egos and (2) what they call "safety" is not present-tense risks and issues (such as those explored in the Stochastic Parrots paper) but instead their doomerist bullshit ("we are so powerful we can destroy the world"), which has its roots in their faux philosophies of TESCREAL, which in turn has its roots in eugenics. See: https://firstmonday.org/ojs/index.php/fm/article/view/13636/11599
-
@jeffjarvis @CatherineFlick it's specifically a unit that was tasked with thinking about *future risk*, so it seems odd to criticise it for not investigating current risk.
as you mention the stochastic parrots paper, you might (or might not!) find this interesting: https://write.as/ulrikehahn/stochastic-parrot-is-a-misleading-metaphor-for-llms
-
@UlrikeHahn @jeffjarvis future risk is tied to current risk though. If you don't monitor how a tool like this is evolving and impacting society, and don't regularly reassess the future risk based on that (and societal response to that impact), then you aren't doing a good job!
-
@CatherineFlick @jeffjarvis but nobody is denying that we should be monitoring current risk, we all agree on that! It’s simply not what the superalignment unit (or the linked Guardian article) was about
literally nothing follows from the presence or absence of that unit with respect to the monitoring of current risk, and it would seem to me like an out-and-out fallacy ('false dilemma') to assume that it does.
-
@CatherineFlick @jeffjarvis but I'm now also bewildered by what "TESCREAL" and its putative roots in eugenics are doing here. By what causal model of the world would a set of values determine whether AGI is empirically possible or not?
where is TESCREAL in the way genAI systems work or what they can do?
-
@UlrikeHahn @jeffjarvis it’s their underlying philosophy driving the development of the tech. You can’t separate it! Timnit Gebru writes on this a lot.
-
"their philosophy" -who is "they"?
the technical development of LLMs/gen AI has been (and is being) driven by tens of thousands of researchers worldwide.
but setting that aside, appealing to TESCREAL to answer the question of whether genAI can become superintelligent feels like trying to answer the question of whether a VW Beetle can go faster than 120 km/h by pointing out that the VW Beetle was a project pushed by Hitler.
it feels like a category error to me
-
there are many questions for which knowing about TESCREAL might be informative (should I trust these people, do I want this product?), but for answering questions about "what are the capabilities of this system?" or "what is the future potential?" it doesn't seem causally relevant at all.
-
@UlrikeHahn @jeffjarvis I don’t mean to brush off this conversation but I’m caring for small children today and just don’t have time to get into it. The whole mess is being driven by billionaires that want this thing, it underpins the whole sector. There are lots more people who write more eloquently about this than I do while trying to juggle naps and lunch for a toddler.
-
no worries, Catherine! I'm just going to add this for another day (or anyone else that has wandered into this thread).
It seems uncontroversial to me to believe that the performance of a computer programme (what outputs it produces for which inputs) is determined wholly by the programme itself, not by any feelings, intentions or beliefs I have before, during or after its execution.
Of course, someone wrote that programme, but once there, it's the programme itself that determines the outcomes of its computations.
To think that 'TESCREAL' tells us something about what gen AI models can and cannot do requires one to believe that an ethical question about utilitarianism (such as 'how much should we weight future lives relative to current lives?', aka 'longtermism') somehow has a counterpart in the actual code of LLMs/genAI systems.
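To make that point concrete, here is a minimal sketch in Python (the function and values are purely illustrative inventions, not anyone's actual system): the output is fixed by the code and the input alone, and nothing in the computation consults the author's beliefs or values.

```python
# Toy illustration: a program's input-output behaviour is fixed by its code.
# Nothing about the author's philosophy enters the computation.

def toy_model(prompt: str) -> str:
    # The mapping from inputs to outputs is determined entirely by these lines.
    words = prompt.lower().split()
    return " ".join(sorted(words))

# Whatever the author believed about longtermism or AGI while writing this,
# the same input yields the same output, every time:
print(toy_model("parrots are stochastic"))  # -> "are parrots stochastic"
print(toy_model("Parrots ARE stochastic"))  # -> "are parrots stochastic"
```

Knowing who wrote the code, and why, may tell you whether to trust them; it tells you nothing about what the code, once written, will compute.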