Was talking to someone about #BlueSky the other day, and how they apparently used some sort of #AI for #moderation.

Raccoon at TechHub :mastodon:

@jdp23 @mekkaokereke
Oh no, if I at any point suggested that an AI could be a better moderator than a human, then I wrote it poorly. No machine should ever be responsible for a management decision, because a machine can't be held accountable.

Humans are definitely the better choice for moderation decisions.

This is a good point about the oversight problem, though: with a system that just flags certain words or combinations of words, it's easy for people to keep in mind that a flagged post might not actually be bad. With a system doing some complicated thing beneath the surface that we don't understand, it's going to be harder to make that connection.

And once again, this is a case of the system not really justifying itself: how much will it actually catch that isn't caught by simpler systems, and does that outweigh the real potential for poor oversight of a system with bad biases?
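The "how much will it actually catch that isn't caught by simpler systems" question can be answered before anything is deployed: run the simple system and the proposed one over posts that moderators have already judged, and count where they disagree. A minimal sketch, assuming a hypothetical labeled sample; `keyword_flagger` stands in for the existing word-based flagging, and `candidate` is whatever model is being evaluated.

```python
from typing import Callable, Iterable

# (post text, did moderators judge it a problem?): hypothetical labeled sample
LabeledPost = tuple[str, bool]
Flagger = Callable[[str], bool]


def keyword_flagger(words: Iterable[str]) -> Flagger:
    """Baseline: flag a post if any listed word appears in it."""
    wordset = {w.lower() for w in words}

    def flag(text: str) -> bool:
        # Split on whitespace and strip common punctuation and '#'
        tokens = {t.strip(".,!?#").lower() for t in text.split()}
        return not wordset.isdisjoint(tokens)

    return flag


def extra_catches(posts: list[LabeledPost], baseline: Flagger,
                  candidate: Flagger) -> dict[str, int]:
    """Count flags the candidate raises on posts the baseline let through."""
    counts = {"extra_true_flags": 0, "extra_false_flags": 0}
    for text, is_problem in posts:
        if candidate(text) and not baseline(text):
            key = "extra_true_flags" if is_problem else "extra_false_flags"
            counts[key] += 1
    return counts
```

If the extra true flags are few compared with the extra false flags, the fancier system isn't justifying itself over the word list.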

0x33

@mekkaokereke @Raccoon maybe I should have read the responses for more info before voting "it might help to flag posts."

Raccoon at TechHub :mastodon:

@0x33 @mekkaokereke
I'm getting thoughts from the community, not making a decision. Think critically about the various sides here, and if you change your mind after that, consider this the "practice run".

Keep in mind, too, that a chunk of this boils down to "do you trust Fedi Staff to implement this well and dump it if it goes bad".

Raccoon at TechHub :mastodon:

@munin @ajn142
Going to clarify something, because I think it's been misunderstood...

We do not have the interest, or even the means, to make anything similar to ChatGPT. The kinds of systems where you can just say "ignore previous instructions" and they'll follow whatever "instructions" come next are extremely expensive and not even feasible for something like this: there's a reason just running them is destroying the environment.

The system being proposed would do nothing more than read posts and calculate, from statistical analysis against past problem posts and non-problem posts, if they match the patterns of something a moderator should look at. That is a system which could run on the server itself, without taking too many resources.
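A system like this can be very small. The thread doesn't say which model would be used, so as one illustration, here is a multinomial naive Bayes classifier (my choice, an assumption) trained on past posts labeled as problem or non-problem. It only returns a probability and a "needs review" suggestion; it takes no action. The class name and the 0.8 default threshold are hypothetical, and training needs at least one post of each label.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a hashtag like "#Foo" becomes "foo"
    return re.findall(r"[a-z0-9']+", text.lower())


class ReviewFlagger:
    """Naive Bayes over moderator-labeled posts; suggests review, never acts."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, posts: list[tuple[str, bool]]) -> None:
        for text, is_problem in posts:
            self.word_counts[is_problem].update(tokenize(text))
            self.doc_counts[is_problem] += 1

    def problem_probability(self, text: str) -> float:
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (True, False):
            # Log prior plus Laplace-smoothed log likelihood of known tokens
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(vocab)
            for tok in tokenize(text):
                if tok in vocab:
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            log_scores[label] = score
        # P(problem | text) = 1 / (1 + exp(diff)), computed without overflow
        diff = log_scores[False] - log_scores[True]
        if diff >= 0:
            z = math.exp(-diff)
            return z / (1.0 + z)
        return 1.0 / (1.0 + math.exp(diff))

    def needs_review(self, text: str) -> bool:
        return self.problem_probability(text) >= self.threshold
```

Scoring a post is a handful of dictionary lookups per word, which is why something in this family can run as a background process on an existing server.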

Jon

Agreed that simpler tools, whose limits are easier for people to understand, might be less prone to the oversight problems. I talked once with an r/AskHistorians moderator about how tools fit into their intersectional moderation approach, and they told me that they used some very simple pattern-matching tools to improve efficiency ... stuff like that can be quite useful, if everybody understands the limitations and processes make sure there isn't too much reliance on the tools.

But that's a strong argument against *AI-based* systems!

Of course, a different way to look at it is that there's an opportunity to start from scratch: build a good training set, and then algorithms on top of it that focus on explainability and on being a tool to help moderators (rather than a magic bullet). There are some great AI researchers and content moderation experts here who really do understand the issues and limitations of today's systems. But it's a research project, not something that's deployable today.

@[email protected] @[email protected]

Jon

Also, related to your question of how much AI-based moderation would actually help, there's an important point in the "Moderation: Key Observations" section of the Governance on Fediverse Microblogging Servers report that @[email protected] and @[email protected] just published:

A lot of Fediverse moderation work is relatively trivial for experienced server teams. This includes dealing with spam, obvious rulebreaking (trolls, hate servers), and reports that aren’t by or about people actually on a given server. For some kinds of servers and for certain higher-profile or high-intensity members on other kinds of servers, moderators also receive a high volume of reports about member behaviors (like nudity or frank discussion of heated topics) that their server either explicitly or implicitly allows, and which the moderators therefore close without actioning.

These kinds of reports are the cleanest targets for tooling upgrades and shared/coalitional moderation, but it’s also worth noting that except in special circumstances (like a spam wave or a sudden reduction in available moderators), this is not usually the part of moderation work that produces intense stress for the teams we interviewed. (This is one of the findings that we believe does not necessarily generalize across other small and medium-sized servers.)

@[email protected] @[email protected]

Andrei Kucharavy

@Raccoon @jerry A mix of some of it. If 4chan's /pol/ decides to start a beef here, automation with an appeal process might be necessary to preserve the targets' and moderators' mental health (although it should likely also integrate direct reports from trusted users within the instance). Otherwise, flagging for review would be helpful.
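One way to read "automation with appeal" plus trusted-user reports: once enough distinct trusted local users report the same post, it is hidden provisionally and put in a queue for human review and appeal, while everything else just goes to moderators as usual. A minimal sketch; `RaidTriage`, `Report`, the trusted-reporter set, and the threshold are all hypothetical names for illustration, not an existing Mastodon feature or API.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    reporter: str
    post_id: str


@dataclass
class RaidTriage:
    trusted_reporters: set[str]
    hide_threshold: int = 3  # hypothetical: distinct trusted reports needed
    reports: dict[str, set[str]] = field(default_factory=dict)
    hidden: set[str] = field(default_factory=set)
    appeal_queue: list[str] = field(default_factory=list)

    def add_report(self, report: Report) -> str:
        # Track distinct reporters per post so one user can't trigger it alone
        reporters = self.reports.setdefault(report.post_id, set())
        reporters.add(report.reporter)
        if report.post_id in self.hidden:
            return "already hidden"
        if len(reporters & self.trusted_reporters) >= self.hide_threshold:
            # Provisional action only: a human reviews it and the author can appeal
            self.hidden.add(report.post_id)
            self.appeal_queue.append(report.post_id)
            return "hidden pending review"
        return "queued for moderator"
```

Reports from untrusted accounts still reach moderators; they just can't trigger the automatic hide, which limits how a raid could abuse the mechanism.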

Matthew Miller

@mekkaokereke @Raccoon

Yeah, the AI "toxicity" measurements that we've been looking at (but not acting on) for Fedora Linux give me pause for sure. With a decidedly technical focus, our forum's overwhelming AI score is (in order): surprise, sadness, joy, fear. Shout out to joy, but, still.

And that's _not_ usually anything to do with the hard problems in society.

Jon

There were some excellent articles about the racial biases in Perspective's "Toxicity" analysis back in 2019 (and probably some since then, but these are the ones I had bookmarked):

https://techcrunch.com/2019/08/14/racial-bias-observed-in-hate-speech-detection-algorithm-from-google/

https://www.vox.com/recode/2019/8/15/20806384/social-media-hate-speech-bias-black-african-american-facebook-twitter

@[email protected] @[email protected] @[email protected]

Jeztastic

@Raccoon Why are you asking us to put aside environmental concerns?

That's the harmful attitude. "Oh I know it's dreadful for the planet but it's so useful..."

I'm not putting aside environmental concerns for social media platform moderation!

Raccoon at TechHub :mastodon:

@jeztastic
Because I'm not speculating about one of those systems.

We don't have the means, the interest, or even really any use for one of those large-scale, fast-response LLM training and processing systems that require massive supercomputers, like ChatGPT.
This would be a small AI, running as a process on one of our existing servers, doing a much lighter operation: statistical analysis of short blocks of text (especially Fedi-short) isn't nothing, but it'd only add a fraction of the current load.

We're putting the power/environmental concerns aside because they don't apply here: the increased power use would be inconsequential.

Raccoon at TechHub :mastodon:

@nus
Reminder that Fedi staff would be the ones training, maintaining, and watching it, as well as the ones who could pull the plug if it doesn't do a good job. I think a lot of this can be fixed with the right training data and oversight.

Raccoon at TechHub :mastodon:

@aka_quant_noir
I HAVE been saying a "community note" style system would be nice, so maybe this could be another use for that...

Raccoon at TechHub :mastodon:

@eskuero
Right, that would be the existing system this thing has to prove its worth over. Someone in the thread even suggested something slightly more advanced than that: picking up combinations of signals, like the word "Harris" appearing in a post with the "MutualAid" hashtag, or any political term in a post tagged "Bloomscrolling".

I was just speculating about the AI thing, but I definitely would like to implement a system like that.
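The combination idea is easy to express as rules: flag a post when any word from one list appears together with any hashtag from another. A minimal sketch; the `ComboRule` format is hypothetical, and the "political terms" list below is a placeholder, since the thread doesn't give one.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ComboRule:
    """Flag when any of `terms` appears in a post tagged with any of `hashtags`."""
    terms: frozenset[str]
    hashtags: frozenset[str]


def words_and_tags(text: str) -> tuple[set[str], set[str]]:
    tags = {t.lower() for t in re.findall(r"#(\w+)", text)}
    # Remove hashtags first so a tag isn't also counted as a plain word
    words = {w.lower() for w in re.findall(r"\b\w+\b", re.sub(r"#\w+", " ", text))}
    return words, tags


def matching_rules(text: str, rules: list[ComboRule]) -> list[ComboRule]:
    words, tags = words_and_tags(text)
    return [r for r in rules if words & r.terms and tags & r.hashtags]


RULES = [
    ComboRule(frozenset({"harris"}), frozenset({"mutualaid"})),
    # Placeholder list of political terms for the Bloomscrolling rule
    ComboRule(frozenset({"harris", "election", "senate"}), frozenset({"bloomscrolling"})),
]
```

Every match can be traced back to a named rule, which keeps the "a flagged post might not actually be bad" reasoning easy for moderators.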

Raccoon at TechHub :mastodon:

@jglypt
There IS a part of the moderation interface which tells us how many reports have been sent about a user, if they have "strikes", and lets us look back at everything quickly.

Basically, when we look at you, we see this...

mekka okereke :verified:

@jdp23 @Raccoon @mattdm

Yup. It's a really hard problem. Perspective has made huge strides in this area, but problems persist.

If I describe a man as "beast mode," or "buff," or "swole," most white people know what I'm talking about. But if I describe a man as "sidewalk cracking," most white folk don't know what I'm talking about. That slang hasn't entered the white lexicon yet.

Non-toxic:
My N-word
My buff N-word
My beast mode N-word

76.52% toxic:
My sidewalk cracking N-word

1/N

mekka okereke :verified:

@jdp23 @Raccoon @mattdm

People can infer that sidewalk cracking means "So heavy that the man cracks the pavement when he walks," but it's not clear from the words alone that the term describes a person who is both extremely muscular and very large, and that it is intended as the highest compliment for a bodybuilder.

Without being able to pass in this context, and the relationship between the author and the person being described, it's not possible to get a reliably accurate toxicity score.

2/2

Raccoon at TechHub :mastodon:

@andrei_chiffa @jerry
We already get reports from within, and outside of, our instances. If moderation is overwhelmed, that means said trusted users should be brought in as moderators.

This would be in addition to current systems, including bots which already exist to auto-flag posts based on words.

Raccoon at TechHub :mastodon:

@thisismissem
I wasn't talking about LLMs; I'm talking about the smaller models that we can train on a decent computer and run in a background process on our existing servers.
