Was talking to someone about #BlueSky the other day, and how they apparently used some sort of #AI for #moderation.
-
Raccoon at TechHub :mastodon: replied to mekka okereke :verified: last edited by
@mekkaokereke
Thanks for that long response; you brought up a good concern that I wasn't thinking about, though I've read about it from past work in the field. Obviously, any implementation we have needs to keep this in mind. I will note that current systems which flag slurs already end up flagging posts by black and queer people using the N word and the F word respectively, which is not the kind of thing we're looking to catch with this. I'm well aware of the issue of these AI systems seeing different styles of communication and deciding differently based on that.
That said, this feels like a stronger argument against letting it run unsupervised than against using it as a flagging system in general: if we all know it's an automated system, that it's fallible, and that its point is to surface things for us to look over, not to tell us they're bad, then in theory we should be providing fair moderation.
(Continued)
-
Raccoon at TechHub :mastodon: replied to Raccoon at TechHub :mastodon: last edited by
@mekkaokereke
Something you said here made me really curious, though: you implied that you think some tools work better than others for this? Obviously the one you quoted was an example of a bad one, but could you give me any examples or information about ones you think went down the right route? Right now this is very much speculation, because again, I don't plan to implement this anytime soon, but it would be useful to know how we might avoid these pitfalls in the speculation phase, in case someone actually decides to do it.
-
Raccoon at TechHub :mastodon: replied to Artemesia last edited by
@artemesia
Oooh, interesting, this could be that "lower level" we keep saying would be useful: having a bot watch specific people more closely for posts that cross the line would not only be helpful, but would mean that we as moderators don't have to keep checking back on them. We could even set it to automatically unflag them after a certain period of time... Definitely a good approach to be considering!
-
Jon replied to Raccoon at TechHub :mastodon: last edited by
In practice, requiring human oversight of automated decision making doesn't correct for bias or errors -- people tend to defer to the automated system. Ben Green's excellent paper on this focuses on government use of automated systems, but the dynamic applies more generally. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3921216
First, evidence suggests that people are unable to perform the desired oversight functions. Second, as a result of the first flaw, human oversight policies legitimize government uses of faulty and controversial algorithms without addressing the fundamental issues with these tools.
And sure, as you point out, mistakes are made today by human moderators ... but those mistakes contaminate any training set. And algorithms typically magnify biases in the underlying data.
@[email protected] @[email protected] -
Raccoon at TechHub :mastodon: replied to Jon last edited by [email protected]
@jdp23 @mekkaokereke
Oh no, if I at any point suggested that I think an AI can be a better moderator than a human, then I wrote it poorly. No machine should ever be responsible for a management decision, because a machine can't be held accountable. Humans are definitely the better choice for moderation decisions.
This is a good point about the oversight problem though: with a system that just flags certain words or combinations thereof, it's easy for people to understand, internally, that these posts might not be bad. With a system that's doing some complicated thing that we don't understand beneath the surface, it's going to be a bit harder to make that connection.
And once again, this is a case of the system not really justifying itself: how much will it actually catch that isn't caught by simpler systems, and does that outweigh the real potential for poor oversight of a system with bad biases?
-
0x33 replied to mekka okereke :verified: last edited by
@mekkaokereke @Raccoon maybe I should have read the responses for more info before voting "it might help to flag posts."
-
Raccoon at TechHub :mastodon: replied to 0x33 last edited by [email protected]
@0x33 @mekkaokereke
I'm getting thoughts from the community, not making a decision. Think critically about the various sides here, and if you change your mind after that, consider this the "practice run". Keep in mind also that a chunk of this boils down to "do you trust Fedi Staff to implement this well and dump it if it goes bad".
-
Raccoon at TechHub :mastodon: replied to Fi last edited by [email protected]
@munin @ajn142
Going to clarify something, because I think it's been misunderstood... We do not have the interest, or even the means, to make anything similar to ChatGPT. The kind of systems where you can just say "ignore previous instructions" and it'll do something based on those "instructions" are extremely expensive and not even feasible for something like this: there's a reason just running them is destroying the environment.
The system being proposed would do nothing more than read posts and calculate, from statistical analysis against past problem posts and non-problem posts, whether they match the patterns of something a moderator should look at. That is a system which could run on the server itself without taking too many resources.
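(For the technically minded, here's roughly what I mean; a minimal sketch using scikit-learn, with made-up training data and a made-up threshold, not a committed design:)

```python
# Hypothetical flagger: TF-IDF features + logistic regression, trained on
# posts moderators have already reviewed. It only queues posts for HUMAN
# review; it never takes action on its own.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative stand-in data: real labels would come from the server's
# own moderation history (1 = moderators actioned it, 0 = it was fine).
posts = ["example post a moderator actioned", "example fine post", "another fine post"]
labels = [1, 0, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(posts, labels)

def flag_for_review(post: str, threshold: float = 0.8) -> bool:
    """True if the post should go into the human review queue."""
    return model.predict_proba([post])[0][1] >= threshold
```

Something this small trains in seconds and runs comfortably alongside the server process.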
-
Jon replied to Raccoon at TechHub :mastodon: last edited by
Agreed that simpler tools, whose limits are easier for people to understand, might be less prone to the oversight problems. I talked once with an r/AskHistorians moderator about how tools fit into their intersectional moderation approach, and they told me they used some very simple pattern-matching tools to improve efficiency (see the sketch below) ... stuff like that can be quite useful, if everybody understands the limitations and there are processes to make sure there isn't too much reliance on the tools.
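(To make "very simple pattern-matching" concrete, a hedged sketch; these patterns are invented placeholders, not anything that team actually uses:)

```python
import re

# Hypothetical hand-written patterns a mod team might maintain.
# Fully transparent: anyone can read exactly what gets flagged and why.
PATTERNS = [
    re.compile(r"\bbuy followers\b", re.IGNORECASE),     # spam phrasing
    re.compile(r"\bclick my profile\b", re.IGNORECASE),  # spam phrasing
]

def matches_any(post: str) -> bool:
    """True if the post matches any hand-written pattern."""
    return any(p.search(post) for p in PATTERNS)
```

The point is legibility: a matched post shows the exact pattern that fired, so over-reliance is easy to spot and correct.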
But that's a strong argument against *AI-based* systems!
Of course, a different way to look at it is that there's an opportunity to start from scratch: build a good training set, and build algorithms on top of it that focus on explainability and on being a tool to help moderators (rather than a magic bullet). There are some great AI researchers and content moderation experts here who really do understand the issues and limitations of today's systems. But it's a research project, not something that's deployable today.
@[email protected] @[email protected] -
Also, related to your question of how much AI-based moderation would actually help, there's an important point in the "Moderation: Key Observations" section of the Governance on Fediverse Microblogging Servers report that @[email protected] and @[email protected] just published:
A lot of Fediverse moderation work is relatively trivial for experienced server teams. This includes dealing with spam, obvious rulebreaking (trolls, hate servers), and reports that aren't by or about people actually on a given server. For some kinds of servers and for certain higher-profile or high-intensity members on other kinds of servers, moderators also receive a high volume of reports about member behaviors (like nudity or frank discussion of heated topics) that their server either explicitly or implicitly allows, and which the moderators therefore close without actioning.
These kinds of reports are the cleanest targets for tooling upgrades and shared/coalitional moderation, but it's also worth noting that except in special circumstances (like a spam wave or a sudden reduction in available moderators), this is not usually the part of moderation work that produces intense stress for the teams we interviewed. (This is one of the findings that we believe does not necessarily generalize across other small and medium-sized servers.)
@[email protected] @[email protected] -
Andrei Kucharavy replied to Raccoon at TechHub :mastodon: last edited by
@Raccoon @jerry A mix of some of it. If 4chan /pol/ decides to have a beef here, automation with appeal might be necessary to preserve the target's and moderator's mental health (though it should likely also integrate direct reports from trusted users within the instance). Otherwise, flagging for review would be helpful.
-
Matthew Miller replied to mekka okereke :verified: last edited by
Yeah, the AI "toxicity" measurements that we've been looking at (but not acting on) for Fedora Linux give me pause for sure. Even though our forum has a decidedly technical focus, its overwhelming AI scores are (in order): surprise, sadness, joy, fear. Shout out to joy, but, still.
And that's _not_ usually anything to do with the hard problems in society.
-
There were some excellent articles about the racial biases in Perspective's "Toxicity" analysis back in 2019 (and probably some since then but these are the ones I had bookmarked)
https://techcrunch.com/2019/08/14/racial-bias-observed-in-hate-speech-detection-algorithm-from-google/
https://www.vox.com/recode/2019/8/15/20806384/social-media-hate-speech-bias-black-african-american-facebook-twitter
@[email protected] @[email protected] @[email protected] -
Jeztastic replied to Raccoon at TechHub :mastodon: last edited by
@Raccoon Why are you asking us to put aside environmental concerns?
That's the harmful attitude. "Oh I know it's dreadful for the planet but it's so useful..."
I'm not putting aside environmental concerns for social media platform moderation!
-
Raccoon at TechHub :mastodon: replied to Jeztastic last edited by
@jeztastic
Because I'm not speculating about one of those systems. We do not have the means, nor the interest, nor really even any use for one of those large-scale, fast-response LLM training and processing systems that require massive supercomputers, like ChatGPT.
This would be a small AI, running as a process on one of our existing servers, doing a much lighter operation: statistical analysis of short blocks of text (especially Fedi-short) isn't nothing, but it'd only add a fraction of the current load. We're putting the power/environmental concerns aside because they don't apply here: the increased power use would be inconsequential.
-
Raccoon at TechHub :mastodon: replied to nus last edited by
@nus
Reminder that Fedi staff would be the ones training, maintaining, and watching it, as well as the ones who could pull the plug if it doesn't do a good job. I think a lot of this is fixed with the right training data and oversight.
-
Raccoon at TechHub :mastodon: replied to Extinction Studies last edited by
@aka_quant_noir
I HAVE been saying a "community note" style system would be nice, so maybe this could be another use for that...
-
Raccoon at TechHub :mastodon: replied to Eskuero last edited by [email protected]
@eskuero
Right, that would be the existing system this thing has to prove its worth over. Someone in the thread even suggested something slightly more advanced than that: flagging on combinations of things, like the word "Harris" appearing in a post with the "MutualAid" hashtag, or any political term in a post tagged "Bloomscrolling". I was just speculating with the AI thing, but I definitely would like to implement a system like that.
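(To show how cheap such a combination rule is, a sketch; the rule format is improvised and the terms are just the examples from this thread:)

```python
# Hypothetical combination rules: flag a post for human review when every
# word AND every hashtag in some rule appears. No AI involved.
RULES = [
    {"words": ["harris"], "hashtags": ["mutualaid"]},
    {"words": ["election"], "hashtags": ["bloomscrolling"]},  # stand-in for "any political term"
]

def should_flag(text: str, hashtags: list[str]) -> bool:
    text = text.lower()
    tags = {t.lower() for t in hashtags}
    return any(
        all(w in text for w in rule["words"])
        and all(h in tags for h in rule["hashtags"])
        for rule in RULES
    )

# should_flag("Harris said...", ["MutualAid"])      -> True
# should_flag("Flowers today!", ["Bloomscrolling"]) -> False
```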
-
Raccoon at TechHub :mastodon: replied to Jack :penelope_happy: last edited by
@jglypt
There IS a part of the moderation interface which tells us how many reports have been sent about a user, whether they have "strikes", and lets us look back at everything quickly. Basically, when we look at you, we see this...
-
mekka okereke :verified: replied to Jon last edited by
Yup. It's a really hard problem. Perspective has made huge strides in this area, but problems persist.
If I describe a man as "beast mode," or "buff," or "swole," most white people know what I'm talking about. But if I describe a man as "sidewalk cracking," most white folk don't know what I'm talking about. That slang hasn't entered the white lexicon yet.
Non-toxic:
My N-word
My buff N-word
My beast mode N-word

76.52% toxic:
My sidewalk cracking N-word

1/N
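(For anyone who wants to reproduce this kind of check themselves: a hedged sketch of scoring text with Perspective's public commentanalyzer REST endpoint, to the best of my reading of its docs; YOUR_API_KEY is a placeholder.)

```python
import json
import urllib.request

API_URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           "comments:analyze?key=YOUR_API_KEY")

def toxicity(text: str) -> float:
    """Return Perspective's TOXICITY summary score (0.0 to 1.0) for text."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```

Score the same phrase with different slang substituted in and the disparity shows up directly in the numbers.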