Was talking to someone about #BlueSky the other day, and how they apparently used some sort of #AI for #moderation.
-
Jeztastic replied to Raccoon at TechHub :mastodon: last edited by
@Raccoon Why are you asking us to put aside environmental concerns?
That's the harmful attitude. "Oh I know it's dreadful for the planet but it's so useful..."
I'm not putting aside environmental concerns for social media platform moderation!
-
Raccoon at TechHub :mastodon: replied to Jeztastic last edited by
@jeztastic
Because I'm not speculating about one of those systems. We don't have the means, the interest, or really any use for one of those large-scale, fast-response LLM training and processing systems that require massive supercomputers, like ChatGPT.
This would be a small AI, running as a process on one of our existing servers, doing a much lighter operation: statistical analysis of short blocks of text (especially Fedi-short) isn't nothing, but it'd only add a fraction of the current load. We're putting the power/environmental concerns aside because they don't apply here: the increased power use would be inconsequential.
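To put a number on "a fraction of the current load": per post, a small linear classifier is roughly one hashed bag-of-words plus one dot product. A minimal pure-Python sketch (the dimension and the zeroed weight vector are stand-ins, not a trained model):

```python
# Minimal sketch of a "small AI as a background process" per post:
# hash tokens into a fixed-size bag-of-words, take one dot product
# against a trained weight vector -- microseconds of CPU, not a GPU job.
# DIM, WEIGHTS, and BIAS are illustrative stand-ins.
import math
import re

DIM = 4096                 # hashed feature space size (assumption)
WEIGHTS = [0.0] * DIM      # stand-in for a trained weight vector
BIAS = 0.0

def features(text):
    """Hashed bag-of-words vector for a short post."""
    vec = [0.0] * DIM
    for tok in re.findall(r"\w+", text.lower()):
        vec[hash(tok) % DIM] += 1.0
    return vec

def flag_score(text):
    """Probability-like score in [0, 1]; higher = more likely to flag."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features(text)))
    return 1.0 / (1.0 + math.exp(-z))

# With the all-zero stand-in weights, every post scores exactly 0.5:
assert flag_score("example fedi post") == 0.5
```

The point of the sketch is the cost model, not the weights: inference is a few thousand multiply-adds per post, which is negligible next to serving the post itself.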
-
Raccoon at TechHub :mastodon: replied to nus last edited by
@nus
Reminder that Fedi staff would be the ones training, maintaining, and watching it, as well as the ones who could pull the plug if it doesn't do a good job. I think a lot of this is fixed with the right training data and oversight.
-
Raccoon at TechHub :mastodon: replied to Extinction Studies last edited by
@aka_quant_noir
I HAVE been saying a "community note" style system would be nice, so maybe this could be another use for that...
-
Raccoon at TechHub :mastodon: replied to Eskuero last edited by [email protected]
@eskuero
Right, that would be the existing system this thing has to prove its worth over. Someone in the thread even suggested something slightly more advanced than that, picking up combinations of signals, like the word "Harris" appearing in a post with the "MutualAid" hashtag, or any political term in a post tagged "Bloomscrolling". I was just speculating with the AI thing, but I definitely would like to implement a system like that.
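That combination rule is easy to sketch; the word lists below are illustrative stand-ins, not a real policy:

```python
# Hedged sketch of the combination rule from the thread: flag a post
# when a political term co-occurs with a hashtag where politics
# doesn't belong. Both lists are examples, not an actual word list.
POLITICAL_TERMS = {"harris", "trump", "election"}   # illustrative
APOLITICAL_TAGS = {"mutualaid", "bloomscrolling"}   # illustrative

def rule_flags(text):
    """Return human-readable flags for a post, or an empty list."""
    parts = text.split()
    words = {w.strip('#.,!?').lower() for w in parts}
    tags = {w[1:].strip('.,!?').lower() for w in parts if w.startswith("#")}
    if POLITICAL_TERMS & words and APOLITICAL_TAGS & tags:
        return ["political term under an apolitical hashtag"]
    return []

assert rule_flags("Harris did a thing #MutualAid") == \
    ["political term under an apolitical hashtag"]
assert rule_flags("Planted tulips today #Bloomscrolling") == []
```

Rules like this are cheap, auditable, and easy for moderators to tune, which is exactly the bar any AI-based flagger would have to clear.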
-
Raccoon at TechHub :mastodon: replied to Jack :penelope_happy: last edited by
@jglypt
There IS a part of the moderation interface which tells us how many reports have been sent about a user, whether they have "strikes", and lets us look back at everything quickly. Basically, when we look at you, we see this...
-
mekka okereke :verified: replied to Jon last edited by
Yup. It's a really hard problem. Perspective has made huge strides in this area, but problems persist.
If I describe a man as "beast mode," or "buff," or "swole," most white people know what I'm talking about. But if I describe a man as "sidewalk cracking," most white folk don't know what I'm talking about. That slang hasn't entered the white lexicon yet.
Non-toxic:
My N-word
My buff N-word
My beast mode N-word

76.52% toxic:
My sidewalk cracking N-word

1/N
-
mekka okereke :verified: replied to mekka okereke :verified: last edited by
People can infer that "sidewalk cracking" means "so heavy that the man cracks the pavement when he walks," but it's not clear from that alone that the term describes a person who is both extremely muscular and very large, and that it's intended as the highest compliment for a bodybuilder.
Without being able to pass in this context, and the relationship between the author and the person being described, it's not possible to get a reliably accurate toxicity score.
2/2
-
Raccoon at TechHub :mastodon: replied to Andrei Kucharavy last edited by [email protected]
@andrei_chiffa @jerry
We already get reports from within, and outside of, our instances. If moderation is overwhelmed, that means said trusted users should be brought in as moderators. This would be in addition to current systems, including bots which already exist to auto-flag posts based on words.
-
Raccoon at TechHub :mastodon: replied to Emelia 👸🏻 last edited by
@thisismissem
I wasn't talking about LLMs; I'm talking about the smaller ones that we can train on a decent computer and run in a background process on existing servers.
-
theothertom replied to Raccoon at TechHub :mastodon: last edited by
@Raccoon I voted “use for flagging”, based on imagining a system similar to SpamAssassin rules - possibly even applied to the instance as a whole, not just the individual user.
Something I did wonder, though: how does moderator workload grow in response to growth of the individual instance vs. the broader network?
Is encouraging lots of smaller instances something that would keep the burden manageable?
-
Kee Hinckley replied to mekka okereke :verified: last edited by
@mekkaokereke @Raccoon I absolutely agree. The lack of context is why Meta has such a huge false positive problem. Even their human reviewers don't get any larger context about relationships and adjacent behavior. And their human reviewers are almost always from completely different cultures, so they don't get that context either. Then throw in their move to using translation rather than native language speakers...
And to truly understand context, you have to understand punching up vs. punching down. Try selling that in an algorithm.
I think ML has a role. Especially for flagging potential issues before they are human-reported. But the actual decision has to be made by a human who has all the context. That doesn't scale in monolithic social networks. I'm not sure it scales in distributed ones either, but it has to be the goal.
-
Raccoon at TechHub :mastodon: replied to mekka okereke :verified: last edited by
@mekkaokereke @jdp23 @mattdm
Question... What if we were the ones to train the AI, with posts that moderators actively thought were either over the line or acceptable, and we made sure to include a diverse group of people?
If someone ever were to do this, I'd hope they'd source examples of racist and non-racist posts directly from Black Fedi's moderation teams. I'd also hope they were looking for very overt examples, the kind a moderator would read and say, "that's not appropriate", as opposed to trying to gauge vague metrics like "toxicity".
To give an example of the kind of posts I'd feed it...
Good Post: "Queer people want equal rights."
Bad Post: "Queer people want special rights."
Good Post: "Police kill too many disabled people."
Bad Post: "Police don't kill enough disabled people."
...so the kind of stuff that a simple word-checker wouldn't find, but that is obviously inappropriate.
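For illustration, those four posts are enough to train a toy linear model (a perceptron over word and word-pair features). This is purely a sketch of the training loop; a real system would need the large, diverse, moderator-sourced corpus described above:

```python
# Illustrative only: train a tiny perceptron on moderator-labeled
# posts (label 1 = over the line, 0 = acceptable). The four posts
# are the examples from this thread, far too few for real use.
import re
from collections import defaultdict

LABELED = [
    ("Queer people want equal rights.", 0),
    ("Queer people want special rights.", 1),
    ("Police kill too many disabled people.", 0),
    ("Police don't kill enough disabled people.", 1),
]

def feats(text):
    """Word and adjacent word-pair features for a post."""
    toks = re.findall(r"[a-z']+", text.lower())
    return set(toks) | {" ".join(p) for p in zip(toks, toks[1:])}

weights = defaultdict(float)

def predict(text):
    return 1 if sum(weights[f] for f in feats(text)) > 0 else 0

for _ in range(10):                  # a few perceptron epochs
    for text, label in LABELED:
        err = label - predict(text)  # -1, 0, or +1
        for f in feats(text):
            weights[f] += err        # classic perceptron update

# After training, the model separates the four labeled posts:
assert all(predict(t) == y for t, y in LABELED)
```

Word-pair features are what let it tell "equal rights" from "special rights" even though a plain word list sees the same vocabulary in both.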
How would you feel about that approach?
-
Matthew Miller replied to Raccoon at TechHub :mastodon: last edited by
Maybe some, but that's not _nearly_ enough data to do the LLM trick. This could be used as an additional training layer (or something like InstructLab), but the underlying biases of the model will still be there.
And, how to keep up with evolving language, group-specific jargon and slang, and so on?
-
Raccoon at TechHub :mastodon: replied to Matthew Miller last edited by
@mattdm @mekkaokereke @jdp23
Once again, we are NOT talking about LLMs. They are far too large and inefficient to be feasible, and not what we need here: this is smaller, phrase-analysis type stuff, where you train it on phrases and it searches for patterns that match those phrases. The only LLM that might come up is if we make one to generate posts for it to learn from. If language evolves to the point that it needs updating, we just add new posts to the training data and it learns from those.
Another example that just occurred to me is people calling Kamala Harris a "DEI hire", which is clearly racist/sexist, so moderation should know about it. Searching for words doesn't help here, because the same words can be used in different ways and 90% of the time they're not inappropriate. An AI might be good here, because it can take the arrangement of these words into account in a way that might pick up on inappropriate phrasing.
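As a toy illustration of why arrangement matters (the word lists are hypothetical stand-ins): a single-word check on "DEI" misfires on benign posts, while a simple phrase (bigram) feature keys on the specific arrangement:

```python
# Toy illustration: single-word matching over-flags benign uses of
# "DEI", while a phrase (bigram) feature captures the arrangement
# "DEI hire". Both flag lists are hypothetical, not a real policy.
import re

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def bigrams(text):
    toks = tokens(text)
    return {" ".join(p) for p in zip(toks, toks[1:])}

WORD_FLAGS = {"dei"}          # over-broad: hits benign posts too
PHRASE_FLAGS = {"dei hire"}   # targets the specific phrasing

benign = "Our DEI committee meets on Thursdays."
flagged = "She is just a DEI hire."

assert WORD_FLAGS & set(tokens(benign))      # word check misfires here
assert not (PHRASE_FLAGS & bigrams(benign))  # phrase check does not
assert PHRASE_FLAGS & bigrams(flagged)       # and still catches it
```

A trained model generalizes this idea beyond hand-listed phrases, but the mechanism is the same: features over word arrangements, not individual words.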
-
Matthew Miller replied to Raccoon at TechHub :mastodon: last edited by
I'd be interested to see if you can get better results with that than with an old-school rules-based approach. My intuition is "probably not".
I mention LLMs because... 1) I think you need a pretty big model to get genuinely useful results from arbitrary posts and 2) that _is_ what we're actually testing.
-
Raccoon at TechHub :mastodon: replied to Kee Hinckley last edited by
@nazgul @mekkaokereke
Yeah, two things I want to point out about what you said. One is that you're talking about an AI built by Meta, a company with completely different goals from Fediverse moderators: we would make a completely different AI simply because we have different goals, a major one being an early-warning system for racist, sexist, or otherwise bigoted comments, which Meta has made clear it isn't interested in. Also, you're talking about moderators from completely different language backgrounds, which is definitely a problem that once again goes to their goals versus ours, and I don't believe that's a problem we have.
I think part of the problem is that we're trying to directly compare it to things that are done by corporations like Meta, bad actors in general, but something that we make, like I'm describing, would come out entirely different.
-
Raccoon at TechHub :mastodon: replied to Matthew Miller last edited by
@mattdm @mekkaokereke @jdp23
Maybe eventually I will try this out. Right now, I'm busy trying to tighten up our existing systems, which I'd like to build some very basic automation into, and to build some basic resources for other servers. If I come to a point where I feel like there's not enough coverage and everything else is already implemented, maybe I'll look back into this and try something like it, just as an experiment.
-
mekka okereke :verified: replied to Raccoon at TechHub :mastodon: last edited by
Unfortunately, the goals don't matter if your training corpus is incomplete. There are not enough Black users talking to Black users on the Fediverse to even train a hypothetical model that could capture some of that context. There are entire in-groups of Black users missing. E.g., Facebook has multiple Black Farmers groups, Black Fathers groups, Black Fishermen's groups, etc., etc. And that's just the Black "Fs."
Good intentions are not enough. Data without intention, also not enough.