Was talking to someone about #BlueSky the other day, and how they apparently used some sort of #AI for #moderation.
-
William Pietri replied to Raccoon at TechHub :mastodon: last edited by
@Raccoon Having worked on this professionally, I think that's wildly optimistic. Mastodon itself isn't "entirely different" from pre-Musk Twitter; it's mildly better in some ways, notably worse in others. And mostly just the same.
I'll particularly note the lack of diversity here. Even as a white guy, or perhaps especially as one, I'm skeptical that the privileged-white-guy-technologist-heavy Mastodon culture could build good tools for detecting racism and sexism.
@nazgul @mekkaokereke
-
Raccoon at TechHub :mastodon: replied to mekka okereke :verified: last edited by
@mekkaokereke @nazgul
This would bring up costs, because even though it wouldn't be as heavy as a whole server, it's still an ML algorithm. But yeah, you've made a very good case against "take action and then allow appeal/override by human moderators". Personally, that's something I would never argue for: I just put it in there because I was curious to see if anyone would vote for that.
But I would agree with the last statement, that this could potentially be a good way to bring moderator attention to interactions that need it.
I guess whoever did this would have to figure out some way to get a lot of good posts to train this thing on, even if it's just people submitting chat logs.
I'm still curious as to how well this thing could work though, even without a huge amount of data. To be honest, if we get to the point where there are so many marginalized users talking that it's getting too many false positives, that might be a win in itself.
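If it helps to picture it, here's a toy sketch in Python of the kind of classifier I mean - the example texts, labels, and the 0.8 threshold are all made up, and a real system would need the big labeled corpus we're talking about:

```python
# Toy sketch: train a post classifier from submitted, labeled examples.
# The data here is a placeholder; a real corpus would come from
# moderator-labeled posts and donated chat logs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["example harassing post", "example ordinary post"]
labels = [1, 0]  # 1 = needs moderator attention, 0 = fine

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Flag for human review rather than acting automatically.
score = model.predict_proba(["another incoming post"])[0][1]
if score > 0.8:  # threshold is a tunable assumption
    print("refer to moderator queue")
```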
-
mybarkingdogs replied to William Pietri last edited by
@williampietri @Raccoon @nazgul @mekkaokereke This. Honestly I'd say the only *good* use of ML in moderation would be an exact-word "extra set of eyes" for immediate referral to a human mod as a very limited first-pass filter - e.g. common slurs, terms that reference abuse material, spammer/SEO keywords.
It would need a human mod check to be certain the slurs weren't reclaimed, or to tell the difference between someone talking about their own victimization vs. posting abuse fetishism, obviously.
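Something like this, as a minimal sketch - the term list and the review-queue shape are placeholders, since any real list would be curated by the mods:

```python
# Hypothetical first-pass filter: exact-word matching only, and every
# hit goes to a human moderator -- the script never acts on its own.
import re

# Illustrative placeholder set; a real deployment would maintain
# curated lists for slurs, abuse-material terms, and spam keywords.
FLAGGED_TERMS = {"exampleslur", "example-abuse-term", "seo-spam-phrase"}

def first_pass(post_text: str) -> set[str]:
    """Return any flagged terms found as whole words."""
    words = set(re.findall(r"[\w'-]+", post_text.lower()))
    return words & FLAGGED_TERMS

def handle_post(post_id: str, post_text: str, review_queue: list) -> None:
    hits = first_pass(post_text)
    if hits:
        # Referral only: a human decides whether the term is reclaimed,
        # someone describing their own victimization, or actual abuse.
        review_queue.append({"post": post_id, "terms": sorted(hits)})
```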
-
@williampietri @Raccoon @nazgul @mekkaokereke One other thing that would be VERY necessary to make it better than a lot of automods: be transparent that it is in use, and that certain words will get a post read by a human moderator.
Also emphasize that the terms *themselves* aren't a judgment upon the poster per se, but often appear within contexts of abuse or harassment, and that's the point of having someone take a look and be sure they aren't.
-
Raccoon at TechHub :mastodon: replied to mybarkingdogs last edited by
@mybarkingdogs @williampietri @nazgul @mekkaokereke
Oh definitely. I would want any sort of moderation system to be transparent, not only so that users know that it's doing a thing, but also so that other instances can see how it works and potentially implement it on their own servers.
-
PH7831 replied to Raccoon at TechHub :mastodon: last edited by
@Raccoon I chose "use AI only to report..."
And I'd add: ensure, with a process, that AI is not the only source of reports. Cap the share of reports coming from AI so that it is not the main source.
-
Raccoon at TechHub :mastodon: replied to PH7831 last edited by [email protected]
@PH7831
I'm not entirely sure what you're talking about. When we implement this sort of system on a Fediverse server, it typically watches the feed, checking every post that comes in, and automatically generating a report whenever it flags something. With current systems, this means we get reports from word-checkers for things like people saying slurs in posts, or accounts with names that are associated with spam.
None of that precludes the average user from hitting the report button if they see something, or especially if someone is sending them inappropriate PMs. (Note that we cannot moderate the PM system unless people hit the report button, because we can't see those messages otherwise.)
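In rough Python, the flow is something like this - stream_public_feed() and file_report() are stand-ins for whatever streaming API and report endpoint a given server actually exposes, and the word/name lists are placeholders:

```python
# Sketch of a feed-watching reporter. stream_public_feed() and
# file_report() are hypothetical stand-ins, not a real server API.
FLAGGED_WORDS = {"exampleslur"}                     # placeholder list
SPAM_NAME_PATTERNS = ("casino", "crypto-airdrop")   # placeholder patterns

def check_post(post) -> str | None:
    text = post["content"].lower()
    if any(word in text.split() for word in FLAGGED_WORDS):
        return "flagged word in post"
    name = post["account"]["display_name"].lower()
    if any(pat in name for pat in SPAM_NAME_PATTERNS):
        return "account name matches spam pattern"
    return None

def watch(stream_public_feed, file_report):
    # Every incoming post gets checked; a hit only generates a report
    # for the human mod queue, never an automatic removal.
    for post in stream_public_feed():
        reason = check_post(post)
        if reason:
            file_report(post_id=post["id"], reason=reason)
```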
-
Raccoon at TechHub :mastodon: replied to theothertom last edited by [email protected]
@tom
I don't think either one is significantly better than the other. Large instances like TechHub tend to have easier sign-ups and very rule-driven moderation, which means we end up bringing in a lot more users who wouldn't be allowed on the smaller servers. But smaller servers being more numerous means it's harder to block them individually if one of them is a bad actor, and individual moderators have less control, because we can only ban an account from the entire network if it's on one of our servers.
Some of the most aggressive harassment vector instances are very small, but they blast the network with hate speech to try and actively scare off marginalized groups, moving from server to server over the days it takes for a FediBlock to fully proliferate, occasionally switching URLs when they run out of people to go after.
Meanwhile, the load on a moderation team seems to scale with the size of the network itself, not the server...
(Continued)
-
Raccoon at TechHub :mastodon: replied to Raccoon at TechHub :mastodon: last edited by
@tom
A smaller server is going to have fewer connections, and thus fewer opportunities for people to have those one-off conversations, but it is just as vulnerable to attacks, especially if it only has one person running it. Meanwhile, a larger server consolidates a lot of admin responsibilities, especially blocking other servers, which takes a huge load off individual moderators, because 90% of problem traffic comes from the worst 200 or so servers (which are listed on the publicly available block list that I publish).
So there's a whole trade-off here.
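To make the consolidation point concrete, here's a toy sketch of what the server-level check amounts to - assuming the published block list is a plain CSV of domains, which is an assumption about the format:

```python
# Toy sketch of consolidated server-level blocking. Assumes the
# published block list is a CSV with one domain in the first column.
import csv

def load_blocklist(path: str) -> set[str]:
    with open(path, newline="") as f:
        return {row[0].strip().lower() for row in csv.reader(f) if row}

def is_blocked(actor_uri: str, blocked: set[str]) -> bool:
    # One admin-level defederation check covers every moderator on the
    # server: if the post's origin domain is listed, it never gets in.
    domain = actor_uri.split("://", 1)[-1].split("/", 1)[0].lower()
    return domain in blocked
```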
-
Just ran into this article: "LLMs have a strong bias against use of African American English"
[H]ave the biases of larger society present in the materials used to train LLMs been beaten out of them, or were they simply suppressed? To find out, the researchers relied on the African American English sociolect (AAE), which originated during the period when African Americans were kept as slaves and has persisted and evolved since...
The researchers came up with pairs of phrases, one using standard American English and the other using patterns often seen in AAE and asked the LLMs to associate terms with the speakers of those phrases. The results were like a trip back in time to before even the earliest Princeton Trilogy, in that every single term every LLM came up with was negative. GPT2, RoBERTa, and T5 all produced the following list: "dirty," "stupid," "rude," "ignorant," and "lazy." GPT3.5 swapped out two of those terms, replacing them with "aggressive" and "suspicious." Even GPT4, the most highly trained system, produced "suspicious," "aggressive," "loud," "rude," and "ignorant."
https://arstechnica.com/ai/2024/08/llms-have-a-strong-bias-against-use-of-african-american-english/
@[email protected] @[email protected] @[email protected]