chatGPT moderation

razibal

This is largely an experiment to get familiar with the openAI API, but it is functional in case anyone feels like playing with it. I realize that that there are simpler APIs available for moderation, but I was interested in exploring an LLM for moderation. The results are promising, although could do with further fine tuning. The API is nowhere near real-time (takes several seconds to respond), so the plugin is implemented with action filters. I do ike the feedback provided by the API and will probably expose that to users via a non-obtrusive tooltip once I've let it run for a while. The costs are reasonable, at least for a low traffic site like mine. It works out to roughly $0.001 per moderation, depending on whether topic context is enabled and the size of the posts. So, figure about $1 for 1000 posts. I'm sure similar results could be obtained with an open source LLM as well. As a side note, my day job is developer adjacent but I don't write a lot of code, so its probably not the most efficiently written plugin.

Moderation Plugin

screenshot-2023.05.14-10_12_45.png

Settings

openAPI token - API key
Moderator - User id for moderation activities
Bypass moderation when user has reputation higher than this value - Leave empty or set to 0 if you want all users to be moderated, otherwise set to the threshold below which moderation will be triggered.
Enable Notifications for Moderation Actions - Enable/Disable notifications for moderation actions
Delete Moderated Posts - Enable this if you want moderated posts to be deleted rather than flagged
Add Topic Context - Enable this if you want to include contextual content from the main post as well as the post being replied to (if applicable)
Negative Behaviors - This is the list of behaviors that should be considered when moderating a post. The defaults are "Spam", "Profanity", "Racism", "Sexism", "Homophobia", "Aggressive Language”
Code of Conduct/Terms of Service - (Optional) Add your Code of Conduct or Terms of Service to provide additional context for moderation of posts

Some examples of moderation results and feedback

Post 1

The Albanians are an ethnic group native to the Balkan Peninsula who share a common Albanian ancestry, culture, history and language. They primarily live in Albania, Kosovo, North Macedonia, Montenegro, Serbia as well as in Croatia, Greece, Italy and Turkey. They also constitute a large diaspora with several communities established across Europe, the Americas and Oceania.

Albanians have Paleo-Balkanic origins. Exclusively attributing these origins to the Illyrians, Thracians or other Paleo-Balkan people is still a matter of debate among historians and ethnologists.

Results - passed

Feedback
Thank you for your informative post on the Albanians. It's great that you've provided information on their ancestry, clulture and history. Your post is compliant with our code of conduct, and there is no evidence of negative behavior patterns. Keep up the good work!

{       "spam": false,
        "profanity": false,
        "racism": false,
        "sexism": false,
        "homophobia": false,
        "aggressive_language": false,
        "compliance": true,
        "passed": true,
        "feedback": "Thank you for your informative post on the Albanians. It's great that you've provided information on their ancestry, clulture and history. Your post is compliant with our code of conduct, and there is no evidence of negative behavior patterns. Keep up the good work!"
    }

Post 2

Added a negative (homophobic) term to the same post

Results : failed

Feedback
Hello, please note that the use of derogatory language directed towards individuals based on their sexual orientation is not allowed in our community. Please refrain from using such language in future posts. Additionally, please ensure that posts comply with our code of conduct that pledges to make participation in our community a harassment-free experience for everyone regardless of any personal characteristic. Thank you for your understanding and cooperation in maintaining a healthy and inclusive community.

{
        "spam": false,
        "profanity": true,
        "racism": false,
        "sexism": true,
        "homophobia": true,
        "aggressive_language": false,
        "compliance": false,
        "passed": false,
        "feedback": "Hello, please note that the use of derogatory language directed towards individuals based on their sexual orientation is not allowed in our community. Please refrain from using such language in future posts. Additionally, please ensure that posts comply with our code of conduct that pledges to make participation in our community a harassment-free experience for everyone regardless of any personal characteristic. Thank you for your understanding and cooperation in maintaining a healthy and inclusive community."
    }

Post 3

Replied to original post with minimum verbiage

Looks good!

Results: passed

Feedback
Thank you for your contribution! Please note that in order to maintain a high quality of discussion, we require substantive content that is meaningfully related to the subject under discussion. Please provide more information or context in your future posts. Also, please remember to be respectful and considerate of other community members' opinions and experiences. Thank you!

{
        "spam": false,
        "profanity": false,
        "racism": false,
        "sexism": false,
        "homophobia": false,
        "aggressive_language": false,
        "compliance_with_code_of_conduct": true,
        "passed": true,
        "friendly_feedback": "Thank you for your contribution! Please note that in order to maintain a high quality of discussion, we require substantive content that is meaningfully related to the subject under discussion. Please provide more information or context in your future posts. Also, please remember to be respectful and considerate of other community members' opinions and experiences. Thank you!"
    }

Post 4

Replied with hostility

This is full of c**p, you have no idea what you’re talking about!

Results: failed

Feedback
Your post contains aggressive language and profanity which is not acceptable on our community platform. We request you to refrain from such language and keep the conversation respectful and courteous. Additionally, it’s important to adhere to our code of conduct which states that “participation in our community must be a harassment-free experience for everyone”. We urge you to respect all members of our community and aim towards creating a diverse, inclusive and healthy environment for all.

{
        "Spam": false,
        "Profanity": true,
        "Racism": false,
        "Sexism": false,
        "Homophobia": false,
        "Aggressive Language": true,
        "Code of Conduct": false,
        "Passed": false,
        "feedback": "Your post contains aggressive language and profanity which is not acceptable on our community platform. We request you to refrain from such language and keep the conversation respectful and courteous. Additionally, it’s important to adhere to our code of conduct which states that “participation in our community must be a harassment-free experience for everyone”. We urge you to respect all members of our community and aim towards creating a diverse, inclusive and healthy environment for all."
    }

Post 5

Replied with comments on a different ethnic group (with Topic Context enabled)
Note: Context and Relevance are tricky and might need to be fine tuned.

Italians are an ethnic group native to the Italian geographical region and its neighboring insular territories. Italians share a common culture, history, ancestry and language. Their predecessors differ regionally, but generally include native populations such as the Etruscans, and the Italic peoples, including the Latins, from which the Romans emerged and helped create and evolve the modern Italian identity.

Results: failed

Feedback
Thank you for sharing information on the ethnic group Italians and their history. However, please ensure that your posts are directly related to the topic of discussion. In this case, while the topic is about Albanians, your post does not seem to contribute to the conversation. Please try to stay on topic in future postings.

{
  "spam": false,
  "profanity": false,
  "racism": false,
  "sexism": false,
  "homophobia": false,
  "aggressive_language": false,
  "code_of_conduct": false,
  "context_and_relevance": false,
  "passed": false,
  "feedback": "Thank you for sharing information on the ethnic group Italians and their history. However, please ensure that your posts are directly related to the topic of discussion. In this case, while the topic is about Albanians, your post does not seem to contribute to the conversation. Please try to stay on topic in future postings."
}

Hector Gibson

The moderation plugin you described sounds interesting and promising. It's great to hear that you're experimenting with the OpenAI API and exploring the capabilities of a language model for moderation. Fine-tuning the model and incorporating user feedback via tooltips is a valuable approach to improving the overall moderation experience.

While the response time of several seconds may not be real-time, it's still feasible for low-traffic sites. Cost-effectiveness is also important, and it's good to know that the moderation costs are reasonable, averaging around $0.001 per moderation.

Considering your experience as a developer adjacent to your day job, it's understandable that the code may not be optimized. Nonetheless, it's impressive that you were able to implement the plugin using action filters and achieve promising results.

Overall, it seems like you're off to a good start with your moderation plugin, and it's exciting to see how further fine-tuning and potential exploration of open-source language models could enhance its capabilities.

razibal

lol, I guess we need a behavior definition for 'written by an LLM'

oplik0

Can't wait to comment about how my late grandma always told me that my comments aren't breaking any community guidelines before bed and how I want the moderator to help me remember these happy times

Jokes about prompt injections aside, seems like an interesting first line of defense, though it can be quite costly.

While I haven't checked if there are other charges, nor compared the quality, Google might actually be offering a better deal right now - VertexAI chat-bison-001 version of PaLM is 1/4th the price of gpt-3.5-turbo (and currently, during preview, it's 100% discounted): https://cloud.google.com/vertex-ai/pricing#generative_ai_models

There is also a non-LLM ML moderation-specific (more specifically - it's only doing toxicity detection, no custom rulesets) API from Google that is entirely free - https://developers.perspectiveapi.com/
It's limited to 1 request per second by default, but you can request increasing this quota.

razibal

@oplik0 I'll have to play with Google's APIs to get a better feel for their relative performance, although if its anythling like my experience with Bard, I supect they're a couple of steps behind. This is becoming a very competitive space and I'm sure prices will drop rapidly. The price for gpt-3.5-turbo is already half that of gpt-4. I've implemented some of Google's moderation APIs in the past and they're fine for binary detection of toxicity, but the value that LLM's provide in this space is their ability to assess posts in context with the conversation flow and the subject matter of the topic. The qualitative feedback in human (albeit robotic) language is a nice bonus.

razibal

The plugin will now reply to any questions in a post (after it has passed moderation). This will obviously become much more powerful once openAI allows for training the gpt-3.5-turbo model on your site's data. In the meantime, it uses the context of the post to answer based on its current training set.

Screenshot 2023-05-23 at 12-53-25 Community.png

So, using the same topic as above and asking a question

is Albanian a Slavic language?

Generates a reply to the post

{                                                                                                                                      "spam": false,
  "profanity": false,
  "racism": false,
  "sexism": false,
  "homophobia": false,
  "aggressive_language": false,
  "code_of_conduct": true,
  "question": true,
  "answer": "No, Albanian is an Indo-European language and belongs to the group of Balkan languages, which also includes Slavic languages, but it is not a Slavic language. Albanian has its own unique grammar, vocabulary, and syntax that distinguish it from Slavic languages.",
  "passed": true,
  "feedback": "Thank you for your question. It's important to remember to keep discussions respectful and relevant to the topic at hand. Let's continue to learn and share knowledge in a positive and inclusive manner.",
  "timestamp": 1684860476156
}

AI Generated Response

No, Albanian is an Indo-European language and belongs to the group of Balkan languages, which also includes Slavic languages, but it is not a Slavic language. Albanian has its own unique grammar, vocabulary, and syntax that distinguish it from Slavic languages.

julian

This is amazing, keep up the good work!

DownPW

hello @razibal

I have not tested tested your plugin yet but it very interresting in addition with moderators user.

I can't open request on your repo so i come here.

It would be cool to be able to add our own keywords or URLs to be banned by the bot. For example, we have people on our forum who like to share conspiratorial content despite the rules and we would like to fight against this more effectively thanks to this bot.

omega

Roll On the Chat Farms

Maybe this should be it's own topic, and it might be off-topic but I wanted to lay down a quick Big picture scribble and some food for thought for forums admins, mods and even users.

It's a BIG decision to unleash AI on your forum (or product/service), it's goes way beyond anything I can think of, you need to know your users are onboard or what you are allowing into your house.

Before we tried best to have tools and techniques to keep the automated bots and spam off a forum. Now we are being asked to embrace something even more powerful and let it have at it - AI is being virtually rolled into everything as I type and so it's a full on live experiment with no rules.

It feels like one of those solutions seeking a problem moments, but it's far deeper than even that.

It truly is a big deal to mix AI into what is the human space for discussion and expression, this is very different to say iterations of hardware and traditional more with less innovation over time, this breaks many things, and it's when it get to the interactive part, that is IMHO when a line is crossed. All that will be left after this is when it's fully integrated to automatous units, and that is already happening (it's mass scale) and you already carry it in your hand/pocket every day now.

We thought ads and data harvesting style was bad, AI will take things to a level beyond typical comprehension.

There are moral, ethical and even spiritual dimensions to this, never mind the data consent issues. The fabric of reality or the human-sphere can and will so very quickly be re-valued by this current phase (I think that is the plan) and it will happen much faster than anything you have witnessed to date in your life, but will it be positive or negative in net terms?

Finally, in simple terms methinks, right now you can view AI as value decision, I personally think you devalue your product or service by using AI.

Why is that?

Well everyone else is doing it and using the same few providers.

Who is really winning here?

It has to be what is behind the phase of the AI push and not the platform, site owners, or human end users.

I certainly trust no longer Big Tech operators, and moves more away from them is more important than ever because we can point ot recent past example fo the end result how they now exerted so much power they have destbalished even covering nations and free-will, and that's the pre-public-AI phase of Big Tech I'm talking about, as a note, iirc for example Deepmind has been reading gmail accounts since maybe as early as 2014.

I also caught at the theatre of calls to "regulate" coming from the people building it, it's like a thief or drug addicts crying out, I'm not going to stop but I know what I am doing is wrong and very dangerous because I am telling you, but it's up to you to stop me, the libertarian in me says what a crock, or msg we believe it's a corporate "cry for help" - pull the other one!

I know this is a huge topic but I want to end on a upbeat, becasue this interests me the most, as I believe there are now opportunities opening up or becoming more valuable than before, that are non-AI based and that hold a lot more interest and promise for genuine good and maybe even some profit, and they are the things that have always been with us forever and a day but AI will be the phase that is diametrically opposed to their continued flourishing.

We live in interesting times.

razibal

@DownPW You can do this already. Any behavior pattern, keyword or URL that you want to explicitly forbid can be added in the Code of Conduct/ToS section as add-ons to your existing ToS. Just make sure to word the add-on as if it was part of the Code of Conduct. For example

Members of the community shall not, under any circumstances, engage in behaviors such as promotion of conspiracy theories or posting of links to websites that promote such conspiracies like anyrandomsite.com, anothersite.come, moresites.com...

omega

AI moderating "conspiracy theories", that's a conspiracy theory confirmed!