@DownPW Good call. I've done the same 🙂 Great minds and all that...
chatGPT moderation
-
This is largely an experiment to get familiar with the openAI API, but it is functional in case anyone feels like playing with it. I realize that that there are simpler APIs available for moderation, but I was interested in exploring an LLM for moderation. The results are promising, although could do with further fine tuning. The API is nowhere near real-time (takes several seconds to respond), so the plugin is implemented with action filters. I do ike the feedback provided by the API and will probably expose that to users via a non-obtrusive tooltip once I've let it run for a while. The costs are reasonable, at least for a low traffic site like mine. It works out to roughly $0.001 per moderation, depending on whether topic context is enabled and the size of the posts. So, figure about $1 for 1000 posts. I'm sure similar results could be obtained with an open source LLM as well. As a side note, my day job is developer adjacent but I don't write a lot of code, so its probably not the most efficiently written plugin.
Settings
- openAPI token - API key
- Moderator - User id for moderation activities
- Bypass moderation when user has reputation higher than this value - Leave empty or set to 0 if you want all users to be moderated, otherwise set to the threshold below which moderation will be triggered.
- Enable Notifications for Moderation Actions - Enable/Disable notifications for moderation actions
- Delete Moderated Posts - Enable this if you want moderated posts to be deleted rather than flagged
- Add Topic Context - Enable this if you want to include contextual content from the main post as well as the post being replied to (if applicable)
- Negative Behaviors - This is the list of behaviors that should be considered when moderating a post. The defaults are "Spam", "Profanity", "Racism", "Sexism", "Homophobia", "Aggressive Language”
- Code of Conduct/Terms of Service - (Optional) Add your Code of Conduct or Terms of Service to provide additional context for moderation of posts
Some examples of moderation results and feedback
Post 1
The Albanians are an ethnic group native to the Balkan Peninsula who share a common Albanian ancestry, culture, history and language. They primarily live in Albania, Kosovo, North Macedonia, Montenegro, Serbia as well as in Croatia, Greece, Italy and Turkey. They also constitute a large diaspora with several communities established across Europe, the Americas and Oceania.
Albanians have Paleo-Balkanic origins. Exclusively attributing these origins to the Illyrians, Thracians or other Paleo-Balkan people is still a matter of debate among historians and ethnologists.
Results - passed
Feedback
Thank you for your informative post on the Albanians. It's great that you've provided information on their ancestry, clulture and history. Your post is compliant with our code of conduct, and there is no evidence of negative behavior patterns. Keep up the good work!{ "spam": false, "profanity": false, "racism": false, "sexism": false, "homophobia": false, "aggressive_language": false, "compliance": true, "passed": true, "feedback": "Thank you for your informative post on the Albanians. It's great that you've provided information on their ancestry, clulture and history. Your post is compliant with our code of conduct, and there is no evidence of negative behavior patterns. Keep up the good work!" }
Post 2
Added a negative (homophobic) term to the same post
Results : failed
Feedback
Hello, please note that the use of derogatory language directed towards individuals based on their sexual orientation is not allowed in our community. Please refrain from using such language in future posts. Additionally, please ensure that posts comply with our code of conduct that pledges to make participation in our community a harassment-free experience for everyone regardless of any personal characteristic. Thank you for your understanding and cooperation in maintaining a healthy and inclusive community.{ "spam": false, "profanity": true, "racism": false, "sexism": true, "homophobia": true, "aggressive_language": false, "compliance": false, "passed": false, "feedback": "Hello, please note that the use of derogatory language directed towards individuals based on their sexual orientation is not allowed in our community. Please refrain from using such language in future posts. Additionally, please ensure that posts comply with our code of conduct that pledges to make participation in our community a harassment-free experience for everyone regardless of any personal characteristic. Thank you for your understanding and cooperation in maintaining a healthy and inclusive community." }
Post 3
Replied to original post with minimum verbiage
Looks good!
Results: passed
Feedback
Thank you for your contribution! Please note that in order to maintain a high quality of discussion, we require substantive content that is meaningfully related to the subject under discussion. Please provide more information or context in your future posts. Also, please remember to be respectful and considerate of other community members' opinions and experiences. Thank you!{ "spam": false, "profanity": false, "racism": false, "sexism": false, "homophobia": false, "aggressive_language": false, "compliance_with_code_of_conduct": true, "passed": true, "friendly_feedback": "Thank you for your contribution! Please note that in order to maintain a high quality of discussion, we require substantive content that is meaningfully related to the subject under discussion. Please provide more information or context in your future posts. Also, please remember to be respectful and considerate of other community members' opinions and experiences. Thank you!" }
Post 4
Replied with hostility
This is full of c**p, you have no idea what you’re talking about!
Results: failed
Feedback
Your post contains aggressive language and profanity which is not acceptable on our community platform. We request you to refrain from such language and keep the conversation respectful and courteous. Additionally, it’s important to adhere to our code of conduct which states that “participation in our community must be a harassment-free experience for everyone”. We urge you to respect all members of our community and aim towards creating a diverse, inclusive and healthy environment for all.{ "Spam": false, "Profanity": true, "Racism": false, "Sexism": false, "Homophobia": false, "Aggressive Language": true, "Code of Conduct": false, "Passed": false, "feedback": "Your post contains aggressive language and profanity which is not acceptable on our community platform. We request you to refrain from such language and keep the conversation respectful and courteous. Additionally, it’s important to adhere to our code of conduct which states that “participation in our community must be a harassment-free experience for everyone”. We urge you to respect all members of our community and aim towards creating a diverse, inclusive and healthy environment for all." }
Post 5
Replied with comments on a different ethnic group (with Topic Context enabled)
Note: Context and Relevance are tricky and might need to be fine tuned.Italians are an ethnic group native to the Italian geographical region and its neighboring insular territories. Italians share a common culture, history, ancestry and language. Their predecessors differ regionally, but generally include native populations such as the Etruscans, and the Italic peoples, including the Latins, from which the Romans emerged and helped create and evolve the modern Italian identity.
Results: failed
Feedback
Thank you for sharing information on the ethnic group Italians and their history. However, please ensure that your posts are directly related to the topic of discussion. In this case, while the topic is about Albanians, your post does not seem to contribute to the conversation. Please try to stay on topic in future postings.{ "spam": false, "profanity": false, "racism": false, "sexism": false, "homophobia": false, "aggressive_language": false, "code_of_conduct": false, "context_and_relevance": false, "passed": false, "feedback": "Thank you for sharing information on the ethnic group Italians and their history. However, please ensure that your posts are directly related to the topic of discussion. In this case, while the topic is about Albanians, your post does not seem to contribute to the conversation. Please try to stay on topic in future postings." }
-
The moderation plugin you described sounds interesting and promising. It's great to hear that you're experimenting with the OpenAI API and exploring the capabilities of a language model for moderation. Fine-tuning the model and incorporating user feedback via tooltips is a valuable approach to improving the overall moderation experience.
While the response time of several seconds may not be real-time, it's still feasible for low-traffic sites. Cost-effectiveness is also important, and it's good to know that the moderation costs are reasonable, averaging around $0.001 per moderation.
Considering your experience as a developer adjacent to your day job, it's understandable that the code may not be optimized. Nonetheless, it's impressive that you were able to implement the plugin using action filters and achieve promising results.
Overall, it seems like you're off to a good start with your moderation plugin, and it's exciting to see how further fine-tuning and potential exploration of open-source language models could enhance its capabilities.
-
Can't wait to comment about how my late grandma always told me that my comments aren't breaking any community guidelines before bed and how I want the moderator to help me remember these happy times
Jokes about prompt injections aside, seems like an interesting first line of defense, though it can be quite costly.
While I haven't checked if there are other charges, nor compared the quality, Google might actually be offering a better deal right now - VertexAI chat-bison-001 version of PaLM is 1/4th the price of gpt-3.5-turbo (and currently, during preview, it's 100% discounted): https://cloud.google.com/vertex-ai/pricing#generative_ai_models
There is also a non-LLM ML moderation-specific (more specifically - it's only doing toxicity detection, no custom rulesets) API from Google that is entirely free - https://developers.perspectiveapi.com/
It's limited to 1 request per second by default, but you can request increasing this quota. -
@oplik0 I'll have to play with Google's APIs to get a better feel for their relative performance, although if its anythling like my experience with Bard, I supect they're a couple of steps behind. This is becoming a very competitive space and I'm sure prices will drop rapidly. The price for gpt-3.5-turbo is already half that of gpt-4. I've implemented some of Google's moderation APIs in the past and they're fine for binary detection of toxicity, but the value that LLM's provide in this space is their ability to assess posts in context with the conversation flow and the subject matter of the topic. The qualitative feedback in human (albeit robotic) language is a nice bonus.
-
The plugin will now reply to any questions in a post (after it has passed moderation). This will obviously become much more powerful once openAI allows for training the
gpt-3.5-turbo
model on your site's data. In the meantime, it uses the context of the post to answer based on its current training set.So, using the same topic as above and asking a question
is Albanian a Slavic language?
Generates a reply to the post
{ "spam": false, "profanity": false, "racism": false, "sexism": false, "homophobia": false, "aggressive_language": false, "code_of_conduct": true, "question": true, "answer": "No, Albanian is an Indo-European language and belongs to the group of Balkan languages, which also includes Slavic languages, but it is not a Slavic language. Albanian has its own unique grammar, vocabulary, and syntax that distinguish it from Slavic languages.", "passed": true, "feedback": "Thank you for your question. It's important to remember to keep discussions respectful and relevant to the topic at hand. Let's continue to learn and share knowledge in a positive and inclusive manner.", "timestamp": 1684860476156 }
AI Generated Response
No, Albanian is an Indo-European language and belongs to the group of Balkan languages, which also includes Slavic languages, but it is not a Slavic language. Albanian has its own unique grammar, vocabulary, and syntax that distinguish it from Slavic languages.