the temptation to reply "make sure your instance administrator has some rules carved out in nginx/haproxy/whatever they use matching on certain user agents to block the scraping of your account." because i know for a fact the instance in question does ...
-
@[email protected] of course, right but the point is still that blocking does prevent some of it. i do not remove my front door because people can pick the locks
-
@[email protected] may I ask what some good user agents are to block
-
@[email protected] gonna admit i am not the one that wrote these rules but here are some of ours...
# Deny bad user-agents
acl deny-user-agent-acl hdr_sub(user-agent) -i Bytespider
acl deny-user-agent-acl hdr_sub(user-agent) -i QQDownload
acl deny-user-agent-acl hdr_sub(user-agent) -i TencentTraveler
acl deny-user-agent-acl hdr_sub(user-agent) -i Content-Nation
acl deny-user-agent-acl hdr_sub(user-agent) -i SemrushBot
acl deny-user-agent-acl hdr_sub(user-agent) -i FediList

## Deny Facebook
acl deny-user-agent-acl hdr_sub(user-agent) -i FacebookExternalHit
acl deny-user-agent-acl hdr_sub(user-agent) -i FacebookCatalog
acl deny-user-agent-acl hdr_sub(user-agent) -i FacebookBot

## Deny broken fedi implementations
acl deny-user-agent-acl hdr_sub(user-agent) -i Lemmy

## From Seirdy <3
### IP and Trademark Scanners
acl deny-user-agent-acl hdr_sub(user-agent) -i TurnitinBot
acl deny-user-agent-acl hdr_sub(user-agent) -i NPBot
acl deny-user-agent-acl hdr_sub(user-agent) -i SlySearch
acl deny-user-agent-acl hdr_sub(user-agent) -i BLEXBot
acl deny-user-agent-acl hdr_sub(user-agent) -i CheckMarkNetwork
acl deny-user-agent-acl hdr_sub(user-agent) -i BrandVerity

### Data Brokers
acl deny-user-agent-acl hdr_sub(user-agent) -i PiplBot

### AI Scrapers
acl deny-user-agent-acl hdr_sub(user-agent) -i ChatGPT-User
acl deny-user-agent-acl hdr_sub(user-agent) -i GPTBot
acl deny-user-agent-acl hdr_sub(user-agent) -i Google-Extended
acl deny-user-agent-acl hdr_sub(user-agent) -i anthropic-ai
acl deny-user-agent-acl hdr_sub(user-agent) -i Claude-Web
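
(The `acl` lines on their own only define the match; something still has to act on it. The post doesn't show that part of the setup, so this is only a rough sketch of how such an ACL is commonly wired up in HAProxy. The `fedi_front` / `fedi_back` names, the bind port and the backend address are made up for illustration.)

```haproxy
frontend fedi_front
    # hypothetical frontend; the real bind/TLS settings are not shown in the thread
    mode http
    bind :80

    # repeated acl lines with the same name are ORed together,
    # so any one matching substring trips the ACL
    acl deny-user-agent-acl hdr_sub(user-agent) -i GPTBot
    acl deny-user-agent-acl hdr_sub(user-agent) -i Bytespider

    # reject matching requests with a 403 before they reach the backend
    http-request deny deny_status 403 if deny-user-agent-acl

    default_backend fedi_back

backend fedi_back
    # hypothetical backend address
    mode http
    server app 127.0.0.1:3000
```

`hdr_sub` is a substring match and `-i` makes it case-insensitive, so a short token like `Lemmy` catches any user agent that contains it.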
-
@[email protected] Lemmy is blocked for a specific reason, and that is because federation between lemmy and misskey (including forks) is fucked. you will get a job queue nightmare if you do not block lemmy instances
-
@[email protected] by looking at your logs and googling certain user agents you might find more but this is not meant to be a comprehensive list of every single scraper ever
-
@[email protected] @[email protected] ooooooooh. What’s going on there? Do you have a link to maybe an explanation?
I played around with lemmy’s AP library but decided it wasn’t for me, implementation wise, so I’m kind of glad to hear it was a wise choice not to use…
-
@[email protected] @[email protected] no idea honestly
-
@[email protected] @[email protected] fair
I didn’t like that it implemented AS/AP Objects as Rust traits, which is probably the ‘right’ way to do it in Rust but I haaaaaaate it, mental model wise and just found it non-comprehensible to work with as a way to learn ActivityPub. It makes more sense now that I know some AP but.
-
@puppygirlhornypost2 @ch0ccyra1n yup, abuse is not completely preventable
raising the threshold does remove a sizeable percentage of it though, but never 100%
though even adding a threshold (like this) will remove 50%, or something thereabouts
-
@puppygirlhornypost2 @frawst Don't the "AI" ones retry with a different useragent if they get a 403?
-
@[email protected] @[email protected] sometimes, sometimes not.