Cool project: "Nepenthes" is a tarpit to catch (AI) web crawlers.
"It works by generating an endless sequence of pages, each with dozens of links that simply lead back into the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse."
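For readers wondering how pages can be "randomly generated, but in a deterministic way", here is a minimal Python sketch of the general idea. It is not Nepenthes' actual code: the `WORDS` list, the `/maze` prefix, and the `page_for` and `render` names are all assumptions made up for illustration. The trick it shows is seeding a random generator from a hash of the requested path, so the same URL always yields the same links without anything being stored.

```python
import hashlib
import random

# Hypothetical vocabulary used to build link slugs (illustration only).
WORDS = ["amber", "basin", "cedar", "delta", "ember", "fjord", "grove", "heron"]


def page_for(path: str, n_links: int = 12, prefix: str = "/maze") -> list[str]:
    """Return the links for `path`; the same path always yields the same links."""
    # Derive the seed from the path itself, so no per-page state is kept.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    links = []
    for _ in range(n_links):
        slug = "-".join(rng.choice(WORDS) for _ in range(3))
        links.append(f"{prefix}/{slug}")
    return links


def render(path: str) -> str:
    """Render a bare HTML page whose links all point back under the prefix."""
    items = "\n".join(f'<li><a href="{href}">{href}</a></li>' for href in page_for(path))
    return f"<html><body><ul>\n{items}\n</ul></body></html>"
```

Every generated link stays under the same prefix, so a crawler following them never leaves the maze. A real server would presumably also wait before answering each request (the "intentional delay" in the quote); that part is left out here to keep the sketch short.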
-
@tante I have mixed feelings.
Crawlers should respect robots.txt.
At the same time, there is clearly an emotionally based bias happening with LLMs.
I feel weird about the idea of active sabotage. Considering it is only aimed at bad actors, and considering that robots.txt files are often too restrictive in my opinion, the gray areas overlap a bit.
Why should we want to actively sabotage AI dev? Wouldn’t that possibly lead to catastrophic results? Who benefits from dumber AI?
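On the robots.txt point, here is a rough Python sketch of what "respecting robots.txt" looks like from the crawler's side, using the standard library's `urllib.robotparser`. The robots.txt content, the `/maze/` path, and the `ExampleBot` user agent are assumptions for illustration; nothing here is taken from Nepenthes or any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (illustration only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /maze/
"""


def allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the hypothetical robots.txt lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under these assumptions, `allowed("https://example.com/maze/page")` is false and `allowed("https://example.com/about")` is true, so a crawler that runs this check before each request would never enter a disallowed path.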
-
@tante @dalias
Heh, https://wookieepedia.org has been functioning this way for me. I opened up its robots.txt: it's dynamically generated on demand, ALL links work, and ALL pages exist. It’s generated so much sustained load that I may need to throttle it too!
-
@altruios Hi, author of Nepenthes here.
I respect your discomfort, but honestly I'm angry enough about their behavior that I want to see them burn. There's been far too much of this:
https://mastodon.social/@khobochka/113724300122190730
I cannot trust traffic to my site to be harmless, so I don't see any reason why something connecting to every site on the internet should be able to trust that the site isn't harmful.
@tante
-
@inthehands @tante @dalias OMG, that is a spectacular use of "hirsute." Bravo.
-
@woe2you @altruios @tante Because if we gobbled up millions of copyrighted works and produced derivatives of them, we'd be facing decades in prison, billions in fines. But when they seize our stuff that way and enclose it and use it to manufacture slop, in gross violation of copyright, it's deemed legitimate business.
-
@woe2you @altruios @tante You should understand that LLMs have no legitimate purposes. They do not produce intelligence or knowledge or information. They produce *information-shaped* slop. The only way they get better is getting better at deceiving fools that the slop is what it looks like.
If you don't understand what they are, what they're doing, and how they do it, then kindly stop calling the reactions of people who do understand "knee-jerk".
-
@dalias @altruios @tante I can talk to my smart home in natural language instead of having to sound like a BASIC program and get the name of every device precisely right, and it can respond in natural language. How is that not a "legitimate purpose" per your definition?
If you could explain without calling me an asshole or a simp that would be appreciated too.
-
Yeah. Microsoft Power Automate, Scratch, and several other "languages" I can think of off the top of my head allow you to more or less skip the BASIC syntax and program things by moving blocks containing logic around. You don’t need an LLM for that, and you shouldn’t use one either. Tell an LLM to do 1+2 and it won’t calculate the answer. Instead it’ll hallucinate: I’ve seen it spit out 5 for 2+2, because that’s a common reference thanks to 1984 and the Radiohead song "2 + 2 = 5".
-
Basic voice commands do not need an LLM or a language (I mean, command-line programs don’t need to compile their arguments). Voice detection isn’t built on the same thing.
-