on #bluesky there literally is the guy with the big dial constantly looking back at the audience for approval like a contestant on the price is right
-
@dalias @aud @jonny mm. the models that detect spam are essentially the same models that generate spam, just used in a slightly different mode. who wins is a question of who has more data.
we do not have more data than google does, and the spammers have breached the walls of google's castle and are pouring into the keep.
-
@dalias @aud @jonny we do think there might be something to the idea of refusing to index (just flat out refusing, not down-ranking) any site that is seeking any sort of profit. google has a hard classification problem in part because there isn't a lot of difference between spam and things the company considers legitimate.
-
@ireneista @aud @jonny You can also supplement by boosting sources linked from Wikipedia and penalizing ones edited out of Wikipedia as spam or illegitimate.
Google's quality plummeted the more they tried to bury Wikipedia.
-
@dalias @ireneista @aud i just gotta believe a global search index is more complicated than that
-
@[email protected] @[email protected] @[email protected] (I definitely agree; I don’t know of a way to automate it or whether one even should, but)
-
@[email protected] @[email protected] @[email protected] my personal feeling is “start small”. Do not try to encompass everything. The slop will come and go and including it just rewards it.
I guess I’m somewhat advocating for a sort of organically grown dataset.
-
@aud @jonny @ireneista I don't think "seeking profit" is the right distinction. "Displaying any sort of deceptive ads" is closer.
This is similar to the whole fedi HOA thing of harassing anyone who's trying to sell their art because folks can't distinguish that from capitalism.
-
@aud @jonny @ireneista Crawling rooted from Wikipedia really would not be that bad an idea.
-
@dalias @aud @ireneista bootstrapping off anything is great but wikipedia links to like what percent of the web i wonder.
-
@jonny @dalias @aud @ireneista that depends if you count revisions. if you include revisions it has been spammed to oblivion.
-
@dysfun @dalias @aud @ireneista well ok, assuming you mean stable links since the whole point of bootstrapping an index off it would be spam filtration
-
@jonny @dalias @aud @ireneista going to be better than most, but i'm not sure i'd just feed it all in and hope it's all signal.
-
@dysfun @dalias @aud @ireneista same!
-
@jonny @dalias @aud @ireneista i considered this a while back btw, before putting in for a different project instead. main difference is i was going to curate the index, like the old days.
-
@dysfun @jonny @dalias @aud yeah we honestly are more enthusiastic about curated indexes than automated ones
... but then that line of thought takes us full circle to thinking the actual valuable thing is site owners having their personal favorite links at the bottom, and, this time around, refusing to take money for placement and not letting the prevalence of search engines convince them it's pointless
-
@[email protected] @[email protected] @[email protected] @[email protected] my current thought is that we need a map. How we use the map is, to some extent, not really germane to the underlying problem of “we don’t have one and we need one”.
I don’t know if this thought is really yet at the core of the issue but it’s where I am right now. I think, implicitly, I have an idea that as the map is drawn, we won’t just be exploring places but also building new ones and writing them down if they’re the kind of thing we might like to see.
It’s probably oversimplified and possibly naive, but.
-
@aud @jonny @dalias @ireneista a while back, i talked about making a bookmark sharing tool, like the old days. i thought it might be a good way to seed a search engine.
-
@[email protected] @[email protected] @[email protected] @[email protected] it might not even be best to have a map, but many. In fact, I think that might be ideal. I think a concentration of visibility or control is not appropriate. It’s not hard to see how the concept of search has created awful hierarchies and rewards exploitation.