I know the Internet Archive has been under a ton of infrastructural pressure lately, but anyone have any idea about how long they might take to review an application and get back to you?
-
@[email protected] agreed.
(that's actually a big reason why we're having this discussion, in that I feel very free and un-judged to propose whatever comes to mind, whether I can see and work through the flaws or not. You're not only very insightful and helpful, but you're also very non-judgemental and safe to bounce even non-working ideas against. It's nice)
-
@[email protected] Well, and then I think that comes down to a sort of fundamental issue with the approach, too, which harkens back to you saying that maybe search itself is dead. Constant 're-training' might stem the bleeding, but it also might mean your approach is just... not going to work and you're just throwing bandaids on top of it.
(see: LLMs and "fixes" for the most obvious example. But SEO spam is possibly a form of that as well)
-
@aud awww!!!! that's really sweet of you to say. we try really hard at that, but we still feel like we fail a lot
-
@[email protected] No no, it's good. Even though I'm less devoted to the idea of 'search' than my language might indicate (I'm really just using it as a shorthand for 'how do I get information?'), it's still very much bound up in my thinking as to how information might be obtained. I mean, the idea of asking questions and receiving an answer and judging its suitability is... well, old. As you're aware. And search engines try to present themselves as basically that in a very literal form. So as the "current existing model", they definitely dominate my brain space.
-
@[email protected] I know the feeling (as someone who tries for the same thing... most of the time. When someone is in good faith), but I can safely say I always feel that way about any discussion we've had!
-
@aud @ireneista i will note that i generally believe "relevancy" which is abstract and separate from any specific problem domain to be
(1) a direct precursor to current slop models
(2) more of a business metric analogous to "engagement"
(3) vastly inferior in all respects to technology that leverages domain expertise. when google search provided a clear set of composable rules for their distributed document matcher, everyone loved them! "relevance" was considered something to infer and not to demonstrate (need an even stronger verb here tbh)
-
@aud @ireneista butting in with different lived experiences here but i don't at all see this as an intrinsic problem. i think this is a very poorly studied research space for various reasons but i think there is a massive amount of work to be done on public infrastructure for power and protection instead of merely austerity
-
@hipsterelectron @aud (we also feel that we should point out that we've done way too little talking about library science, in this conversation so far. library scientists study questions such as how people find things! unfortunately, we're not well read on it...........)
-
@[email protected] @[email protected] ah, yeah, I should be careful with my language here (ba dum tsh). I see the point you’re making here and agree. I was thinking of relevancy in a very vague manner to indicate whether the results are “correct”, not so much as a metric or goal unto itself.
-
@[email protected] @[email protected] library scientists are literally the ideal people I would want to... build tools for. When I first started thinking about who should run and curate this, I was thinking specifically of librarians and library scientists (not sure if they’re the same).
-
@[email protected] @[email protected] which, unfortunately, I’m in the same position on, as I’m not well read on the topic either. I imagined software that could be run and managed by librarians and libraries, for instance, and that people could use.
The funding is barely there for the libraries, let alone servers, but I don’t think the idea is inherently bad.
-
@[email protected] @[email protected] since we’re on the subject, I suppose this is why I haven’t thought too much about anything EXCEPT for how to distribute the computational workload and datasets while maintaining integrity. I just don’t currently have the knowledge about what type of algorithm and UI should be implemented for this sort of thing. I’m interested, for sure, but it’s definitely not my area of expertise.
-
@aud @hipsterelectron it is a topic we have significant background on, and can definitely advise on, but we need to figure out what we want first
-
@aud @ireneista Are anonymous contributions a requirement? My understanding is that one can have anonymous contributions or prevent ban evasion but not both. One can use a trusted third party but that just leads to an infinite descent problem.
-
@[email protected] @[email protected] fuck it, we’re mashing up usenet and the Dewey decimal system and not stopping till the money runs out!
(but in all seriousness, perhaps I should start looking up some library science stuff. I could try and start building code around some idea of what the data might look like, except if I’m wrong the nature of how it might need to scale out will change, so there’s not much point in building anything (unless I just want a search clone) till I have some idea of the best way to…
Forgive me for not closing the parentheses, but given the adversarial nature of the ad-surveillance-industrial complex, I’m going to call it “finding a piece of hay in a needlestack”. Anyway, seems I’ve probably got some reading to do.
-
@[email protected] @[email protected] well, no, they’re not. This is much more of the “floating ideas around” stage so nothing is a hard requirement : )
I think, if there are contributions, they need as much protection against exploitation as possible. Both to tamp down the desire of distributed server operators to misuse the information, and against hostile authorities.
-
@aud @ireneista i am very interested in making smaller and more powerful document search and filtering techniques. i don't think anything really "requires" surveillance at all
-
@hipsterelectron @aud @ireneista
If y'all can tolerate a white straight cis dude I would love to contribute to this conversation about information access, decentralization, shared curation, and resisting surveillance capital
-
@aud @ireneista I think it would be best to start with text only, as opposed to images, video, or audio. The reason is that the requirements for moderating media are much stricter than for text IIUC. (IANAL, of course.)
-
@trochee @hipsterelectron @aud no objection from us, at least!