I know the Internet Archive has been under a ton of infrastructural pressure lately, but anyone have any idea about how long they might take to review an application and get back to you?
-
Asta [AMP] replied to Asta [AMP]
On that note, I think the need for a community-owned, distributed search engine is higher than ever. Generative AI is going to continue filling Google, Bing, etc. with trash, and it's pretty much a goal of any fascist or authoritarian regime to control information. One way you can do that is by driving the signal-to-noise ratio down to 0.
I believe we can use #activityPub and #activityStreams as a basis for a distributed search engine. Human curation of data, strict integrity measures, opt-in federation and auditing can help keep relevant, genuine information in and bad actors out. I really think we need to do this.
I am working on it and have various pieces in place but it is admittedly slow going since I am also working on other things.
#search #searchEngines #knowledgePreservation #tech
-
@aud we honestly suspect search is the wrong lens here, though it may well have an important role. the real task is information discovery but it would be amazing to have an approach that doesn't flatten social structures.
-
I want to write up a document for this, but the basic requirements are:
1. Trivial deployment
2. Easy interface to curate, rank, and audit data. People should not be required to be sysadmins to do this stuff.
3. Opt-in opt-in opt-in
We cannot automate trust, but we can build tools to help us build it with each other.
-
Asta [AMP] replied to Irenes (many)
@[email protected] yes! I'm using 'search' in a very broad manner; information discovery and curation is precisely what I'm thinking of. To the extent that a webpage can act as a librarian, that's what we need.
-
@aud oh good
-
Asta [AMP] replied to Asta [AMP]
@[email protected] To that end, I think what I've spent most of my time thinking about re: this is:
1. How do we ensure data integrity?
2. How do we make it easy for people to contribute, audit, curate?
3. How do we help build trust?
4. How can we distribute the computational and mental workload necessary to ensure knowledge is preserved and discoverable without corruption?
And then, finally, 5. search algorithms on said data.
(these aren't really listed in order of technical importance; for instance, point 5 absolutely will end up informing points 1 through 4. But, you know, just talking about what I think the goals should be).
-
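A minimal sketch of goal 1 (data integrity) from the list above, assuming a curated entry is stored alongside a SHA-256 digest of its canonical JSON form. The field names are loosely modeled on an ActivityStreams Link object, and `make_entry` / `verify_entry` are hypothetical helpers for illustration, not part of any existing project. A bare digest only detects accidental or after-the-fact changes; it says nothing about who to trust, which would need signatures and the human auditing described in the thread.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize deterministically so identical content always hashes identically."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def digest(record: dict) -> str:
    """Return the hex SHA-256 digest of the record's canonical form."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def make_entry(url: str, title: str, curator: str) -> dict:
    """Build a curated entry (hypothetical shape) and attach its content digest."""
    content = {"type": "Link", "href": url, "name": title, "attributedTo": curator}
    return {"content": content, "sha256": digest(content)}


def verify_entry(entry: dict) -> bool:
    """Recompute the digest and compare it with the stored one."""
    return entry.get("sha256") == digest(entry.get("content", {}))


entry = make_entry(
    "https://example.org/a",
    "Example page",
    "https://example.social/users/curator",
)
tampered = {**entry, "content": {**entry["content"], "name": "Changed"}}
print(verify_entry(entry), verify_entry(tampered))
```

An auditor on another instance could recompute the digest for any entry they receive and flag mismatches for human review.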
@aud we for sure have thoughts on all those, we'll try to get them together after this meeting
-
@[email protected] I've got some CV updating to do (well, making a separate CV for lab positions and then applying to them) today but I'm willing to bet your thoughts are slightly more formalized on this. Especially as I've largely been thinking about how the technical aspects of scaling out would work (I mean... makes sense, a lot of my experience is HPC).
-
Fuck it, gonna fix an issue on the GitHub, get myself noticed
-
@aud yeah do it
-
@[email protected] I don't think what they're asking for is particularly complex and we actually did stuff like this when I worked at Microsoft so?
But when I submit the PR we'll see
-
@aud fix around and find out
-
maybe I'll try and fix the docker compose w/ podman issue that exists after this.
-
https://github.com/internetarchive/openlibrary/pull/10015 welp, it's been a sec since I've tried to fix and test something in a large Python codebase. Hopefully I don't get laughed out of the room.
-
squeaky worm gets the sweet sweet grease motherfuckers
(I mean, potentially)
-
@aud would be very difficult to get laughed out of the room by contributing code
-
@[email protected] fair enough! I also went into their Gitter channel and was like "I put in an application for a job at the IA so I decided to also try and fix an issue while I was at it to draw some attention to myself (but also just to help out in general)"
subtle!
-
@aud that is useful info for them to know!
-
@[email protected] right!? why not. Worst case scenario, I spent 30 minutes remembering how to do stuff in Python and some REST API stuff and contributed some code that either works as is or could be worked into something that fixes the problem in a way they need.
-
@aud so, belatedly... pieces like this make us wonder if the idea of searching the entire web is just over https://www.giantfreakinrobot.com/ent/independent-ends.html