Archival is something that should be taken very very seriously by social media software
-
link rot! It's a thing. What if a server could easily turn all its content into a static archive? What if a fellow server could then trivially replace all its existing links to the formerly active server with links to the archived backup?
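A minimal sketch of that link-replacement idea, assuming the archive mirrors the dead server's paths one-to-one under some base URL. The function name `rewrite_links`, the hostnames, and the archive layout are all hypothetical; no existing server software is known to work this way.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Rough matcher for http(s) URLs in plain text. It will also swallow trailing
# punctuation such as a full stop, so real use would need something stricter.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")


def rewrite_links(text, dead_host, archive_base):
    """Point every link to dead_host at the same path under archive_base."""
    archive = urlsplit(archive_base)

    def swap(match):
        url = match.group(0)
        parts = urlsplit(url)
        if parts.hostname != dead_host:
            return url  # links to other servers are left untouched
        # Assumption: the archive keeps the original path structure.
        path = archive.path.rstrip("/") + parts.path
        return urlunsplit(
            (archive.scheme, archive.netloc, path, parts.query, parts.fragment)
        )

    return URL_PATTERN.sub(swap, text)
```

The hard part is not the string rewriting but agreeing on where archives live and trusting that they are faithful copies, which is where the "trusted networks" part comes in.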
-
Asta [AMP] replied to Asta [AMP]
I think we need to take archival much more seriously in the "age of LLM pollution". It's pretty obvious that it's not considered profitable, even though it is of immense value to society.
-
There's no LLM that can defend against LLM pollution, no matter how hard they try and sell those detection services. There is no substitute for the human brain. I think it would be a lot easier to avoid it if we had rigorous technical measures to ensure archives and backups and formed trusted networks with each other.
-
@aud we agree with both of these
we also want to note that, implementation-wise, deletion is hard. you need a bunch of non-standard infrastructure to even have a chance of it being "real".
-
(I'm getting the impression that while Mastodon lets you delete accounts, it also lets you undelete them, which suggests that it isn't real deletion. Anyone know if that's true? For instance: I wish I had the capability to ensure anything I ever wrote on tech.lgbt was deleted and that I was scrubbed from their database).
(Not that I could guarantee, even if Mastodon did allow true account deletion, that an admin would actually be running a version that did it, but.)
-
@[email protected] hmmmm. Is this difficulty due to the nature of the problem itself (i.e., even a 'clean room' implementation would need to solve inherently difficult problems?) or simply that existing software isn't designed with this in mind? I admit that while I think it's an important goal, I obviously have yet to consider the difficulty it might bring.
-
@[email protected] I assume there's a lot of things that would need to be adjusted in the database to handle the now missing content.
There's the whole "tombstone" thing in ActivityPub that can basically act as a stub but.
-
@aud it's a thing that could be simple if every implementor and vendor up and down the stack wanted it to be. the problem is, it's not really anyone's top priority. in particular, a lot of techniques that worked on spinning-metal disks do not work on SSDs, and manufacturers often straight up lie about what deletion-related functionality their drives implement and the extent to which it does anything
-
@aud but also filesystems aren't really designed from that perspective, either, and the hard-deletion features that exist at that level aren't really hooked up to the hard-deletion features at lower levels
-
@[email protected] ah! Right, including down to the hardware level. I admit I was thinking just of the software side. Yeah... I hadn't considered that. It's unfortunately probably outside the scope of an open source project given the wide variety of hardware it would be running on. boooooooo.
Even a 'rewrite', i.e. writing junk data over the old database entries and then erasing them, doesn't guarantee anything, I strongly suspect.
-
@aud you would think that, but on the other hand there's Eugen.
-
@aud the military standard here says to shred the drive to dust whose particle size is smaller than the size of a single transistor
we think that is probably unnecessary. there is probably no realistic attack by state-level actors that would require a particle size that small to defeat.
-
@aud An actual delete does remove your content from the DB: https://github.com/mastodon/mastodon/blob/e1b7382ea6b8b944a363914490d6476726dd7075/app/services/delete_account_service.rb and optionally keeps your username reserved (I don't think the UI even gives you the option to nuke the username).
Doesn't prevent an admin from keeping backups or whatnot of course.
-
@aud and yes, overwrites aren't real since SSDs came along, because the drive is permitted to remap blocks to other blocks any time it likes. also, it often does so without even telling the higher levels about it.
-
@[email protected] I like the thing I read recently (about encrypting DMs in the fediverse and how difficult it actually is). One thing they did was split security and privacy out from each other, and I feel that's a good mental framework for thinking about these. It is indeed very difficult to securely remove a person's data; however, attempts to maintain their privacy through deletion are still worthwhile.
Still, I think security + privacy is obviously preferable. But jesus, that sounds like a nightmare in the making.
-
@[email protected] hahahahaaaaaaaaa yeah. I like that @[email protected] calls him "website boy" lmao. It's pretty accurate.
-
@aud you mentioned tombstones. tombstones are an important strategy in general, and we have definitely specced things out in the course of our privacy work that rely on them. they can be quite helpful in scenarios where there's a need to redact things but the metadata that the thing existed should be kept. of course, they require the specific application-layer code to be aware of how deletion works and have specific support for it.
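For reference, a rough sketch of what redacting to a tombstone could look like at the application layer. `Tombstone`, `formerType` and `deleted` come from the ActivityStreams 2.0 vocabulary; the helper name `to_tombstone` and the example URL are hypothetical, and this is not how any particular server is known to implement it.

```python
from datetime import datetime, timezone


def to_tombstone(obj, deleted_at=None):
    """Return a Tombstone stub for obj, keeping only the fact that it existed."""
    deleted_at = deleted_at or datetime.now(timezone.utc)
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": obj["id"],                # same id, so existing references still resolve
        "type": "Tombstone",
        "formerType": obj.get("type"),  # e.g. "Note"
        "deleted": deleted_at.isoformat(),
        # content, attachments, addressing etc. are deliberately not copied over
    }


note = {
    "id": "https://social.example/notes/1",  # hypothetical URL
    "type": "Note",
    "content": "something the author wants gone",
}
stub = to_tombstone(note, datetime(2024, 1, 1, tzinfo=timezone.utc))
```

This only covers the redaction at the application layer. Whether the original bytes are really gone from the database, backups, and the disk underneath is the separate problem discussed above.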
-
@[email protected] Ah, thank you. That's good to know.
-
@[email protected] but admittedly, from a science perspective, that sounds like a pretty cool fucking problem to work on (how to extract data from a pile of dust).
-
@[email protected] ugh, this makes me wonder if the site I found talking about undelete was just total made up bullshit. God, I hate the new internet.