LB: UUIDv7 is time-ordered, sortable, and has good key locality. Just use them, and you won't even miss your autoincrement keys.

Jenniferplusplus

jell

@jenniferplusplus omg, it's literally "v4 but with all the stuff you wish v4 had."

Jenniferplusplus

@joshuaelliott you do give up 48 bits of entropy to the timestamp. But that still leaves you with 74 random bits, and it turns out that's plenty for virtually every scenario.
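For anyone who wants the bit budget spelled out, here is a minimal sketch. It assumes the standard layout (4 version bits and 2 variant bits in both versions, plus a 48-bit millisecond timestamp in v7) and assumes all remaining v7 bits are filled randomly; some generators spend part of them on a counter instead. The `collision_probability` helper is a hypothetical name and uses the usual birthday-bound approximation.

```python
import math

TOTAL_BITS = 128
VERSION_BITS = 4      # fixed version nibble
VARIANT_BITS = 2      # fixed variant bits
TIMESTAMP_BITS = 48   # v7 only: Unix time in milliseconds

v4_random_bits = TOTAL_BITS - VERSION_BITS - VARIANT_BITS   # 122
v7_random_bits = v4_random_bits - TIMESTAMP_BITS            # 74


def collision_probability(n: int, random_bits: int) -> float:
    """Birthday-bound approximation for n IDs drawn from 2**random_bits values."""
    pairs = n * (n - 1) / 2
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-pairs / 2.0 ** random_bits)
```

For v7 the bound applies per millisecond, since only IDs sharing a timestamp compete for the same random space. Under these assumptions, `collision_probability(1_000_000, v7_random_bits)` comes out on the order of 1e-11 even for a million IDs minted in a single millisecond.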

smallcircles (Humanity Now 🕊)

@jenniferplusplus @joshuaelliott

I got confused in all the variations, but there was a recent article giving a good summary, and then discussed on HN: https://news.ycombinator.com/item?id=41350225

I got to v7 as well from that list.

Benjamin Sonntag-King 🐙

@jenniferplusplus is it though?
I read (don't remember where) that v7 being ordered makes things (a bit) harder for B-trees in DBMS tables...

Jenniferplusplus

@smallcircles @joshuaelliott
Basically, v4 is enormous entropy and completely unordered; collisions are so improbable that they effectively cannot happen within the lifetime of the universe. Use it when you have an absolutely enormous number of records (trillions+) in the same namespace.

v7 is high entropy and time-ordered; two IDs can only collide if they are generated within the same millisecond. Use it when records close together in time are likely to be queried together, or if you need keys that have a stable, meaningful ordering.
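To make the v7 shape concrete, here is a rough sketch of a generator built from the standard library only. `uuid7_sketch` is a hypothetical name, not a library function, and the sketch assumes the simplest variant of the layout: a 48-bit Unix millisecond timestamp in the top bits and every remaining non-fixed bit filled with random data. It does not guarantee ordering between IDs minted in the same millisecond.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a v7-shaped UUID: timestamp in the high bits, randomness below."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")    # 80 random bits, 74 are used
    value = (unix_ms & ((1 << 48) - 1)) << 80       # bits 127..80: timestamp
    value |= 0x7 << 76                              # bits 79..76: version 7
    value |= ((rand >> 62) & 0xFFF) << 64           # bits 75..64: 12 random bits
    value |= 0b10 << 62                             # bits 63..62: variant
    value |= rand & ((1 << 62) - 1)                 # bits 61..0: 62 random bits
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, an ID from a later millisecond always compares greater than one from an earlier millisecond, both as an integer and as the usual hex string.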

Jenniferplusplus

@smallcircles @joshuaelliott the rest are mostly not very useful

smallcircles (Humanity Now 🕊)

@jenniferplusplus @joshuaelliott

useful advice, thanks!

Jenniferplusplus

@vincib being closer in value means they should cluster into fewer, larger buckets, so you do fewer tree traversals and you can get them in sequential memory pages more often. I'm not sure how being ordered affects things, but my understanding is that the access characteristics due to the narrower distribution make a big difference.
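A toy model can illustrate the locality argument. The sketch below is not a real B-tree: it keeps a plain sorted list as a stand-in for the index and counts how often a new key lands at the right-hand edge. The key shapes (122 random bits, or a millisecond counter above 74 random bits) and the name `tail_insert_fraction` are assumptions made for illustration.

```python
import bisect
import random


def tail_insert_fraction(keys):
    """Fraction of inserts that land at the right edge of a sorted index."""
    index = []
    tail_hits = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            tail_hits += 1
        index.insert(pos, key)
    return tail_hits / len(keys)


rng = random.Random(0)  # seeded so the run is repeatable

# v4-like keys: 122 random bits, no ordering at all.
random_keys = [rng.getrandbits(122) for _ in range(1000)]

# v7-like keys: one key per millisecond, timestamp above 74 random bits.
ordered_keys = [(ms << 74) | rng.getrandbits(74) for ms in range(1000)]
```

In this model every time-ordered key is appended at the tail, so writes keep touching the same small region, while random keys land all over the list. That concentration is what helps with page locality, and it is also what turns into a hotspot when one region receives far more writes than the rest.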

Jesse Cooke

@jenniferplusplus this is where I was landing and then I saw https://github.com/paralleldrive/cuid2 a few days ago. I'd be curious to hear your thoughts.

Jenniferplusplus

@jc00ke I've never seen it before. It seems like it's solving problems I don't have, and it's not clear to me why anyone would have these problems.

Jesse Cooke

@jenniferplusplus good point. The biggest plus for me in switching to a sortable ID is what I've previously read about db indexes, but you're right, even with UUIDv4 I've not had a real problem with index size.

Hrefna (DHC)

@jenniferplusplus

It really shines when you are looking at *sets* of data and are likely to be drawing an entire set at once (e.g., a timeline).

The negative side is when you are dealing with a potential for hotspotting, say because you have several orders of magnitude difference in number of entries between one timestamp and another.

(We deal with this kind of key design constraint all the time when working in Spanner, Bigtable, or equivalent datastores; it's about tradeoffs.)

@vincib

Hrefna (DHC)

@jenniferplusplus

Basically: if you will mostly do random access, an ordered key isn't the right solution, but then neither is a B-tree (a hash index is your best bet). If you are likely to be drawing closely temporally related things at once, then there's some nuance around the _grain_, but in general you want an ordered key (possibly not a millisecond-resolution ordered key, but an ordered key nonetheless).

If you want a coarser grain, that's easy

@vincib
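One possible reading of "coarser grain" is to round the timestamp down to a bucket before embedding it in the key. The sketch below assumes a v7-shaped layout and a one-minute default bucket; `coarse_ordered_key` and `grain_ms` are hypothetical names, and whether a key built this way still counts as a conforming UUIDv7 should be checked against the spec.

```python
import os
import uuid


def coarse_ordered_key(unix_ms: int, grain_ms: int = 60_000) -> uuid.UUID:
    """v7-shaped key whose timestamp is rounded down to a grain_ms bucket."""
    bucket_ms = unix_ms - (unix_ms % grain_ms)
    rand = int.from_bytes(os.urandom(10), "big")    # 80 random bits, 74 are used
    value = (bucket_ms & ((1 << 48) - 1)) << 80     # 48-bit bucketed timestamp
    value |= 0x7 << 76                              # version nibble
    value |= ((rand >> 62) & 0xFFF) << 64           # 12 random bits
    value |= 0b10 << 62                             # variant bits
    value |= rand & ((1 << 62) - 1)                 # 62 random bits
    return uuid.UUID(int=value)
```

Keys from the same bucket share a prefix and sort next to each other, but their order inside the bucket is random. Keys from a later bucket always sort after keys from an earlier one.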