Funny story no one will believe, but it’s true. A good friend of mine joined a startup as CTO 10 years ago, high growth phase, maybe 200 devs… In his first week he discovered the company had a microservice for generating new UUIDs. One endpoint with its own dedicated team of 3 engineers …including a database guy (the plot thickens). Other teams were instructed to call this service every time they needed a new ‘safe’ UUID. My pal asked wtf. It turned out this service had its own DB to store every previously issued UUID. Requests were handled as follows: it would generate a UUID, then ‘validate’ it by checking its own database to ensure the newly generated UUID didn’t match any previously generated UUIDs, then insert it, then return it to the client. Peace of mind I guess. The team had its own kanban board and sprints.
I've seen similar, buried deep within a major SV tech co.
Their process was a bit more complex because the master list of in-use UUIDs was stored in an external CMDB service run by a different department. They got a daily dump of that db, so they were able to check it when generating a "provisional" id. Only once it had been properly submitted to the CMDB did it become "confirmed".
They had guardrails in place to prevent "provisional" ids being used in production, and a process for recycling unused "confirmed" ids. Oh, and they did regular audits which were taken very seriously by management.
Last I heard, they were 18 months into a 6 month project to move their local database cache to Zookeeper...
At some point someone optimizes the system to a global, company-wide incrementing 128-bit counter. Instead of needing a costly lookup against a growing database, the microservice just fetches the current counter, increments it by one, and hands out the new value. Easy, fast O(1) operation.
This even allows you to shard the service to provide high availability and distribute the service globally to reduce latency. Just give each instance a dedicated id range it can hand out. I'd suggest reserving some of the high bits to indicate the data center id, and a couple more bits for the id-generator instance within that dc.
Wait a second, this starts to look familiar ... does Twitter still do that, or did they eventually switch?
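It does look familiar: the scheme described is essentially Twitter's Snowflake, which packs a timestamp, datacenter id, worker id, and per-millisecond sequence into one integer. A minimal sketch of that kind of bit layout (the field widths here are illustrative choices, not Twitter's exact layout):

```typescript
// Sketch of a Snowflake-style ID: timestamp | datacenter | worker | sequence.
// Field widths below are illustrative assumptions, not Twitter's exact ones.
const DATACENTER_BITS = 5n;
const WORKER_BITS = 5n;
const SEQUENCE_BITS = 12n;

function makeId(timestampMs: bigint, datacenter: bigint, worker: bigint, sequence: bigint): bigint {
  return (timestampMs << (DATACENTER_BITS + WORKER_BITS + SEQUENCE_BITS))
       | (datacenter << (WORKER_BITS + SEQUENCE_BITS))
       | (worker << SEQUENCE_BITS)
       | sequence;
}

// Decompose an ID back into its fields by masking each range of bits.
function parseId(id: bigint) {
  return {
    sequence: id & ((1n << SEQUENCE_BITS) - 1n),
    worker: (id >> SEQUENCE_BITS) & ((1n << WORKER_BITS) - 1n),
    datacenter: (id >> (SEQUENCE_BITS + WORKER_BITS)) & ((1n << DATACENTER_BITS) - 1n),
    timestampMs: id >> (SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS),
  };
}
```

Because each instance owns its (datacenter, worker) prefix, no coordination or lookup is needed to guarantee uniqueness.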
Who has the balls to form that team? Were they disbanded?
Pffft - they didn't need to store the whole UUID, just a hash. Dummies.
They thought of that, but they were still working on hiring a team to maintain the hashing microservice.
Hashing microservice deployment was blocked by the random generator microservice stuck in Pending, because it needed a UUID from the UUID microservice, which was blocked by hashing.
Already laughing from the parent comment; this is well done.
What you're talking about is so extremely rare that it's much more likely that the entire Earth is destroyed by an asteroid right this inst...
For a single database using UUIDs, yes, it's astronomically rare. But it's quite a different thing to say that no computer system on Earth has ever experienced a UUID collision. The number of systems out there is also astronomical.
About as rare as an asteroid typing an ellipsis and clicking the add comment button.
Well it would be statistically even rarer for that UUID collision to happen and the earth to be destroyed by an asteroid.
Please, do not use b6133fd6-70fe-4fe3-bed6-8ca8fc9386cd, I checked my database and I was using it already.
I always thought generating UUIDs at random was insane. I now only use LLMs. The prompt is: "generate a UUID. Make sure no one ever used it anywhere in their code or database. Check your work and think hard about each step. Do not output any reasoning or plain English, only the UUID itself".
You're welcome.
Actually, asking ChatGPT this query led to it giving me the UUID "550e8400-e29b-41d4-a716-446655440000", which happens to be a very common example UUID.
I knew it, we're all getting the same cheap UUIDs and the good ones are reserved for the big dogs.
uuid.uuidv4() recently switched to "adaptive entropy" instead of "xmax entropy" in an effort to save costs on non-premium users.
I'm using 16b55183-1697-496e-bc8a-854eb9aae0f3 and probably some more too. I suppose if we all post our list here, then we can all check for duplicates?
You can check https://everyuuid.com/ for collisions.
We should all send our already-generated UUIDs to a shared database. We could just put it on Supabase with a shared username/password posted on HN, so we can all ensure that after generating a UUIDv4 locally, it's not used by anyone else. If it's in the database, we know it's taken.
It's a super simple mechanism: check in the common worldwide UUID database, and if it's not in there, you can use it. Perhaps if we use START TRANSACTION, we can ensure it's not taken as we insert. But that's all easy, I'll ask Claude to wire it up, no problem.
But then I will claim I have already used all the UUIDs in my spreadsheets, and my lawyer will send cease&desist letters to every database.
A site previously posted here could be useful: https://everyuuid.com/
That UUID should have my name sticker on it. Don't your UUIDs have name stickers?
Something off on how the RNG is initialized? Lack of entropy?
If the rng is not customized, it will use the runtime's crypto.getRandomValues().
getRandomValues doesn't specify a minimum amount of entropy. It's a near certainty that something is badly wrong with the RNG, and, yes, probably in how it's seeded.
It's probably messing up the cryptography, too.
According to the many-worlds interpretation of quantum mechanics, there's bound to be one branch of universe where every UUID is the same. Can you imagine what those guys are thinking?
This is why I am not a fan of the Everett approach
Gotta be a seeding issue. If it's not, and you can prove it, you're about to be a little famous probably :P
History repeats itself. https://halupedia.com/the-great-uuid-fumble-of-73
I fully agree. It makes no sense. Yet...
The only guess I have is that we originally generated UUIDv4s on a user's phone before sending them to the database, and the UUID generated this morning that collided was created on an Ubuntu server.
I don't fully know how UUIDv4s are generated, or what (if anything) about the machine they're generated on is part of the algorithm, but that's really the only change I can think of: they used to be generated on-device by users, and for many months now have been generated on the server.
You let users generate a UUID?
To be honest, the chance that you are doing something weird is probably higher than you experiencing a real UUID conflict.
How did your database 'flag' that conflict?
If it's UUIDv4 and you validate that the UUID is well-formed and not conflicting, I don't really see the issue with user-generated UUIDs. Being able to generate unique keys in an uncoordinated manner is the main selling point of UUIDs.
Sure, it's something I'd flag in any design to spend two minutes talking about potential security implications. But usually there aren't any.
Validation, etc.: everything that should not be controlled by the user will not be controlled by the user.
user-generated (as in: on the user's phone) was only at the very early stages of this product, and we've since moved to on-server. It's a cash-register type of app, where the same invoice must not be stored twice. So we used to generate a fresh invoice_id (uuidv4) on the user's device for each new invoice, and a double-send of that would automatically be flagged server-side (same id twice). This has since moved on to a server-only mechanism.
The database flagged it simply by having a UNIQUE key on the invoice_id column. First entry was from 2025, second entry from today.
If it was two on-device generated UUIDs I could see a collision happening. There have been instances of cheap end devices not properly seeding their random number generators, leading to colliding "random" values. And cases of libraries using cheap RNGs instead of a proper cryptographic RNG, making it even worse
But on a server that shouldn't happen, especially not in 2026 (in the past, seeding the rngs of VMs used to be a bit of an issue). Even if one UUID was badly generated, a truly random UUID statistically shouldn't collide with it. You'd need an issue in both generators
A UUIDv4 collision is statistically extremely unlikely. What is more likely is that both systems used the same seed. A seed might be just a handful of bytes, shrinking the effective space and raising the chance of collision to one in billions or even millions.
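To see why a shared seed is the likely culprit, here is a toy demonstration with a deliberately weak, hypothetical PRNG (a mulberry32-style mixer; no real UUID library should work this way): two generators seeded identically emit byte-for-byte identical "random" UUIDs.

```typescript
// A deliberately weak, seedable PRNG to illustrate the failure mode.
function weakRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return (t ^ (t >>> 14)) >>> 0;
  };
}

// Format PRNG output as a UUIDv4-shaped string (version and variant bits set).
function badUuidV4(rng: () => number): string {
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i++) bytes[i] = rng() & 0xff;
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC variant
  const hex = [...bytes].map(b => b.toString(16).padStart(2, "0")).join("");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}
```

Two processes calling `badUuidV4(weakRng(1234))` produce the exact same "UUID", which is effectively what a shared or low-entropy seed does to a real generator.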
Better check what crypto.js is actually doing in your exact setup. Weak polyfills exist...
Poorly seeded prng.
most likely the culprit indeed
But I used nonstandard nonces!
It's not happening by chance, there is a bug somewhere.
From what I skimmed, the package should just call the js runtime's crypto.randomUUID(). I think it should always be properly seeded.
I think it is extremely unlikely that the runtime has a bug here, but who knows? What js runtime do you use?
> I thought this is technically impossible
Actually it's not impossible, just very, very improbable.
P.S. You should play a lottery/powerball ticket
P.P.S. Whenever I use the word improbable, the https://hitchhikers.fandom.com/wiki/Infinite_Improbability_D... comes in mind
> P.S. You should play a lottery/powerball ticket
Actually, they should not. That collision and winning the lottery would be even rarer.
Inconceivable!
Buy some lava lamps
1 in 4.72 × 10²⁸
1 in 47.2 octillion.
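The usual way to put numbers on this is the birthday approximation: among n random v4 UUIDs (122 random bits each), the probability of any collision is roughly n(n-1)/2 divided by 2^122. A quick sketch:

```typescript
// Birthday-bound approximation for UUIDv4 collisions: p ≈ n(n-1)/2 / 2^122.
// Valid while p is small, which it is for any realistic n.
function approxCollisionProbability(n: number): number {
  const space = Math.pow(2, 122); // 122 random bits in a v4 UUID
  return (n * (n - 1)) / 2 / space;
}
```

For a billion UUIDs this gives on the order of 10^-19, i.e. effectively never for any single system, which is why a real-world collision points at the generator rather than at bad luck.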
I'd be suspecting a race condition or some other naive mistake; otherwise I'd be stocking up on lottery tickets.
(lol at the other user posting at the same time about the lottery ticket... great minds and all that.)
I've always looked at it the other way: being that lucky would mean you have even less chance of something else lucky happening, so it's a good time to save your money.
The lottery ticket part makes no sense. Statistically, if such an improbable event just happened to him, then the chance of it happening again should be even more improbable.
Although incredibly rare, it's not impossible, so it's probably best to just plan for collisions. A simple retry should suffice. But I agree, I feel like something is going on somewhere else...
Reminds me of some code I saw running in production. Every time we added a new entry, we were pulling all the UUIDs from this table, generating a new UUID, and checking for collisions up to 10 times.
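Rather than pulling every existing id and checking client-side, one can let the database's UNIQUE constraint do the check and simply retry on conflict. A sketch, where `UniqueViolation` and `insertRow` are hypothetical stand-ins for your driver's duplicate-key error and actual insert call:

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical duplicate-key error; real drivers have their own error codes
// (e.g. SQLSTATE 23505 in Postgres).
class UniqueViolation extends Error {}

// Generate an id, attempt the insert, and retry with a fresh id only if the
// UNIQUE constraint rejects it. No table scan, no pre-check race window.
function insertWithRetry(insertRow: (id: string) => void, maxAttempts = 3): string {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const id = randomUUID();
    try {
      insertRow(id);
      return id; // insert succeeded, so the id is unique in the table
    } catch (err) {
      if (!(err instanceof UniqueViolation)) throw err; // real failure: rethrow
      // duplicate key: fall through and retry with a fresh id
    }
  }
  throw new Error("could not insert a unique id");
}
```

The constraint makes the check atomic with the insert, which the read-then-write version never was.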
Would UUIDv7 be more collision-proof? Hard to say. It takes time into account, but the number of entropy bits is reduced, so UUIDs generated at exactly the same time draw from a much smaller random space and could therefore collide more easily.
Thoughts?
Every millisecond opens up a new block. Should be even more unlikely.
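For context, a UUIDv7 puts a 48-bit Unix-millisecond timestamp up front, so only ids minted in the same millisecond compete for the remaining 74 random bits. A sketch of that layout (illustrative, not a spec-complete implementation):

```typescript
import { randomBytes } from "node:crypto";

// Sketch of a UUIDv7-style id: 48-bit big-endian Unix-ms timestamp, then
// version/variant bits, with the remaining 74 bits left random.
function uuidV7(nowMs: number = Date.now()): string {
  const bytes = randomBytes(16);
  // Write the 48-bit millisecond timestamp into bytes 0..5, big-endian.
  for (let i = 5; i >= 0; i--) {
    bytes[i] = nowMs % 256;
    nowMs = Math.floor(nowMs / 256);
  }
  bytes[6] = (bytes[6] & 0x0f) | 0x70; // version 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC variant
  const hex = bytes.toString("hex");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}
```

A side effect of the timestamp prefix is that ids sort roughly by creation time, which is the main practical argument for v7 in database keys.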
Just a stupid question, but why not append the date, even in seconds, as hex? It's just a few bytes and would guarantee that anything OK now will stay OK in the future.
You can just use a different UUID variant which includes timestamp data instead (e.g. v1 or v7), there are also variants which include the MAC address.
yeah, any sort of additional semi-random data could've helped prevent this, I'm sure. That, however, is also kind of the idea of UUIDv4: it has lots of randomness and time built in already.
UUID v4 consists of only random bits, no timestamp info.
oh, interesting, I didn't know that. This could be part of the problem, depending on what's used as the seed.
But surely hashing the date still allows for a future collision. Leaving the date as is means it will never collide after that one second has passed.
> but why not append the date
And use uuid v5 to hash it :)
This is why I prefer a random base32 string over a UUID. At least you get a proper 128 bits of entropy instead of just the 122 bits of UUIDv4. That's a 64x difference in collision probability. I always thought UUIDs were a toy, not for serious use. If you control the strings, you can even make a longer ID.
Also, numerous applications that use a unique ID per record frequently need to check for ID collisions. I know I do for a short URL generator.
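A random base32 id along these lines could look like the following sketch (the alphabet is an assumption here: Crockford-style, digits plus lowercase letters minus i, l, o, u):

```typescript
import { randomBytes } from "node:crypto";

// 32-character alphabet: 0-9 and a-z excluding the ambiguous i, l, o, u.
const ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz";

// 26 chars x 5 bits/char = 130 bits of entropy, a bit more than a UUIDv4's 122.
function randomBase32Id(length = 26): string {
  const bytes = randomBytes(length);
  let out = "";
  // 256 is divisible by 32, so byte % 32 is uniform: no modulo bias.
  for (const b of bytes) out += ALPHABET[b % 32];
  return out;
}
```

The ids are also URL-safe and case-insensitive-friendly, which matters for the short-URL use case mentioned above.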
Why not have a timestamp-UUID instead?
How confident are you that your machines' clocks are in perfect sync? What about the risk of clock drift plus correction, or hardware issues?
Not GP, but: not confident. How confident would I be to avoid a (slightly lower entropy) UUID collision while also avoiding a clock desync landing on the exact same logged millisecond? Very, which is how confident I was about not encountering an UUID collision before this thread, so very++ I guess.
The chance of a UUIDv4 collision is very low, but it is never zero.
If everything is done properly, then this is very likely the one and only time anyone involved in the telling or reading of this account will ever experience this.
Classic gambler's fallacy!
> We're using this: https://www.npmjs.com/package/uuid
Why? There's a built-in for this.
https://nodejs.org/api/crypto.html#cryptorandomuuidoptions
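For reference, the built-in needs no package at all:

```typescript
import { randomUUID } from "node:crypto";

// Node's built-in, CSPRNG-backed UUIDv4 generator (Node 14.17+).
const id = randomUUID();
console.log(id); // e.g. "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d" (yours will differ)
```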
Buy a lottery ticket