crazygringo
18 hours ago
> Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit Unix timestamp as its most significant part, meaning the identifier itself leaks the record's creation time... Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.
So this basically defeats the entire performance improvement of UUIDv7. Because anything coming from the user will need to look up a UUIDv4, which means every new row needs to create an extra random UUIDv4 which gets inserted into a second B-tree index, which recreates the very performance problem UUIDv7 is supposedly solving.
In other words, you can only use UUIDv7 for rows that never need to be looked up by any data coming from the user. And maybe that exists sometimes for certain data in JOINs... but it seems like it might be more the exception than the rule, and you never know when an internal ID might need to become an external one in the future.
sgarland
3 minutes ago
Who are these "experts?" I'm a DBRE, and also very security conscious, and think this is an absurd what-if for most companies.
If it does matter for your application, then don't expose it - use an opaque id with something like AEAD, and expose that.
tracker1
18 hours ago
This is only really true if leaking the creation time of the record is itself a security concern.
donjoe
7 hours ago
To me, the most important question is: how do I scale v7 in an environment of 20+ engineers?
When using v7, I need some sort of audit that checks in every API contract for the usage of v7 and potential information leakage.
Detecting V7 uuids in the API contract would probably require me to enforce a special key name (uuidv7 & uuid for v4) for easier audit.
Engineers will get this wrong more than once - especially in a mixed team of Jr/sr.
Also, the API contracts will look a bit inconsistent: some resources will get addressed by v7, others by v4. On top of that, by using v4 on certain resources, I'd leak the information that those resources addressed by v4 contain sensitive information.
By sticking to v4, I'd have the same identifier for all resources across the API. When needed, I can expose the creation timestamp in the response separately. Audit is much simpler since the fields state explicitly what they will contain.
sgarland
an hour ago
> Detecting V7 uuids in the API contract would probably require me to enforce a special key name (uuidv7 & uuid for v4) for easier audit.
Unless I'm missing something, check it on receipt, and reject it if it doesn't match. `uuid.replace("-", "")[12]` or `uuid >> 76 & 0xf`.
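A minimal Python sketch of that check, assuming identifiers arrive as canonical UUID strings. The helper names `uuid_version` and `reject_unless_v4` are made up for illustration, and whether you reject v7 or anything that isn't v4 is a policy choice this sketch simply assumes.

```python
import uuid


def uuid_version(value: str) -> int:
    """Return the version nibble of a UUID given as a string."""
    parsed = uuid.UUID(value)  # raises ValueError on malformed input
    # The version is the 13th hex digit, i.e. bits 76..79 of the 128-bit integer.
    return (parsed.int >> 76) & 0xF


def reject_unless_v4(value: str) -> str:
    """Hypothetical boundary check: only accept v4 identifiers from clients."""
    version = uuid_version(value)
    if version != 4:
        raise ValueError(f"expected a UUIDv4, got version {version}")
    return value
```

The same nibble is what `value.replace("-", "")[12]` reads from the string form.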
Regardless of difficulty, this comes down to priorities. Potential security concerns aside (I maintain this really does not matter nearly as much as people think for the majority of companies), it's whether or not you care about performance at scale. If your table is never going to get over a few million rows, it doesn't matter. If you're going to get into the hundreds of millions, it matters a great deal, especially if you're using them as PKs, and doubly so if you're using InnoDB.
parthdesai
2 hours ago
> By sticking to v4, I'd have the same identifier for all resources across the API. When needed, I can expose the creation timestamp in the response separately. Audit is much simpler since the fields state explicitly what they will contain
Good luck if you're operating at a decent scale, and need to worry about db maintenance/throughput. Ask the DBA at your company what they would prefer.
lazide
an hour ago
If you read the prior comment, this is now an ouroboros
AdieuToLogic
13 hours ago
>>> Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs.
>> So this basically defeats the entire performance improvement of UUIDv7. Because anything coming from the user will need to look up a UUIDv4, which means every new row needs to create an extra random UUIDv4 which gets inserted into a second B-tree index, which recreates the very performance problem UUIDv7 is supposedly solving.
> This is only really true if leaking the creation time of the record is itself a security concern.
No, as "leaking the creation time" is not a concern when APIs return resources having properties representing creation/modification timestamps.
Exposing predictable identifiers, such as UUIDv7 or serial[0] types used as database primary keys, creates a security risk because it lets attackers synthesize identifiers that match arbitrary resources much more quickly than when random identifiers are employed.
0 - https://www.postgresql.org/docs/current/datatype-numeric.htm...
delifue
12 hours ago
With proper data permission checks, having predictable IDs is totally fine. And UUIDv7's random part is large enough that it's much harder to predict than an auto-increment id.
If your security relies on attackers not knowing your IDs (i.e. you don't do proper data permission checks), your security is flawed.
pinkgolem
10 hours ago
Is that not quite common for invites / shares that don't need a user account?
javawizard
9 hours ago
Indeed, but one could easily argue that 128 bits of entropy aren't sufficient for a good invite token in the first place.
pinkgolem
9 hours ago
I am just puzzled why delifue calls something that, as far as I know, is pretty standard across the industry bad practice.
treve
8 hours ago
There are 2 cases being discussed. A UUIDv7 is a bad secret, but it's fine for many other ids. If I can guess your user id, it shouldn't really matter because your business logic should prevent me from doing anything with that information. If I can guess your password reset token it's a different story because I don't need anything else beyond that token to do damage.
nesarkvechnep
8 hours ago
Because it is?
skrebbel
an hour ago
No?
MikeNotThePope
9 hours ago
Exactly. I wrote about that a few days ago.
Primary keys using UUID v7 are (potentially) an HR violation.
https://mikenotthepope.com/primary-keys-using-uuid-v7-are-po...
beaker52
4 hours ago
Which part is in violation of the age discrimination laws here, the fact that k-sortable uuids divulge the information, or the fact someone is using them to discriminate against a candidate?
If it’s the latter (which reading Wikipedia’s summary suggests it is), then the entire premise that k-sortable uuids are an “HR violation” is bunk.
The problem with arguing about timestamps leaking this kind of information is that _anything_ can leak this kind of vaguely dated information.
- Seen on a website that ceased to exist after 2010? Gotchya!
- Indexed by the Wayback Machine? Gotchya!
- Used <different uuid scheme> for records created before 2022? Gotchya!
The only way to prevent divulging temporal clues about an entity is to never reveal its existence in any kind of correlatable way (which, as far as I’m prepared to think right now, seems to defeat the point of revealing it to a UI at all).
cuu508
4 hours ago
What's the scenario here?
I submit my application in 2025 and get rejected.
20 years later I submit another application to the same company, using my existing 20 years old user profile, and now get rejected because somebody figures out I'm old by looking at my user id?
da_chicken
2 hours ago
Are there really any performance benefits of UUIDv7 over UUIDv4 that should ever come up in the context of an HR system? Just how many job applicants are you tracking?
I don't understand why you considered UUIDv7 in the first place.
kvirani
18 hours ago
Which I have to assume is rare, right?
wongarsu
17 hours ago
We used to leak approximate creation time all the time back when everyone used sequential keys. If anything sequential keys are far worse: they leak the approximate number of records, make it easy to observe the rate at which new keys are created, and once you know that you can deduce the approximate creation date of any key.
UUIDv4 removes all three of those vectors. UUIDv7 still removes two of three. It doesn't leak record count or the rate at which you create them, only creation time. And you still can't guess adjacent keys. It's a pretty narrow information leakage for something you routinely reveal on purpose.
johnisgood
5 hours ago
I often see sequential order IDs, and they get incremented by one, so I can guesstimate the number of orders they get within a minute by creating my own orders. I watched this happen as I was intentionally removing and creating new orders (as they did not support modification of existing but not yet accepted ones). What may I do with this information though as a user that would be damaging? Legitimate question, intent is not harm, but I genuinely do not see how this is a bad thing.
I can see it being bad for tracking IDs, but not order IDs, unless you are allowed to view any orders that do not belong to your account, which is just fundamentally bad security and using UUIDv4 or a random string would simply be obscuring security.
hinkley
9 hours ago
It’s also industrial espionage on competitors or potential acquisitions.
teddyh
3 hours ago
Or wartime intelligence: <https://en.wikipedia.org/wiki/German_tank_problem>
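For anyone who hasn't seen it worked through, here is a small sketch of the textbook estimator from that problem. It assumes IDs start at 1, increase by one, and that the IDs you happened to see are a uniform sample without replacement; real order IDs may violate all three. The function name `estimate_total` is just illustrative.

```python
from typing import Sequence


def estimate_total(observed_ids: Sequence[int]) -> float:
    """Estimate how many sequential IDs exist from a sample: m + m/k - 1."""
    if not observed_ids:
        raise ValueError("need at least one observed id")
    k = len(observed_ids)  # sample size
    m = max(observed_ids)  # largest id seen
    return m + m / k - 1


# Four observed ids with a maximum of 60 suggest roughly 74 records in total.
print(estimate_total([19, 40, 42, 60]))  # 74.0
```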
blackenedgem
17 hours ago
UUIDv7s are much worse for creation time though imo. For sequential IDs an attacker needs to have a lot of data to narrow the creation time. That raises the barrier to entry considerably, to the point that only a committed attacker could infer the time.
With UUIDv7 the creation time is always leaked without any sampling. A casual attacker could quite easily look up the time and become motivated to probe and link the account further.
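To show how little effort that lookup takes, here is a short Python sketch that decodes the leading 48-bit millisecond timestamp. The function name `uuid7_created_at` is made up, and the sample value is only an illustrative v7-shaped UUID whose timestamp field is 0x017F22E279B0.

```python
import uuid
from datetime import datetime, timezone


def uuid7_created_at(value: str) -> datetime:
    """Decode the 48-bit Unix millisecond timestamp at the top of a UUIDv7."""
    parsed = uuid.UUID(value)
    if (parsed.int >> 76) & 0xF != 7:
        raise ValueError("not a version 7 UUID")
    millis = parsed.int >> 80  # top 48 of the 128 bits
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


print(uuid7_created_at("017f22e2-79b0-7cc3-98c4-dc0c0c07398f").isoformat())
# 2022-02-22T19:22:22+00:00
```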
AdieuToLogic
13 hours ago
> For sequential IDs an attacker needs to have a lot of data to narrow the creation time.
When sequential integer IDs are externalized, an attacker does not need creation times to perform predictive attacks. All they need to do is apply deltas to known identifiers.
wredcoll
18 hours ago
It seems wildly paranoid, even for security researchers.
ibejoeb
17 hours ago
There are some practical applications that are not necessarily related to security. If you are storing something like a medical record, you don't want to use a UUIDv7 as a public ID for a patient visit, because the date is subject to HIPAA.
ownagefool
5 hours ago
This is probably not really true.
You wouldn't be publishing patient visits publicly; the only folks that'd legitimately see that record would be those with access to that visit, and they'd most likely need to know the time of said visit. This access should be controlled via AuthN, AuthZ and audited.
You'd also generally do a lot of time-based lookups on this data; what visits do I have today, this week, and so on. You might also want an additional DateTime field for timezones and offsets, but the v7 is probably better than v4 for this use case.
mulmen
15 hours ago
But they would have to relate that ID to patient data like their identity right? The date alone cannot be a HIPAA issue. That means every date is a HIPAA violation because people go to the doctor every day.
oulipo2
17 hours ago
I remember in the cracking days, when we were trying to crack ElGamal encryption and the like, we noticed when some code had been written in e.g. Delphi (which used a weak RNG based on datetime). If you could guess roughly when the code was compiled and the keys were generated, you got a rough time range; by brute-forcing through that range as the seed to the RNG and generating the random ElGamal key from each candidate, you would greatly reduce the range of possibilities (e.g. brute-force 10M ints instead of billions or more).
noir_lord
16 hours ago
An online casino got hit in a similar way a long time ago. IIRC someone realised the seed for a known PRNG was the system clock, so you could brute-force every shuffle either side of the approximate timestamp and compare the results to some known cards (i.e. the ones you’d been dealt); once you had a match, you knew what everyone else had.
Always thought that was elegant (the attack, not using the time as the seed).
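A toy reconstruction of the idea, not the actual casino code: Python's `random` module stands in for whatever PRNG they used, the seed is assumed to be whole Unix seconds, and "your hand" is assumed to be the first five cards of the shuffled deck. All names here are illustrative.

```python
import random


def shuffled_deck(seed: int) -> list:
    """Shuffle a 52-card deck (cards are just 0..51) with a clock-style seed."""
    deck = list(range(52))
    random.Random(seed).shuffle(deck)
    return deck


def candidate_seeds(known_cards, approx_time: int, window: int) -> list:
    """Return every seed within +/- window seconds whose shuffle starts with known_cards."""
    known = list(known_cards)
    return [
        seed
        for seed in range(approx_time - window, approx_time + window + 1)
        if shuffled_deck(seed)[: len(known)] == known
    ]


true_seed = 1_700_000_123                    # the server's clock at shuffle time
my_hand = shuffled_deck(true_seed)[:5]       # the only cards the attacker sees
matches = candidate_seeds(my_hand, approx_time=1_700_000_000, window=600)
print(true_seed in matches)                  # True
```

Once a seed matches, `shuffled_deck(seed)` reproduces the entire deck, which is why a clock-seeded PRNG gives the game away after a handful of cards.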
hinkley
9 hours ago
I stopped airplane maintenance software from shipping with a particularly egregious form of this for SSL session key generation. It’s hard to get a good random seed on a real time operating system. I tell you hwut.
replygirl
17 hours ago
it's not about the individual record, it's about correlating records. if you can sequence everything in time it gets a lot easier to deanonymize data
Macha
17 hours ago
However, if your API has a (very common) createdAt field on these objects, the ability to get the creation time from the identifier is rather academic.
inopinatus
9 hours ago
The concern is not limited to access of the full records. The concern extends to any incidental expression of identifiers, especially those sent via insecure side channels such as SMS or email.
In most cases this forms a compliance matter rather than an open attack vector, but it nevertheless remains that one has to answer any question along the lines "did you minimise the privacy surface?" in the negative, or at least, with a caveat.
hinkley
9 hours ago
And that’s why some people are rabid about “no SELECT *”.
tracker1
17 hours ago
Can you provide an example of where you would legitimately have the ID for a medical record interaction, but not a date/time associated?
tyre
17 hours ago
Email is not secure but sending an email with a link to "Information about your appointment" is fine. If that link goes to `/appointments/sjdhfaskfhjaksdjf`, there is no leaked data. If it goes to `/appointments/20251017lkafjdslfjalsdkjfa`, then the link itself contains PHI.
Whether creation date is PHI…I could see the argument being yes, since it correlates to medical information (when someone sought treatment, which could be when symptoms present.)
lazide
an hour ago
Notably, this is an absurd argument. Every system I’ve dealt with right now sends the date/time/location/practitioner clear text in the email (or some variant thereof).
The only thing that seems to be protected is ‘reason for appointment’, and not all systems do that.
Everyone signs paperwork to authorize this when they first engage with the medical providers!
ensignavenger
11 hours ago
Email may not be secure, but neither are faces and phones, and yet medical professionals use those all the time.
ensignavenger
an hour ago
Fat fingered fax... faxes, not faces!
Too
6 hours ago
Your comment here has id 45622189 and the UI tells me in plain sight that you posted it 11h ago. Assuming the ids are sequential, these two combined tell me more about HN vs a uuid “leaking” something that’s already expected to be public.
rat9988
4 hours ago
Maybe, but what's your point?
oconnor663
14 hours ago
It's relatively common for it to be a privacy concern. Imagine if I'm making an online payment or something, and one of the IDs involved tells you exactly when I created my bank account. That's a decent proxy for my age.
love2read
13 hours ago
1) I would argue that the year you created your bank account is not a good proxy for age. 2) I would question where you think the UUID that hints at your age would leak to from your bank, considering it’s still a bank account ID. 3) I would question whether you've considered that the vast majority of UUIDs aren’t used for high-stakes IDs such as online banking IDs.
paulddraper
13 hours ago
A bank account number (assuming that is what we are talking about, not some token) is already very sensitive information. Like, legal status protected information.
Knowing approximate age is a relatively small leak compared to that.
zie
12 hours ago
bank account numbers are printed on every check you ever wrote. Most people don't write checks anymore, though online bill pay sends physical checks still sometimes. They never really were sensitive information.
Bank security does not depend on your bank account being private information. Pretty much all bank security rounds to the bank having a magic undo button, so they can undo any bad transactions after it comes to light that it was a bad transaction. Sure they do some filtering on the front-end now to eliminate the need to use the magic undo button, but that's just extra icing to keep the undo button's use to a dull roar.
dethos
17 hours ago
Exactly
nitwit005
15 hours ago
It was a concern in the past, as people used password creation tools that were deterministic based on the current time.
There was previously an article linked here about recovering access to some bitcoin by feeding all possible timestamps in a date range to the password creation tool they used, and trying all of those passwords.
matthew16550
17 hours ago
Using UUIDv4 as primary key has unexpected downsides because data locality matters in surprising places [1].
A UUIDv7 primary key seems to reduce / eliminate those problems.
If there is also an indexed UUIDv4 column for external id, I suspect it would not be used as often as the primary key index so would not cancel out the performance improvements of UUIDv7.
[1] https://www.cybertec-postgresql.com/en/unexpected-downsides-...
AdieuToLogic
13 hours ago
> Using UUIDv4 as primary key has unexpected downsides because data locality matters in surprising places.
Very true, as detailed by the link you kindly provided. Which is why a technique I have found useful is to have both an internal `id` PK `serial`[0] column (never externalized to other processes) and another column with a unique constraint having a UUIDv4 value, such as `external_id`, explicitly for providing identifiers to out-of-process collaborators.
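A rough sketch of that layout, using Python's built-in `sqlite3` as a stand-in for PostgreSQL (so `INTEGER PRIMARY KEY` plays the role of `serial`). The table name `account` and the helper functions are hypothetical; only the `id` / `external_id` split comes from the description above.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,            -- internal key, never leaves the process
        external_id TEXT NOT NULL UNIQUE,  -- random UUIDv4 handed to collaborators
        name TEXT NOT NULL
    )
    """
)


def create_account(name: str) -> str:
    """Insert a row and return only the external identifier."""
    external_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO account (external_id, name) VALUES (?, ?)",
        (external_id, name),
    )
    return external_id


def find_account(external_id: str):
    """Resolve an external identifier to the internal row, or None."""
    return conn.execute(
        "SELECT id, name FROM account WHERE external_id = ?", (external_id,)
    ).fetchone()
```

Joins and foreign keys use the compact `id`; the unique index on `external_id` is the extra random-order index the rest of this thread argues about.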
0 - https://www.postgresql.org/docs/current/datatype-numeric.htm...
crazygringo
16 hours ago
> I suspect it would not be used as often as the primary key index
That doesn't matter because it's the creation of the index entry that matters, not how often it's used for lookup. The lookup cost is the same anyways.
matthew16550
16 hours ago
The page I linked shows uses after creation where the cost can be different.
oconnore
17 hours ago
If this is a concern, pass your UUIDv7 ID through a single-block AES encryption (ECB mode, which takes no IV; equivalently CBC with a zero IV). 128 bit UUID, 128 bit AES block. Easy, near zero overhead way to scramble and unscramble IDs as they go in/out of your application.
There is no need to put the privacy preserving ID in a database index when you can calculate the mapping on the fly
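To make the "calculate the mapping on the fly" part concrete, here is a standard-library-only sketch. Note the swap: Python ships no AES, so this uses a keyed Feistel network with HMAC-SHA256 as the round function in place of the AES block encryption described above. It is a reversible 128-bit permutation for illustration, not a vetted cipher; the function names, round count, and key handling are all assumptions.

```python
import hashlib
import hmac
import uuid

MASK64 = (1 << 64) - 1


def _round(key: bytes, round_no: int, half: int) -> int:
    """Keyed round function: HMAC-SHA256 truncated to 64 bits."""
    msg = bytes([round_no]) + half.to_bytes(8, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:8], "big")


def scramble(internal: uuid.UUID, key: bytes, rounds: int = 8) -> uuid.UUID:
    """Map an internal id to an opaque external id (same 128-bit size)."""
    left, right = internal.int >> 64, internal.int & MASK64
    for r in range(rounds):
        left, right = right, left ^ _round(key, r, right)
    return uuid.UUID(int=(left << 64) | right)


def unscramble(external: uuid.UUID, key: bytes, rounds: int = 8) -> uuid.UUID:
    """Invert scramble() by running the rounds backwards."""
    left, right = external.int >> 64, external.int & MASK64
    for r in reversed(range(rounds)):
        left, right = right ^ _round(key, r, left), left
    return uuid.UUID(int=(left << 64) | right)
```

The scrambled value is 128 arbitrary bits, so its version and variant nibbles will not look like any particular UUID version. In production you would presumably use real AES from a maintained crypto library rather than this stand-in.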
10000truths
16 hours ago
This is, strictly speaking, an improvement, but not by much. You can't change the cipher key because your downstream users are already relying on the old-key-scrambled IDs, and you lose all the benefits of scrambling as soon as the key is leaked. You could tag your IDs with a "key version" to change the key for newly generated IDs, but then that "key version" itself constitutes an information leak of sorts.
DSingularity
16 hours ago
Why do you need forward secrecy?
10000truths
15 hours ago
I edited that out of my post, as I'm not sure it's the correct term to use, but the problem remains. If the key leaks, then all IDs scrambled with that key can be de-scrambled, and you're back to square one.
blackenedgem
17 hours ago
Then that's just worse and more complicated than storing a 64-bit bigint + 128-bit UUIDv4. Your salt (AES block) is larger than a bigint. Unless you're talking about a fixed value for the AES (is that a thing?), but then that's peppering, which is security through obfuscation.
cyberax
17 hours ago
Uhh... What? You just use AES with a fixed key and IV in block mode.
You put in 128 bits, you get out 128 bits. The encryption is strong, so the clients won't be able to infer anything from it, and your backend can still get all the advantages of sequential IDs.
You also can future-proof yourself by reserving a few bits from the UUID for the version number (using cycle-walking).
grapesodaaaaa
12 hours ago
I still feel like calling something like uuid.v4() is easier and less cognitively complex.
cyberax
12 hours ago
There are advantages to monotonically increasing UUIDs: they work better with B-trees and relational databases.
macote
17 hours ago
You don't need to add a UUIDv4 column, you could just encrypt your UUIDv7 with format-preserving encryption (FPE).
whattheheckheck
17 hours ago
What's the computational complexity of doing that conversion vs the lookup table of uuidv4 for each uuidv7?
benjiro
16 hours ago
DB lookups + extra index are way more expensive than hardware-assisted decoding.
If your UUIDv4 is cached, you're still suffering from extra storage and an extra index. Not an issue on a million-row system, but imagine a billion, 10 billion.
And what if it's not cached? Great, now you're hitting the disk.
Computers do not suffer from a lack of CPU performance, especially when you can use dedicated CPU instruction sets. Hell, you do not even need encryption. How about a simple bit shift where you include a simple lookup identifier? Black box, sure, and not great if leaked, but you have other things to worry about if your actual shift pattern is leaked. Use an extra byte or two for identifying the pattern.
Obfuscating your IDs is easy. No need for full encryption.
sagarm
9 hours ago
Hardware assisted is a red herring here. As you noted the real problem is that random reads have poor data locality, which degrades your database performance in a way that is expensive to resolve.
jandrewrogers
14 hours ago
Why would it be computationally complex? The encryption is implemented in the silicon, it is close to free for all practical purposes. The lookup table would be wildly more expensive in almost all cases due to comparatively poor memory locality.
gigatexal
17 hours ago
In a well normalized setup idk maybe not. Uuidv4 for your external ids and then have a mapping table to correspond that to something you’d use internally. Then you can torch an exposed uuid, update the mapping table, and generate a new one, and none of your pointers and foreign keys need to change internally.
crazygringo
16 hours ago
The point is, that mapping table incurs the same indexing cost that we were trying to eliminate in the first place. Normalization is irrelevant.
Quekid5
10 hours ago
I wonder if there is a name for such a mapping table in RDBMS-land...?
gigatexal
5 hours ago
We call them lookup or mapping tables.
tekne
11 hours ago
Question: why not use UUIDv7 but encrypt the user-facing ID you hand out? Then it's just a quick decrypt-on-lookup, and you have the added bonus of e.g. being able to give different users different IDs
lukebechtel
17 hours ago
how risky is exposing creation time really though? I feel like for most applications this is uncritical
Biganon
16 hours ago
I wouldn't say necessarily "risky", it's more that it forces your hand when you wouldn't want to reveal an entity's creation time. Say you use these IDs for users of your site, and they're used in API queries / URLs etc., then it's trivial to know when a user created their account. Sure, many sites already expose this information, but not all of them do; what if you don't want it exposed? What if you consider that a user's seniority is nobody's business, that it could bias the behavior of other users towards them, etc.?
morshu9001
16 hours ago
It takes consideration. There are plenty of systems like Facebook and Twitter that use IDs somewhat exposing time, but the things they're IDing already have public creation timestamps.
sverhagen
12 hours ago
When you see v7 vs. v4, you'd expect the higher number to be better, hopefully better in all aspects. I wouldn't have expected such a thoughtful consideration to be required before upgrading. UUID-b would've been a better name then ;)
jpalawaga
9 hours ago
that is pretty common with uuid. for example in many cases you'll still want a plain uuid4 instead of e.g. uuid5. maybe you want 5. it's use-case dependent.
for a specification such as uuid, there is not much to improve upon--just rearranging the bytes and their meanings.
djantje
6 hours ago
DB multi-master, or the DB not being responsible for primary key generation, is the use case, I think.
And then having uuidv7 as primary and foreign keys, can give you a performance gain.
saaspirant
14 hours ago
I am using it in a table where sorting by id (primary key) should also sort it by created time (newer records should have "bigger" id).
The id would be exposed to users. An integer would expose the number of records in it.
Am I using it right, guys?
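For what it's worth, the sorting property can be checked with a few lines of Python. This is a hand-rolled, v7-shaped generator written only for the demonstration (48-bit millisecond timestamp, version nibble 7, variant bits `10`, the rest random); the name `make_uuid7` is made up, and it has no counter, so two ids created in the same millisecond sort in arbitrary order.

```python
import os
import uuid


def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Build a UUIDv7-shaped value for a given Unix time in milliseconds."""
    rand = int.from_bytes(os.urandom(10), "big")           # 80 random bits
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand     # timestamp in the top 48 bits
    value = (value & ~(0xF << 76)) | (0x7 << 76)           # version nibble = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)           # variant bits = 10
    return uuid.UUID(int=value)


created_ms = [1_700_000_000_000, 1_700_000_000_500, 1_700_000_060_000]
ids = [make_uuid7(ms) for ms in created_ms]

# Sorting by id (as UUIDs or as their string form) matches creation order.
print(sorted(ids) == ids)
print(sorted(str(u) for u in ids) == [str(u) for u in ids])
```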
ownagefool
6 hours ago
Meh.
You probably shouldn't / don't need to use v7 for your Users table because the age of your User probably has limited to no bearing on the lookup patterns. For example, our Steam and Amazon accounts are pretty old, but we likely still use them.
However, your Orders table is significantly more likely to be looked up based on time, so a v7 makes a lot of sense here.
Now I'd argue the security implications are overblown, but in general terms you might also allow someone to look up a user, i.e. you can view my Steam profile, or maybe my Amazon wishlist. You probably don't need to be looking up another user's order.
Alternatively, if you're building an Enterprise Risk Solution, you could take the view that you don't want people knowing how old the risk is, but most solutions would show you some history and would believe that to be pertinent information.
There will be instances of getting it wrong, but it isn't actually _that_ complicated.
Illniyar
16 hours ago
If leaking creation time is a concern, can we not just fake the timestamp? We can do so in a way that most performance benefits remain: start with a base time of 1970, bump that base time intermittently, and add random months and days to new records (or maybe derive the offset from the user's id, so a user's records are temporally consistent with each other but not with other users' records).
I'm sure there might be a middle ground where most of the performance gains remain but the deanonymizing risk is greatly reduced.
Edit: encrypting the value in transit seems a simpler solution really
hu3
16 hours ago
In that case, auto increments can also be bumped from time to time. And start from a billion.
They're more performant than uuidv7. Why would I still use UUID? Perhaps I would still want uuids because they can be generated on the client and because they make incorrect JOINs return no rows.
jongjong
15 hours ago
Great point. Also, having to support multiple IDs is a maintenance headache.
IMO, a major problem solved by UUIDs is the ability to create IDs on the client-side, hence, they are inherently user-facing. A major reason why this is an important use case for UUIDs is because it allows clients to avoid accidental duplication of records when an insertion fails due to network issues. It provides insertion idempotence.
For example, when the user clicks on a button on a form to insert a record into a database, the client can generate the UUID on the client-side, then attach it to a JSON object, then send the object to the server for insertion; in the meantime, if there is a network issue and it's unclear whether or not the record was inserted, the code can automatically retry (or user can manually retry) and there is no risk of duplication of data if you use the same UUID.
This is impossible to do with auto-incrementing IDs because those are generated by the database in a centralized way so the user cannot know the ID ahead of time and thus, if there is a network failure while submitting a form, the client cannot automatically know whether or not the record was successfully inserted; if they retry, they may create a duplicate record in the database. There is no way to make the operation idempotent without relying on some kind of fixed ID which has a uniqueness constraint on the database side.
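A minimal sketch of that retry flow, with an in-memory SQLite table standing in for the server's database. The `note` table and `insert_note` helper are hypothetical; the essential parts are the client-generated id and the uniqueness constraint on it.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note (id TEXT PRIMARY KEY, body TEXT NOT NULL)")


def insert_note(note_id: str, body: str) -> bool:
    """Insert once; a retry with the same id is a no-op. True if a row was written."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO note (id, body) VALUES (?, ?)", (note_id, body)
    )
    return cur.rowcount == 1


note_id = str(uuid.uuid4())             # generated on the client before the first attempt
first = insert_note(note_id, "hello")   # the request whose response got lost
retry = insert_note(note_id, "hello")   # a blind retry with the same id
row_count = conn.execute("SELECT COUNT(*) FROM note").fetchone()[0]
print(first, retry, row_count)          # True False 1
```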
tonyhart7
4 hours ago
Yeah, just use uuidv4 and another "ULID" if that's the case
which is pointless