mnls
13 days ago
UPDATE: Block-level deduplication reveals metadata leakage. I ran another test, and the result is more concerning than data retention alone. I took the original 100MB random file and overwrote a single byte in the middle: printf '\x01' | dd of=randomfile.dat bs=1 seek=52428800 count=1 conv=notrunc. That changes 1 byte out of 104,857,600 (roughly 0.000001% of the file). I then re-uploaded it to iCloud. It uploaded instantly again!
Apple isn't hashing complete files; they appear to be doing block-level deduplication on encrypted data. They likely split files into chunks (probably 4MB or 16MB blocks, similar to Dropbox) and hash each block independently. When I changed 1 byte in the middle of the file, only the block containing that byte needed to be uploaded. The remaining blocks (24 of 25 if blocks are 4MB, 6 of 7 if they are 16MB) were already on Apple's servers and were deduplicated.
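To make that inference concrete, here is a minimal sketch of fixed-size block hashing. It is an illustration under stated assumptions, not a description of how iCloud works: the 4 MiB block size is the guess from the post, SHA-256 is an arbitrary choice of hash, and the function names are made up for this example. The sizes are scaled down from MiB to KiB so it runs quickly while keeping the same 25-block layout. The sketch XORs the byte so it is guaranteed to change; overwriting a random byte with a fixed value such as \x01 has a 1-in-256 chance of leaving it unchanged.

```python
import hashlib
import os

# Scaled-down stand-ins: the post's file is 100 MiB and 4 MiB blocks are a guess.
# Using KiB instead of MiB keeps the same 25-block layout while running quickly.
FILE_SIZE = 100 * 1024
BLOCK_SIZE = 4 * 1024


def block_hashes(data: bytes, block_size: int) -> list[str]:
    """SHA-256 hex digest of each fixed-size block (the last block may be short)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def flip_byte(data: bytes, offset: int) -> bytes:
    """Return a copy with the byte at offset XORed with 0xFF, so it always changes."""
    modified = bytearray(data)
    modified[offset] ^= 0xFF
    return bytes(modified)


def changed_blocks(old: bytes, new: bytes, block_size: int) -> list[int]:
    """Indices of blocks whose hashes differ between two equal-length inputs."""
    pairs = zip(block_hashes(old, block_size), block_hashes(new, block_size))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


original = os.urandom(FILE_SIZE)
modified = flip_byte(original, FILE_SIZE // 2)  # middle byte, like seek=52428800

total = len(block_hashes(original, BLOCK_SIZE))
changed = changed_blocks(original, modified, BLOCK_SIZE)
print(f"{total} blocks, changed: {changed}")  # prints "25 blocks, changed: [12]"
```

Only block 12 differs, so a client that uploads just the blocks whose hashes are new to the server would send 1 block out of 25. That matches the "instant" re-upload, although other mechanisms (for example delta sync) could produce the same observation.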
This means Apple's servers maintain an index of which specific encrypted blocks each user possesses, even though they can't decrypt the content. Even with end-to-end encryption, the server knows the "fingerprint" of every 4-16MB chunk of your data. Research has shown that block-level deduplication enables "deduplication attacks," in which you can determine whether a user has a specific file without breaking encryption: upload a known file and see whether it deduplicates. If it does, the user has that file. This works even with E2EE because block patterns are observable server-side.
Well-known files (popular software, movies, documents) have predictable block signatures. Even encrypted, these patterns could potentially be identified. "Does user X have file Y?" becomes answerable through deduplication probing without actually decrypting anything.
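The probing idea can be sketched with a toy in-memory store. Everything here is hypothetical: ToyDedupStore, its methods, and the 4-byte block size exist only for illustration, and the sketch assumes the server deduplicates across users using content-derived fingerprints, which the test above (a single account) does not demonstrate.

```python
import hashlib


def fingerprints(data: bytes, block_size: int) -> list[str]:
    """Content-derived fingerprint (SHA-256) of each fixed-size block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


class ToyDedupStore:
    """Hypothetical server that deduplicates blocks across all users."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.owners: dict[str, set[str]] = {}  # fingerprint -> users holding it

    def upload(self, user: str, data: bytes) -> int:
        """Record data for user; return how many blocks had to be transferred."""
        transferred = 0
        for fp in fingerprints(data, self.block_size):
            if fp not in self.owners:
                transferred += 1  # block is new to the server
                self.owners[fp] = set()
            self.owners[fp].add(user)
        return transferred

    def holders(self, data: bytes) -> set[str]:
        """Server-side view: users whose index contains every block of data."""
        sets = [self.owners.get(fp, set()) for fp in fingerprints(data, self.block_size)]
        return set.intersection(*sets) if sets else set()


store = ToyDedupStore()
known_file = b"bytes of some widely distributed file"
store.upload("alice", b"alice's private notes")
store.upload("bob", known_file)

print(store.holders(known_file))  # prints {'bob'}: the index alone answers "who has it?"

probe_cost = store.upload("mallory", known_file)
print(probe_cost)  # prints 0: nothing transferred, so every block was already stored

fresh_cost = store.upload("mallory", b"something nobody else stored")
```

Note the two vantage points. A client-side probe (mallory) only learns that every block was already stored by someone, while the server's index can name the user. Neither needs to decrypt anything.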
I'm not claiming Apple is actively exploiting this or that the encryption is broken. The crypto is probably solid. But users aren't informed that block-level metadata is retained and that this metadata can leak information about content despite E2EE. "Permanent deletion" doesn't remove these block fingerprints.
I still plan to complete the 30-day retention test to see if Apple ever purges deleted blocks, but the block-level deduplication finding suggests they may keep this metadata indefinitely for system efficiency. For truly private storage, encryption alone isn't enough: you need encryption that prevents deduplication metadata from forming in the first place.
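One way to read that last point, sketched under assumptions: if the fingerprint the server sees depends only on content, identical blocks match for everyone; if it is bound to a per-user secret, or to a fresh random salt per upload, the matches disappear. The function names below are hypothetical, and HMAC-SHA-256 fingerprints stand in for what a real system would do with its encryption keys and nonces. This illustrates the trade-off and is not a storage design.

```python
import hashlib
import hmac
import secrets


def plain_fingerprint(block: bytes) -> str:
    """Content-only fingerprint: identical blocks match for every user."""
    return hashlib.sha256(block).hexdigest()


def keyed_fingerprint(block: bytes, user_key: bytes) -> str:
    """Bound to a per-user secret: matches only within that user's own data."""
    return hmac.new(user_key, block, hashlib.sha256).hexdigest()


def salted_fingerprint(block: bytes, user_key: bytes) -> tuple[bytes, str]:
    """Fresh random salt per upload: even one user's identical blocks look unrelated."""
    salt = secrets.token_bytes(16)
    return salt, hmac.new(user_key, salt + block, hashlib.sha256).hexdigest()


block = b"one block of a well-known file"
alice_key = secrets.token_bytes(32)  # hypothetical per-user secrets
bob_key = secrets.token_bytes(32)

same_plain = plain_fingerprint(block) == plain_fingerprint(block)
same_keyed = keyed_fingerprint(block, alice_key) == keyed_fingerprint(block, bob_key)
same_salted = (
    salted_fingerprint(block, alice_key)[1] == salted_fingerprint(block, alice_key)[1]
)
print(same_plain, same_keyed, same_salted)  # True False False, barring a negligible-probability collision
```

The cost is the efficiency the provider presumably wants: keyed fingerprints give up cross-user deduplication, and salted ones give up deduplication entirely, so every re-upload transfers every block.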