Password-protected ZIP archives are a common means of compressing and sharing sets of files, from sensitive documents to malware samples and even malicious files (e.g., phishing “invoices” in emails).
But did you know it is possible for an encrypted ZIP file to have two correct passwords, both producing the same outcome when the ZIP is extracted?
A ZIP file with two passwords
Arseniy Sharoglazov, a cybersecurity researcher at Positive Technologies, shared a simple experiment over the weekend in which he produced a password-protected ZIP file called x.zip.
The password Sharoglazov picked for encrypting his ZIP was a pun on the 1987 hit that’s become a popular tech meme:
But the researcher demonstrated that when extracting x.zip using a completely different password, he received no error messages.
In fact, using the different password resulted in successful extraction of the ZIP, with original contents intact:
Like the researcher’s ZIP archive, ours was created with the aforementioned longer password, and with AES-256 encryption mode enabled.
While the ZIP was encrypted with the longer password, using either password extracted the archive successfully.
How’s this possible?
Responding to Sharoglazov’s demo, a curious reader, Rafa, raised an important question: “How????”
Twitter user Unblvr seems to have figured out the mystery:
ZIP uses PBKDF2, which hashes the input if it’s too big. That hash (as raw bytes) becomes the actual password. Try to hash the first password with SHA1 and decode the hexdigest to ASCII… 🙂
— Unblvr (@Unblvr1) August 20, 2022
When producing password-protected ZIP archives with AES-256 mode enabled, the ZIP format uses the PBKDF2 algorithm and hashes the password provided by the user if that password is too long. By too long, we mean longer than 64 bytes (64 characters for an ASCII password), explains the researcher.
Instead of the user’s chosen password (in this case “Nev1r-G0nna-G2ve-…”), this newly calculated hash becomes the actual password to the file.
When the user attempts to extract the file and enters a password that is longer than 64 bytes (“Nev1r-G0nna-G2ve-…”), the user’s input will once again be hashed by the ZIP application, yielding the same effective password (the hash) that was used when the file was encrypted. Because the two match, the file is extracted successfully.
The alternative password used in this example (“pkH8a0AqNbHcdw8GrmSp”) is in fact the ASCII representation of the longer password’s SHA-1 hash.
SHA-1 checksum of “Nev1r-G0nna-G2ve-…” = 706b4838613041714e62486364773847726d5370.
This checksum, when converted to ASCII, produces: pkH8a0AqNbHcdw8GrmSp
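The conversion can be checked in a few lines of Python. This is a minimal sketch: the helper name `digest_as_ascii` is ours, not part of any ZIP tool, and the only input is the hex checksum quoted above.

```python
def digest_as_ascii(hex_digest: str) -> str:
    """Interpret each pair of hex digits as one byte and decode the bytes as ASCII."""
    return bytes.fromhex(hex_digest).decode("ascii")


# The SHA-1 checksum quoted in the article.
checksum = "706b4838613041714e62486364773847726d5370"
print(digest_as_ascii(checksum))  # pkH8a0AqNbHcdw8GrmSp
```

A SHA-1 digest is always 20 bytes, which is why the alternative password is exactly 20 characters long.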
Note, however, that when encrypting or decrypting a file, the hashing process only occurs if the length of the password is greater than 64 characters.
In other words, shorter passwords will not be hashed at either stage of compressing or decompressing the ZIP.
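The 64-byte threshold can be observed directly with Python's standard `hmac` module, since HMAC is the building block PBKDF2 relies on here. The sketch below is illustrative only: the helper names and the two test passwords are made up for the demonstration.

```python
import hashlib
import hmac

SHA1_BLOCK_SIZE = 64  # HMAC-SHA1 pre-hashes keys longer than this many bytes


def effective_hmac_key(password: bytes) -> bytes:
    """Return the key HMAC-SHA1 effectively works with for a given password."""
    if len(password) > SHA1_BLOCK_SIZE:
        return hashlib.sha1(password).digest()  # too long: replaced by its digest
    return password  # short enough: used as-is


def hmac_sha1(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha1).digest()


message = b"demo"
at_limit = b"a" * 64    # exactly 64 bytes: not hashed
over_limit = b"a" * 65  # 65 bytes: hashed first

# Over the limit, the password and its SHA-1 digest are interchangeable.
print(hmac_sha1(over_limit, message) == hmac_sha1(hashlib.sha1(over_limit).digest(), message))

# At the limit, they are not: the password itself is the key.
print(hmac_sha1(at_limit, message) == hmac_sha1(hashlib.sha1(at_limit).digest(), message))
```

The first comparison prints `True` and the second `False`, which matches the behavior described: only passwords longer than 64 bytes gain a second, equivalent password.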
This is why, when the long “Nev1r-G0nna-G2ve-…” string is picked as the password at the encryption stage, the actual password being set by the ZIP program is effectively the SHA-1 hash of this string.
At the decryption stage, if you were to enter “Nev1r-G0nna-G2ve-…”, it will be hashed first, yielding that same SHA-1 hash as the effective password. Entering the shorter “pkH8a0AqNbHcdw8GrmSp” password, however, skips the hashing step: its bytes already are the SHA-1 hash, so the application ends up with the same effective password either way.
The HMAC collisions subsection of the PBKDF2 entry on Wikipedia offers more technical insight for interested readers.
“PBKDF2 has an interesting property when using HMAC as its pseudo-random function. It is possible to trivially construct any number of different password pairs with collisions within each pair,” notes the entry.
“If a supplied password is longer than the block size of the underlying HMAC hash function, the password is first pre-hashed into a digest, and that digest is instead used as the password.”
But the fact that there are now two possible passwords to the same ZIP does not represent a security vulnerability, “as one still must know the original password in order to generate the hash of the password,” the entry further explains.
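The collision described in the entry can be reproduced with `hashlib.pbkdf2_hmac` from Python's standard library. Treat this as a sketch under stated assumptions: the long password is a made-up placeholder (not the researcher's), and the salt, iteration count, and key length are illustrative values rather than parameters taken from any particular ZIP tool.

```python
import hashlib


def derive_key(password: bytes, salt: bytes) -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA1 (illustrative parameters)."""
    return hashlib.pbkdf2_hmac("sha1", password, salt, 1000, dklen=32)


# Hypothetical password, deliberately longer than the 64-byte SHA-1 block size.
long_password = b"hypothetical-example-password-" + b"x" * 50
# Its raw SHA-1 digest (20 bytes) is the second, equivalent password.
short_password = hashlib.sha1(long_password).digest()

salt = bytes(16)  # fixed all-zero salt, only to keep the demo deterministic

print(derive_key(long_password, salt) == derive_key(short_password, salt))  # True
```

Note that `short_password` can only be computed by someone who already knows `long_password`, which is the reason the entry gives for why this is not a vulnerability.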
Arriving at a perfect password
A key aspect to note here is that the ASCII representation of a SHA-1 hash is not necessarily alphanumeric, or even printable.
For example, let’s assume we had chosen the following password for our ZIP file during this experiment. The password is longer than 64 bytes:
Its SHA-1 checksum comes out to be: bd0b8c7ab2bf5934574474fb403e3c0a7e789b61
And the ASCII representation of this checksum looks like a gibberish set of bytes, not nearly as elegant as the alternative password generated by the researcher for his experiment:
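Whether a given checksum decodes to a clean string is easy to test. The following sketch uses a hypothetical helper, `digest_is_alphanumeric`, and applies it to the two checksums quoted in this article.

```python
import string

# Byte values of the 62 characters a-z, A-Z and 0-9.
ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))


def digest_is_alphanumeric(hex_digest: str) -> bool:
    """True if every byte of the digest is an ASCII letter or digit."""
    return all(byte in ALNUM for byte in bytes.fromhex(hex_digest))


researcher_checksum = "706b4838613041714e62486364773847726d5370"
our_checksum = "bd0b8c7ab2bf5934574474fb403e3c0a7e789b61"

print(digest_is_alphanumeric(researcher_checksum))  # True
print(digest_is_alphanumeric(our_checksum))         # False
```

The second checksum fails on its very first byte: 0xbd lies outside the 7-bit ASCII range altogether.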
BleepingComputer asked Sharoglazov how he was able to pick a password whose SHA-1 checksum would be such that its ASCII representation yields a clean, alphanumeric string.
“That’s why hashcat was used,” the researcher tells BleepingComputer.
By using a slightly modified version of the open-source password recovery tool hashcat, the researcher generated variations of the “Never Gonna Give You Up…” string using alphanumeric characters until he arrived at a perfect password.
“I tested Nev0r, Nev1r, Nev2r and so on… And I found the password I need.”
And that’s how Sharoglazov arrived at a password that roughly reads like “Never Gonna Give You Up…,” but whose SHA-1 checksum has an ASCII representation that is one neat alphanumeric string.
For most users, creating a password-protected ZIP file with a password of their choice should be sufficient, and that is all they need to know.
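The search idea can be sketched as a toy in Python. This is not the researcher's modified hashcat, and the template string is a hypothetical stand-in for his phrase. If SHA-1 output bytes are treated as uniformly random, the chance that all 20 land among the 62 alphanumeric values is roughly (62/256)^20, on the order of one in a few trillion, which is why a GPU tool was the practical choice. To stay runnable, the sketch only requires the first two digest bytes to be alphanumeric.

```python
import hashlib
import itertools
import string

ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))

# Hypothetical template (75 characters once filled in, so longer than 64 bytes).
TEMPLATE = "Hypothetical-{}-template-{}-passphrase-{}-that-is-longer-than-sixty-four-bytes"


def find_candidate(prefix_len: int = 2):
    """Try digit substitutions until the first prefix_len digest bytes are alphanumeric."""
    for combo in itertools.product(string.digits, repeat=3):
        candidate = TEMPLATE.format(*combo)
        digest = hashlib.sha1(candidate.encode("ascii")).digest()
        if all(byte in ALNUM for byte in digest[:prefix_len]):
            return candidate, digest
    return None  # no match among the 1,000 variations


result = find_candidate()
if result is not None:
    candidate, digest = result
    print(candidate, digest[:2])
```

Raising `prefix_len` toward 20 makes each extra byte about four times harder to satisfy, so a full 20-byte match needs far more variations than this template offers.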
But should you decide to get adventurous, this experiment provides a peek into one of the many mysteries surrounding encrypted ZIPs, like having two passwords to your guarded secret.