This is quite interesting. Common hashes is pretty cool. What about for SHA-1, c...

btilly · on Jan 29, 2012

There is a well-known principle called the pigeon-hole principle - you cannot put more pigeons than you have holes in holes without putting at least 2 in one.

The larger the number of bits you have, the larger the number of possible messages. Therefore if the size of the message exceeds the size of the hash, there are more possible messages than there are possible hashes they can be sent to. Therefore by the pigeonhole principle, some hash value represents more than one message, so there must be collisions.

That part is simple. The hard part is finding them. The simplest way is to just try lots of random messages, until 2 give the same hash value. This is called brute force. But there are enough possible hash values that this is not feasible.

To date we have not found an SHA-1 collision. However we know of algorithms that should be able to find one faster than brute force. But we have not actually found one yet.

However attacks only get better, and computers only get faster. It is widely accepted that an actual hash collision in SHA-1 is just a matter of time now.

dedward · on Jan 29, 2012

its not just widely accepted.... hashing algorithms, due to the pigeonhole principle you just explained, by definition are full of collisions.

the only security relevant part is how hard it is to find them for use in various scenarios......

(this is why i really dont get why zfs ever did deduplication relying on hash only.... sure verify is an option, but it would be insane to use hash only....... unlessi am overlooking aomething statistical that makes it make sense (maybethe odds of a collision are fR less than the odds of total hardwarefailure ?still.....)

caf · on Jan 30, 2012

Your last sentence hits the nail on the head. Even when storing petabytes of data, the odds on a freak hash collision are still many orders of magnitude longer than the odds on a hardware failure.

saurik · on Jan 30, 2012

There's a statement that has been "pretty much permanently" on a whiteboard-covered wall of the computer lab at my college telling a joke about "the difference between a mathematician and an engineer", that goes through the math behind a specific type of prime number generator, calculates the likelihood that it might fail, and then claims the mathematician cares about that while the engineer knows that is orders of magnitude less likely than a guaranteed algorithm failing due to a cosmic ray hitting it in RAM and flipping one of its bits. ;P

dedward · on Jan 30, 2012

Indeed... a quick search turns up someone from the zfs team indicating that a collision in zfs dedup (sha256) is 50 orders of magnitude less likely than an uncorrected hardware error. i shoukd have looked before posting.

btilly · on Jan 30, 2012

I meant, of course, that we'll find one.

We know the exist. We just don't have any examples to present.

mansr · on Jan 29, 2012

If the number of possible messages is greater than the number of possible hashes, collisions are trivially guaranteed to occur. Since hashes are typically used with messages longer than 192 bits, SHA-1 collisions are certainly a possibility. Constructing one is orders of magnitude harder than for MD5, though.