Why is this coming back up again? It is well known that you can create collisions using MD5. However, you will have to try really hard to do so.
The implications of that are simple.
Do not use MD5 for any type of security or cryptography. On the other hand, if you're using MD5 for other purposes, you can continue to do so.
I frequently use MD5 to generate unique ids for a ton of stuff. There is little risk of a collision, since I'm not trying to make things collide. On the other hand, I would never use it for anything security related.
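For concreteness, here is a minimal Python sketch of that pattern. The helper name `md5_id` is made up for illustration, and the sketch assumes the keys are already unique strings. `usedforsecurity=False` (Python 3.9+) records that the digest is not relied on for security:

```python
import hashlib

def md5_id(key: str) -> str:
    """Hypothetical helper: derive a deterministic 128-bit ID (32 hex chars)
    from an already-unique string key.

    Fine for de-duplication or cache keys; NOT safe where an attacker
    chooses the inputs, because MD5 collisions can be constructed.
    """
    return hashlib.md5(key.encode("utf-8"), usedforsecurity=False).hexdigest()

print(md5_id("user:42"))
```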
The question I care about is: is the probability of a naturally occurring MD5 collision significantly higher than for any other 128-bit hash algorithm?
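One way to put a number on the question is the birthday approximation. It assumes a 128-bit hash behaves like uniformly random output on non-adversarial inputs, which is the premise of the question and not something this sketch proves for MD5. Under that assumption, the collision probability among n distinct inputs is p ~ 1 - exp(-n(n-1) / 2^129). A small calculator follows (the function name is hypothetical):

```python
import math

def birthday_collision_probability(n: int, bits: int = 128) -> float:
    """Approximate probability that n uniformly random `bits`-bit hashes
    contain at least one collision: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    space = 2.0 ** bits
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is tiny
    return -math.expm1(-n * (n - 1) / (2.0 * space))

for n in (10**6, 10**9, 10**12):
    print(f"{n:>16,} ids -> p ~ {birthday_collision_probability(n):.3e}")
```

At n = 2^64 inputs the approximation gives about 39%. Accidental collisions for an ideal 128-bit hash only become likely around that scale.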
Sure, but if you're willing to accept lots of collisions on "bad" input, there are faster hashes (Bernstein and Jenkins have nice, fast non-cryptographic hashes, for instance).
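This sketch assumes the comment has Bernstein's djb2 and Jenkins' one-at-a-time hash in mind (Jenkins has other designs as well). Below are their usual 32-bit forms in Python. They illustrate the algorithms only; the speed advantage shows up in C, not in a Python loop:

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value in 32 bits

def djb2(data: bytes) -> int:
    """Bernstein's djb2: start at 5381, then h = h * 33 + byte (mod 2^32)."""
    h = 5381
    for b in data:
        h = (h * 33 + b) & MASK32
    return h

def jenkins_one_at_a_time(data: bytes) -> int:
    """Bob Jenkins' one-at-a-time hash, 32-bit version."""
    h = 0
    for b in data:
        h = (h + b) & MASK32
        h = (h + (h << 10)) & MASK32
        h ^= h >> 6
    # final avalanche steps
    h = (h + (h << 3)) & MASK32
    h ^= h >> 11
    h = (h + (h << 15)) & MASK32
    return h

print(hex(djb2(b"hello")), hex(jenkins_one_at_a_time(b"hello")))
```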
Non-cryptographic hash functions may be a bad idea: e.g. don't store URI parameters (?foo=bar&baz=bar) in such a hash table, or you'll be vulnerable to rather simple DoS (this was all over the internet a week or two ago.)
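A well-known way to see that DoS is Java's `String.hashCode` (h = 31*h + c). Under this hash, "Aa" and "BB" produce the same value, so every string built by concatenating those two blocks collides, and an attacker gets 2^k colliding keys for free. Below is a minimal Python re-implementation for illustration. It is not Java itself, and it assumes BMP-only text, where `ord` matches Java's UTF-16 `char` values:

```python
from itertools import product

def java_string_hash(s: str) -> int:
    """Re-implementation of Java's String.hashCode for BMP-only strings:
    h = 31*h + char, wrapped to a signed 32-bit int like Java's int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

# "Aa" and "BB" share a hash and a length, so any sequence of k such
# blocks yields 2**k distinct strings with one shared hash value.
blocks = ["Aa", "BB"]
keys = ["".join(p) for p in product(blocks, repeat=4)]
print(len(keys), len({java_string_hash(k) for k in keys}))  # 16 keys, 1 hash
```

Inserting such keys into a hash table that uses this function turns every lookup into a scan of one long bucket.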
> Non-cryptographic hash functions may be a bad idea: e.g. don't store URI parameters (?foo=bar&baz=bar) in such a hash table, or you'll be vulnerable to rather simple DoS (this was all over the internet a week or two ago.)
I think universal hashing is the usual protection against that kind of attack, and as far as I know it is not considered cryptographic:
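Here is a minimal sketch of a universal-style family. It uses keyed polynomial hashing modulo the prime 2^61 - 1, which is one standard almost-universal construction; the class name `PolyHash` is hypothetical. The key is drawn at startup with `secrets`, so an attacker who doesn't know it can't precompute colliding inputs:

```python
import secrets

P = (1 << 61) - 1  # Mersenne prime used as the modulus

class PolyHash:
    """Hypothetical randomly keyed polynomial string hash modulo P.

    Each byte b becomes the coefficient b + 1 (never 0), so distinct inputs
    give distinct polynomials. For two distinct inputs of at most L bytes,
    the chance (over the random key) that they collide mod P is at most
    (L - 1) / (P - 1), whatever inputs were chosen in advance.
    """

    def __init__(self, key=None):
        # Secret key chosen at startup; a fixed key can be passed for testing.
        self.key = key if key is not None else secrets.randbelow(P - 1) + 1

    def __call__(self, data: bytes) -> int:
        h = 0
        for b in data:
            h = (h * self.key + b + 1) % P  # Horner's rule
        return h

h = PolyHash()
print(h(b"foo=bar") % 1024)  # reduce to a bucket index for a 1024-slot table
```

A hash table would reduce the result with `% num_buckets` as shown. The guarantee only holds while the key stays secret.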
> I frequently use MD5 to generate unique ids for a ton of stuff.
Why not just use a random 64- or 128-bit number (or a UUID)? This would be faster and would not require the input you are hashing to be unique already.
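Both options are available from the Python standard library. Here is a sketch (the helper names are made up):

```python
import secrets
import uuid

def random_id_128() -> str:
    """128 random bits from the OS CSPRNG, as 32 hex characters."""
    return secrets.token_hex(16)

def random_uuid() -> str:
    """A version-4 UUID: 122 random bits plus fixed version/variant bits."""
    return str(uuid.uuid4())

print(random_id_128())
print(random_uuid())
```

Unlike an MD5-derived ID, calling either function twice for the same record gives two different IDs. This approach fits when you need fresh identifiers, not reproducible ones.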
The only legitimate use of MD5 I can think of is to verify legacy MD5 checksums.
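Here is a sketch of that remaining use: recomputing a file's MD5 in chunks and comparing it with a published legacy checksum (the function names are hypothetical). It catches accidental corruption, not deliberate tampering:

```python
import hashlib
from typing import Iterable

def md5_hex(chunks: Iterable[bytes]) -> str:
    """Feed byte chunks through MD5 and return the hex digest."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file in 1 MiB chunks so large files don't load into memory."""
    with open(path, "rb") as f:
        return md5_hex(iter(lambda: f.read(chunk_size), b""))

def matches_legacy_checksum(path: str, expected_hex: str) -> bool:
    """Compare against a published MD5 checksum, ignoring case and whitespace."""
    return file_md5(path) == expected_hex.strip().lower()
```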
Content-addressable storage concepts can be quite valuable. See git, for example.
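As a concrete example of content addressing, git names a file's contents by hashing a short header plus the bytes. In git's default (SHA-1) object format, the header is `blob <size>` followed by a NUL byte. A Python sketch (the function name is hypothetical):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to file contents in a SHA-1 repository:
    SHA-1 over b"blob <size>" + NUL byte + the raw bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"hello world\n"))
```

The result should match `git hash-object` for the same bytes, e.g. `echo 'hello world' | git hash-object --stdin`.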
Using MD5 for this is undeniably a bad habit, of course. Just use the output of an SHA-2 function, even if you have to truncate it to 128 bits. Anything to get people to stop using MD5.
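Here is a minimal sketch of that suggestion. It takes the first 16 bytes of SHA-256 so the ID keeps MD5's 128-bit size, then uses it as the key of a toy content-addressable store (`content_id_128` and `ContentStore` are hypothetical names). Truncating to 128 bits still leaves about 2^64 work for a generic birthday collision:

```python
import hashlib

def content_id_128(data: bytes) -> str:
    """First 16 bytes of SHA-256, hex-encoded: same 32-char size as an MD5 digest."""
    return hashlib.sha256(data).digest()[:16].hex()

class ContentStore:
    """Toy in-memory content-addressable store keyed by content_id_128."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = content_id_128(data)
        self._blobs.setdefault(key, data)  # identical content is stored once
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

store = ContentStore()
key = store.put(b"some file contents")
print(key, store.get(key))
```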
> Because a bunch of academics managed to create a collision after 2 weeks of trying on a beefed up machine? Really?
WTF?! Are you out of your mind? MD5 collisions were exploited to forge a rogue CA certificate carrying a valid RapidSSL signature... in 2008. MD5's weaknesses haven't been "academic" in years.
People reach for what they know, and most do not have the capacity to determine the security of cryptographic algorithms in any particular context. Getting them to stop using it at all is the surest way to get them to stop using it where it's critical.
> MD5 is much faster than SHA-2.
Largely irrelevant in today's world. Single-threaded, my two-year-old laptop runs over 125MB/sec through SHA256 and around 100MB/sec through SHA512. SHA-2 is not a major bottleneck.
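To check these numbers on your own hardware, here is a rough single-threaded benchmark using `hashlib`. The helper name and buffer sizes are arbitrary choices, and results depend on the CPU and on the hash backend Python was built with:

```python
import hashlib
import time

def throughput_mb_per_s(algorithm: str, total_mb: int = 64, block_kb: int = 64) -> float:
    """Hash total_mb MiB of zero bytes in block_kb KiB chunks; return MB/s (10**6 bytes/s)."""
    block = bytes(block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(n_blocks):
        h.update(block)
    h.digest()
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return n_blocks * len(block) / elapsed / 1e6

for name in ("md5", "sha256", "sha512"):
    print(f"{name:>7}: {throughput_mb_per_s(name):8.1f} MB/s")
```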