SHA1 - Hashing Algorithm

Concepts

Products

Contacts

Home

Site Index

The Hashing algorithm we have used up to now is known as SHA1. (Secure Hashing Algorithm). For the formal definition, go here.

There are a number of alternatives. Most produce hashes of less than 20 characters and the risk of collision (two different documents or VRs producing the same hash) is significantly higher.

There is at least one other public domain algorithm which produces a hash of 20 characters, known as RIPEMD. Our choice between the two was more or less arbitrary in that when we went looking for the code, it became clear that SHA1 is much more widespread than RIPEMD.

More recently, however, the SHA algorithms have been expanded to include 256, 512 and 768 bit options. The benefits of the additional keyspace and reduced collision risk may be sufficient to outweigh the additional storage space required, particularly as storage is already cheap and getting cheaper. We may, therefore, "upgrade" to longer hashes as the system is developed.

Recently (Feb 2005) the world of cryptography experienced what, for it, was almost the equivalent of a Tsunami - when Chinese cryptographers announced a potential break to SHA1. For an enlightening discussion on the full implications of this breakthrough, you should read this page on Bruce Schneier's website.

Its relevance to the Codel project, however, is minimal. We use hashes for three purposes, none of which is seriously affected by the break.

Two of our purposes relate to integrity and one to security.

Counterfeit Prevention
A manufacturer creates a VR. We don't know the VR, we store only its hash value. When an end user wishes to validate their product they will input the VR (which is printed somewhere on the product) and it will hash to a value we hold on the database. No problem. IF the break permitted a counterfeiter to create their own VRs which happened to have matching hashes on our database, then - and only then - would it be a problem. As you will see if you read the linked page, this is emphatically NOT what has been achieved.

Database Security
The object of using the Hashes on our database is to ensure that we do not know the sensitive information represented by the hashes. In order to obtain that sensitive information, the attacker has to obtain the hash and "reverse engineer" it to obtain the initial VR (or document if its on the Copyright database, rather than the product tracking database). Again, if you read the discussion, this is emphatically NOT what has been achieved. Brute force attacks on a hash, with a view to recreating the original message, will still require 2160 attempts and remain, to all intents and purposes, computationally infeasible (for the time being)

Copyright
An author creates a document. Submits the hash to Codel to register their copyright date. An attacker wants to challenge the copyright - not to present an alternative document. End of problem! If its the same document, it must produce the same hash and the issue is about dates, not documents.

What is all the fuss about then?
What the break means is that if an attacker gets to choose BOTH messages (in our context this would mean that they create both the manufacturer's VR AND the counterfeiter's!) then what the Chinese have (almost) demonstrated is that it is technically feasible to find 2 VRs which produced the same hash values. They have not cracked the much more difficult problem of finding a second message to produce any arbitrary hash.

It should also be clear, if you read how our VRs are constructed that this construction places a further enormous hurdle in the way of the attacker. Not only do they have to create two VRs with matching hashes, but they also have to meet our construction rules.

As Schneier points out, in practical terms, this break does not immediately undermine all the security based on SHA1 but it sounds an alarm bell because "attacks always get better" and, as he's been warning since September 2004, there are potential problems with SHA which mean its time we created entirely new hashing algorithms. But again, as he points out, its "time to walk, not run, to the fire exits".

The important point about Schneier's analysis is that SHA itself is now in question. The attack that works for SHA1 should in principle work for all other SHA variants - although the reduction from 2128 to 2117 attempts still leaves a lot more breathing space than the reduction from 280 to 269. Nevertheless, ALL the SHA algorithms are now under suspicion so although we can climb to a higher part of the ship, it is still sinking - but very very slowly. So slowly, in fact, that it is clearly going to be capable of several more years of round the world cruising before it disappears beneath the waves.

With all that in mind, we will probably use SHA256 in place of SHA1 whenever that is an option. But we won't be too concerned about continued use of SHA1 for those uses where security is not the issue (such as copyright protection).