According to the National Institute of Standards and Technology (NIST) , a hash algorithm takes binary data (message) and produces a condensed representation, called the message digest. Particularly, a hash function accepts a variable-length input and generates a unique fixed-size output (“compression property”).
The origin of the term “hash” has non-technical meaning. Hans Peter Luhn (1896 – 1964), a computer scientist for IBM, was the first that used the term “hash” to explain that hash functions “chop” the input domain into many sub-domains that get “mixed” into the output range to improve the uniformity of the key distribution. Hash functions are used for password protecting because the main purpose for this issue is to encrypt passwords in a form that is unfeasible to decrypt and also confirm if the given password is correct.
Hash function classification: Hash functions informally have the property of easy computation, i.e. given y and an input x, y(x) is easy to compute. That property implies an unkeyed hash function, but in general hash functions refer to both unkeyed and keyed hash algorithms. Two types of hash functions are the Modification Detection Codes (MDCs) and Message Authentication Codes (MACs). MDCs are a subclass of unkeyed hash functions and their purpose is to ensure the required by applications data integrity. MDCs are classified into one-way hash functions (OWHFs) and collision resistant hash functions (CRHFs). OWHFs are the functions that given an output (hashed input) is difficult to find the input (pre-specified hash value) and CRHFs are functions that given two inputs it’s difficult to find the same hash-value output. On the other hand, MACs purpose is to assure, without additional mechanisms, the source of a message and its integrity. MACs have two functionally parameters, a secret key and a message input and they belong to the class of keyed hash functions.
Cryptographic hash functions (CHFs) can be used to implement password hashing. We have to mention that although CHFs are hash functions, the term hash function per se doesn’t mean that every hash function is cryptographic as well. A cryptographic function is usable for security purposes because it offers a high degree of protection against unintentional and intentional modification. The cost for this purpose is that they are far more computationally intensive and thus slower than typically hash functions.
A cryptographic hash function must satisfy some properties depending on the application. CHFs are one way (“preimage resistance”) procedures, i.e. h is “preimage resistant” if given a hash value y it is computationally infeasible to find x such that h(x)=y. Moreover, CHFs have the requirement of “preimage resistance”, i.e. given the input x1 and its hash value h(x1), generating an input x2 with hash value h(x2)=h(x1) is computationally infeasible. Also, CHFs are “collision resistance”, i.e h is “collision resistant” if it is computationally infeasible to find any two inputs x1 and x2 such that h(x1)=h(x2). For example, if we hash the words “harrys” and “harris” the result is totally different:
hash(“harrys”) = 47682637ac38fe39739da284829b937482398ef384
hash(“harris”) = ab1738e02893829ef9312cb14c14ca452400030402
Now that we have an overview of hash functions let’s explain why we hash passwords and the properties that password hashing must have. First of all, password hashing is somehow necessary because we want to be more “secure” from attackers that maybe get access of our database. Specifically, by password hashing we are trying to make it as painful as possible for attackers to retrieve our passwords. Password hashing will not make our passwords impervious to attacks but it will perform “damage containment” in the event of a breach. Hashes have the advantages to generate quick and also are small to store. However, these advantages are also their weakness because it is simple after all to compare hashes of basic passwords with a table of precalculated hashes and then “dehash” all the passwords. Thus, password hashing requires to be fast on software, slow on hardware and have a unique salt per password.
The solution is to hash more than just the password. This procedure is called “salting”. Not all the passwords use “salt” but when they do it boosts the security of each password so it won’t crack. The “salt” is a random value (not duplicated between passwords) appending or prepending the password before hashing. We do this process in order two passwords don’t generate the same hash value. In order to check if the password is correct we need the “salt”, hence it is stored in addition to the hash or as part of the hash string itself so that the hashed password can be reconstructed correctly for comparison. “Salting” is mostly used to prevent precomputation attacks against the digested message. Precomputation-based attack method works only for small ranges of input values, so by “salting” our password we increase the size of it and the resulting precomputation table with possible hashes would be too large even for a small subset of passwords.