What Is Hash Code?

If you want to file something away for later retrieval, it can be faster if you file it numerically rather than by a long alphabetic key. A hashCode is a way of computing a small (32-bit) digest numeric key from a long String or even an arbitrary clump of bytes. The numeric key itself is meaningless and the hashCode functions for computing them can look a bit insane. However, when you go to look for something, you can do the same digest calculation on the long alphabetic key you are looking for, and no matter how bizarre an algorithm you used, you will calculate the same hashCode, and will be able to look up numerically with it. Of course there is always the possibility two different Strings will have the same digest hashCode. However, even then, all is not lost; it greatly narrows down the search, hence speeding it up. A Hashtable goes a step further, scrunching down the hashCode even further to an even smaller number that it can use to directly index an array, usually by dividing it by some (ideally prime) number and taking the remainder.

I saw this introduction in one of the site that I was surfing. The main idea of hashing was exactly this. In Java 1.0.x and 1.1, String.hashCode function was working by sampling every nth character. However, this was slowing down the Hashtable lookup. With Java 1.2, function has been improved to multiply the result by 31 then add the next character in sequence. This was much slower, but much secure to avoid the collisions. For Object.hashCode things were almost same. Then Sun came up with a much wider spec which 1.4 implementation made sense after.

Here is the spec for that:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

As compared to the general contract specified by the equals method, thehashCode simply states two important requirements that must be met while implementing the hashCode method.

  1. Consistency during same execution – Firstly, it states that the hash code returned by the hashCode method must be consistently the same for multiple invocations during the same execution of the application as long as the object is not modified to affect the equals method.
  2. Hash Code & Equals relationship – The second requirement of the contract is thehashCode counterpart of the requirement specified by the equals method. It simply emphasizes the same relationship – equal objects must produce the same hash code. However, the third point elaborates that unequal objects need not produce distinct hash codes.

Calculating Aggregate hashCodes with XOR

XOR is one of the most famous method to calculate hashes. To calculate the hash of two fields, the basic idea is calculating each fields hashCode separately and then XORing them with ^ operator. If the field is array or list, you could XOR all the values together. However, there are some pros and cons:

Pros

  • It does not depend on order of computation.
  • It does not “waste” bits. If you change even one bit in one of the components, the final value will change.
  • It is quick, a single cycle on even the most primitive computer.
  • It preserves uniform distribution. If the two pieces you combine are uniformly distributed so will the combination be. In other words, it does not tend to collapse the range of the digest into a narrower band.

Cons

  • It treats identical pairs of components as if they were not there.
  • It ignores order. A ^ B is the same a B ^ A. If order matters, XOR would not be suitable to compute the hashCode for a List.
  • If the values to be combined are only 8 bits wide, there will be effectively only 8 bits in the XORed result. In contrast, multiplying by a prime and adding would scramble the entire 32 bits of the result, giving a much broader range of possible values, hence greater chance that each hashCode will unique to a single Object. In other words, it tends to expand the range of the digest into the widest possible band (32 bits).

Okay, that’s enough technical information. Let’s get to the code. For the first example, let’s assume that we have two string fields in our object. Because we don’t have any id field, these strings should be our identities to generatehashCode. To do that, we would pick two prime numbers (in this case it would be 17 and 37):

 

/**
* Hash code that combines two strings.
*
* @return a hash code value on the pair of strings.
*/
public int hashCode() {
int result = 17;

result = 37 * result + string1.hashCode();
result = 37 * result + string2.hashCode();
// etc for all fields in the object

return result;
}

And giving String.hashCode would be also useful. This is just a rough impelementation, but it reflects how it works:

 

/**
* Simplified version of how String.hashCode works.
* For the actual version see src.jar java/lang/String.java file.
*
* @return a hash code value for this String.
*/
public int hashCode() {
// inside the String is a char[] that holds the characters.
int hash = 0;
int len = char.length();

for ( int i=0; i<len; i++) {
hash = 31 * hash + val[ i ];
}

return hash;
}

Here is another one. This is much faster and more broader, you can apply this one to bytes, short, chars, ints, arrays etc..

 

/**
* Java version of the assembler hashCode used in the
* BBL Forth compiler.
*
* @return a hash code value for the byte[] in this object.
*/
public int hashCode() {
//  byte[] holds the data.
int hash = 0;
int len = val.length();

for ( int i=0; i<len; i++ ) {
// rotate left and xor
// (very fast in assembler, a bit clumsy in Java)
hash <<= 1;

if ( hash < 0 ) {
hash |= 1;
}

hash ^= val[i];
}

return hash;
}

That’s it for now till next time folks. I hope this article would be useful.

Source : http://isagoksu.com/2009/development/java/what-is-hash-code/

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s