The technique of finding an item from a collection of items is a common operation in everyday programming. But, the technique must be efficient, reliable, and of course simple to implement. The quality of efficiency is not an easy meter to obtain even though the problem may seem trivial. Therefore, to solve this vexing question, a number of techniques have been devised. These techniques are appropriate under most circumstances. This means there is no general formula for efficiency—only adequate ones—but they can be an excellent tool at the hand of programmer given the knowledge. Hashing, the keynote of this article, is one such technique. Java has direct support of this in the form of hashing classe s in the API library. The writeup detours into the techniques of hashing before delving into the Java Hashing Classes in brief.
The idea of hashing begins with the use of a key (the hash code ) of an item, such as the Social Security number of an employee, bank account number of a bank customer, or the batch number of a product to determine the exact location of the item in the table (the hash table ). The chosen key at its pre-existence, however, does not always have to be a number. It may be a string/word, date, and so forth, but they are always converted into a number finally by a method (the hash function ) that provides the key.
Suppose we are given a list of items, a specific item to be searched for, and the instruction to insert the item in an appropriate location if it is not found. Normally, what we do is search the list items one by one (sequential search), and retrieve it if found. Otherwise, we insert it into the next empty location. This may be good when there is no other option or the list is harmlessly small. The technique itself is effective, no doubt, but rudimentary; it is far from being an efficient one. However, if the items are arranged in a certain order (ascending/descending), we can apply a very efficient searching technique, called a binary search. But then, insertion is difficult because maintaining the order becomes a big overhead. For example, consider a list containing the following numbers: 1,6,9,20,45,67. To insert 19 in the list, we need to move 20,45,67 one place over. (Another technique is to insert at the end of the list and sort the list again, but still this is an inefficient idea). Now, imagine if 0 comes, we need to move all the items; worst case, imagine if the list contains thousands or more items! Can we deliver efficiently? Searching may be fine, but how efficient would the overall performance be if the operation also includes insertion and deletion? Clearly, we need some better ideas.
With hashing techniques, a data structure, called a hash table , is used where keys are mapped to an array location by a hash function. The keys to the array location are basically array indices. An element stored in the hash table is directly mapped by the hash function. The hash function generates the key with the help of a mathematical formula. This formula, when applied to a key, produces an integer.