Computer Science Stack Exchange is a question and answer site for students, researchers and practitioners of computer science. It's 100% free, no registration required.

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

I have heard the word "hash" being used in different contexts (all within the world of computing) with different meanings. For example, in the book Learn Python the Hard Way, on the chapter of dictionaries it is said "Python calls them "dicts." Other languages call them "hashes."" So, are hashes dictionaries.

The other common usage of the word is in relation to encryption. I have also heard (& read) people using the word "hash" as a specific function within high-level programing.

So, what exactly is it?

Can anyone (with time and who is knowledgeable) kindly explain the nitty-gritties of "hash (or hashes)?"

share|cite|improve this question
2  
Wikipedia has detailed articles on hash tables and cryptographic hash functions. What are you looking for that isn't in those? – David Richerby 5 hours ago

"Hash" is really a broad term with different formal meanings in different contexts, so there's not a single perfect answer to your question. But I'll try. You'll get more familiar with it throughout your career as you see more examples.

Generally "hash" in CS refers to a function $f$, such as a "hash function", that takes as input objects and outputs a string or number. The input objects are usually structured things we know and love such as strings, integers, or bigger "objects". The output is usually some useful, small, but relatively unique string or number for that input. The noun "hash" often refers to this output. The verb "hash" often means "apply a hash function".

Generally this hash function $f$ has some common properties. One is a sort of randomness or "spreading-out". If I want to hash objects down into 100 buckets (so the output of my hash function is a number from 0-99), then I am usually hoping that about 1/100 objects land in bucket 0, about 1/100 land in bucket 1, and so on. Sometimes this is taken even farther, for instance, in cryptography I may want it to be computationally difficult to find two different inputs that map to the same output. Another is compression. I often want to hash arbitrarily-large inputs down into a constant-size output or fixed number of buckets. A final one is determinism. If you use the same hash function on the same object twice, you want to get the same answer. This may seem to conflict with randomness above, but the solution is to sometimes choose your hash function randomly, but then re-use it over and over.

One common application is in data structures such as a hash table, which are a way to implement dictionaries. Here, you allocate some memory, say, 100 "buckets"; then, when asked to store an (key, value) pair in the dictionary, you hash the key into a number 0-99, and store the pair in the corresponding bucket in memory. The, when you are asked to look up a key, you hash the key into a number 0-99 with the same hash function and check that bucket to see if that key is in there. If so, you return its value.

Note that you could also implement dictionaries in other ways, such as with a binary search tree (if your objects are comparable).

Another practical application is checksums, which are ways to check that two files are the same (for example, the file was not corrupted from its previous version). Because hash functions are very unlikely to map two inputs to the same output, you compute and store a hash of the first file, usually represented as a string. This hash is very small, maybe only a few dozen ascii characters. Then, when you get the second file, you hash that and check that the output is the same. If so, almost certainly it is the exact same file byte-for-byte.

share|cite|improve this answer
    
Thank you so much! – gracedlamb 5 hours ago
    
Nice explanation but I disagree with "very unlikely". See: programmers.stackexchange.com/questions/49550/… : collision do occur, and sometimes surprisingly often. – Olivier Dulac 2 hours ago
    
Also note that in the context of cyptography, the term "hash" very strongly implies a "one-way" operation that cannot be easily reversed in practice. When it can be easily reversed, it's called "encryption". This is why the people on Security.SE will tell you to always hash your customers' passwords, never to encrypt them. – Ixrec 2 hours ago
    
A hash that doesn't "spread out" is still a hash, just perhaps not a very good one for your application. – OrangeDog 1 hour ago

A hash function is broadly any surjective (one-way) function where the image is smaller than the domain. The output of such a function f(x) can be referred to as "the hash of x".

In computer science we typically encounter two applications of hash functions.

The first is for data structures such as hash tables, where we want to map the key domain (e.g. 32-bit integers or arbitrary-length strings) to an array index (e.g. integer between 0 and 100). The goal here is to maximise the performance of the data structure; properties of the hash function that are typically desirable are simplicity and uniform output distribution.

The second is for cryptography: message authentication, password/signature verification, etc. The domain is typically arbitrary byte strings. Here we are concerned with security - which sometimes means deliberately low performance - where useful properties are collision and pre-image resistance.

share|cite|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.