First off, I know hashes are 1 way. There only finite answers a hash can give though. Why can't we take a hash and convert it to an equivalent string that can be hashed back to the original hash?

eg:

string: "Hello World"
hashed: a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e 

unhash: "rtjwwm689phrw96kvo48rm64unc8oetb5kmrjiuh7h8huhi6dde5n5"
        (a real string that is equal to Hello world's hash)
hashed: a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e 
share|improve this question
25  
The short, but incredibly accurate answer, is that we can't convert hashes back to their input strings because an army of mathematicians has taken great efforts to make it difficult to do so using known means. – Cort Ammon yesterday
2  
@EJP I think the OP is asking about a preimage attack, and just doesn't know the term for it. – Cort Ammon 21 hours ago
2  
Lots of hash functions can be easily reversed. There's a specific type of hash function, called a cryptographic hash function, that can't be reversed because they've been carefully designed that way. – Mike Scott 19 hours ago
3  
Why can't we? We can. The algorithm is trivial: try every possible string, ordered by length and then by alphabetical order, until you get one that hashes to the desired value. Making that algorithm run in less than a trillion years is the hard part. There's a very large haystack to search for a very small needle. – Eric Lippert 6 hours ago
1  
@Mehrdad It also says 'why can't we reverse hashes?' But the title disagrees with the question, and the current version of the question as edited by a third party disagrees with the version I commented on, so what the OP is really asking is now anybody's guess. – EJP 5 hours ago

Take a simple mathematical operation like addition. Addition takes 2 inputs and produces 1 output (the sum of the two inputs). If you know the 2 inputs, the output is easy to calculate - and there's only one answer.

321 + 607 = 928

But if you only know the output, how do you know what the two inputs are?

928 = 119 + 809
928 = 680 + 248
928 = 1 + 927
...

Now you might think that it doesn't matter - if the two inputs sum to the correct value, then they must be correct. But no.

What happens in a real hash function is that hundreds of one-way operations take place sequentially and the results from earlier operations are used in later operations. So when you try to reverse it (and guess the two inputs in a later stage), the only way to tell if the numbers you are guessing are correct is to work all the way back through the hash algorithm.

If you start guessing numbers (in the later stages) wrong, you'll end up with an inconsistency in the earlier stages (like 2 + 2 = 53). And you can't solve it by trial and error, because there are simply too many combinations to guess (more than atoms in the known universe, etc)

In summary, hashing algorithms are specifically designed to perform lots of one-way operations in order to end up with a result that cannot be guessed backwards.

share|improve this answer
2  
I like this answer because it actually points at the properties of hashes which are used to make the "one way," which is what I think the OP was trying to get at. – Cort Ammon yesterday
2  
I think the question aims at "why can't we create a collision for a given hash" - regardless if we guess the right one or not (that might be the equivalent string - where the equivalence relation would be "has the same hash") – tylo 16 hours ago
4  
"you can't solve it by trial and error" -> "you can't realistically solve it by trial and error", "cannot be guessed backwards" -> "cannot realistically be guessed backwards", both of these given current technology. It's not that it's theoretically impossible to guess backwards (I mean, if you try every input, you will find a collision, period, even if it takes you 48 bazillion years to do it), it's that it's realistically impossible at the current time and, ideally, well into the future. – Jason C 15 hours ago
3  
@JasonC Given the level of the question, I was trying to explain the concept using plain English and a simple example - there's really no need to spell out the pedantic difference between theoretical and realistic outcomes. – adelphus 15 hours ago
    
@adelphus Sure, although in this case it was the point of the question. Also there's no need to defend anything, it's just extra info that I added to the answer via a comment. I have zero interest in how you phrase your answer and was not offering any sort of correction or debate. We're on the same team here. My comment is just... something in the universe that exists. – Jason C 3 hours ago

Cryptographically secure hashes were specifically build to (among other things) make what you're asking hard!

Now, you could try to create an appropriate dictionary of all hashes, hoping to find appropriate pairs... but it would take more storage space than the total storage space that's currently available on our planet and more computing power than you'll be able to get access to in this universe (at least, at the time of writing this) — which is why we call it "infeasable".

In your theoretical example, the collision would be the strings "Hello World" and "rtjwwm689phrw96kvo48rm64..." both producing the same hash a591a6d40bf420404a011733...

For SHA-2 and SHA-3, such pairs are not known up until today. If, such a (once cryptographically secure) hash would have to be considered as broken due to collisions.

For further reading, it might be interesting for you to dive into (for example) the "collision attacks" that broke MD5 and SHA-1.

share|improve this answer
2  
I do not thing that fact of known collision would make SHA-2 or SHA-3 obsolete. If such collision happens "accidentally" it would mean nothing. But if someone could intentionally generate such pair, then yes, it would be broken. – Łukasz Niemier yesterday
3  
@ŁukaszNiemier I didn't state it "obsoletes" a hash. I wrote would have to be considered as broken (as in "theoretical break"). See, if you happen to stumble over such a collision, it's called a "theoretical break" and if you can intentionally produce such pairs, it's called a " practical break". The first is a warning sign and what I pointed at in my answer, the later is practically a dead sentence. (Even when practically broken, it should be noted there are limited situations and very specific ways you could still use such a hash... but its initial functionality is rendered insecure.) – e-sushi yesterday
    
There's some records online of git encountering hash collisions in sha1 going back a year or so. – Joshua 8 hours ago
    

I'm taking a guess at where your confusion stems from.

The one-way-ness of hash functions does not relate to the mathematical property of being a not injective function.

A function $f$ that is injective will have different values $f(x), f(y)$ for all $x \neq y$. And indeed hash functions are usually non-injective (this can easily be derived from the fact that their domain is bigger than their codomain). But that is not the meaning of one-way.

Instead saying that a hash function is one-way specifically precludes the thing you want to do, which is to find a value $x$ such that $H(x) = y$ if you already have $y$. In other words given $x$ you can calculate $H(x)$ but going backwards is impossible. Hence, "one-way".

Of course the simple answer, as already given by e-sushi is: Because they are constructed so that it's impossible. :)

share|improve this answer

Strictly speaking, you can, and it stands to reason that you can.

A SHA-1 hash has 2^160 possible values. If we just consider 100 byte binary plaintexts, well, there are 2^800 possible ones of those. So it stands to reason that for any SHA-1 hash, there are likely to be around 2^640 100 byte binary plaintexts that would match it.

When two inputs have the same hash, it's called a hash collision. For non-secure hashes, it's not particularly difficult to find a collision. It's not even a design goal. For example, Java classes often have a hashCode() method, which generates a hash used to facilitate data structures like a HashMap. But these use algorithms designed to be cheap to run, which produce few accidental collisions. If you want to deliberately craft two objects with the same hashcode, it's usually easy.

Cryptographic hashes are designed, not to make collisions impossible, but to make them extremely difficult to find. That is, if your goal is to find an input that generates a given hash, there should be no way to do it that's faster than brute force -- trying every input in turn until one works.

The maths behind this are well documented -- find a book if you want to; this is not the place to explain it (nor would I be able to).

... and not every collision is useful. Consider a signed email message. There might be a lot of chunks of data that yield the same hash. But only a tiny subset of those look like text. And only a tiny subset of those look like English text. And probably only one of those looks like English text that the purported sender could plausibly have written.

So, you can find collisions using brute force, but brute force necessarily takes a long time, and that's what gives you security. The best cryptographic security is designed such that brute forcing would take longer than the age of the planet, on our fastest computers. But even weaker algorithms provide security -- if someone gets hold of my password hash, then spends 5 years finding a collision, well, never mind, I have changed my password by then.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.