Chapter 12: Hash Functions – The Joy of Cryptography

Introduction to Hash Functions

A hash function is any function that maps arbitrary-length input to fixed-length output, denoted as H: {0,1} → {0,1}ⁿ. Think of H(m) as a "fingerprint" of m. While different messages ideally would have unique fingerprints, this isn't possible—there are infinitely many messages but only 2ⁿ possible outputs of H*.

Key Properties

Collisions Exist: By the pigeonhole principle, collisions (x ≠ x' with H(x) = H(x')) must exist.
Computational Hardness: A hash function H is collision-resistant if no polynomial-time algorithm can find a collision in H.

12.1: Defining Security

Formal Definition

A hash function H is collision-resistant if no polynomial-time algorithm can output a collision under H. However, this definition is impossible to achieve because collisions always exist mathematically.

Solution: Hash Function Families

To address this, we introduce hash function families—a set ℋ of functions where each H ∈ ℋ is chosen randomly. This random choice ensures collisions are hard to find.

Security Libraries

Two libraries model collision resistance:

ℒℋcr-real: Outputs H(x).
ℒℋcr-fake: Outputs H(x) but self-destructs if a collision is detected.

These libraries are indistinguishable if collisions are hard to find.

Variants of Collision Resistance

Target Collision Resistance: Given H and H(x), finding x' with H(x) = H(x') is hard.
Second Preimage Resistance: Given H and x, finding x' ≠ x with H(x) = H(x') is hard.

Both are weaker than plain collision resistance.

12.2: Hash-Then-MAC

A secure MAC can be constructed by hashing the message first:

Construction 12.2: Hash-Then-MAC

Given:

A MAC scheme M with message space T = {0,1}ⁿ.
A hash family ℋ with output length n.

Define the MAC scheme:

KeyGen: Output k ← M.KeyGen.
MAC: Output t ← M.MAC(k, H(m)).
Verify: Output M.Verify(k, H(m), t).

Security Claim

If ℋ is collision-resistant and M is a secure MAC, then HtM is a secure MAC.

Proof Outline

Replace M with an idealized MAC.
Replace ℋ with an idealized collision-resistant hash.
Show that verification fails unless (m, t) was generated by GETMAC.

12.3: Merkle-Damgård Construction

A method to build hash functions from compression functions:

Construction 12.4: Merkle-Damgård

Given a compression function h: {0,1}ⁿ⁺ᵗ → {0,1}ⁿ, define MDʰ: {0,1}* → {0,1}ⁿ as:

Pad x to a multiple of t bits.
Append a block encoding the length of x.
Iteratively apply h with an initialization vector (IV).

Security Claim

If ℋ is collision-resistant, then MDℋ is collision-resistant.

Proof

Any collision in MDʰ can be traced back to a collision in h.

12.4: Length-Extension Attacks

A vulnerability in Merkle-Damgård hash functions:

Attack Scenario

Given H(k‖m), an attacker can predict H(k‖m‖z) for some z without knowing k.

Implications

MAC Vulnerability: If H(k‖m) is used as a MAC, an attacker can forge MACs for extended messages.

Exercises

Visual Hash Collisions: Generate two files with opposite meanings but agreeing MD5 hashes in first/last 16 bits.
Modified Merkle-Damgård: Analyze collisions in a flawed hash construction.
Hash Function Design: Critique a hash function that outputs x if x is n-bits.
Nested Hash Functions: Show collisions in H⁽ᵗ⁾ imply collisions in H.
Ordered Hash Collisions: Find collisions in a hash preserving dictionary order.
Linear Hash Collisions: Find collisions in a hash with H(x⊕y) = H(x)⊕H(y).
Generalized Merkle-Damgård: Extend the construction for arbitrary input lengths.
PRF with Hash: Prove F'((k,H), x) = F(k, H(x)) is a secure PRF.
Exotic Merkle-Damgård Attacks:
- Find 4 colliding messages with ∼2⋅2ⁿ/² calls.
- Construct 2ᵈ colliding messages with O(d⋅2ⁿ/²) work.
- Show H(x) = H₁(x)‖H₂(x) is insecure if H₁ is Merkle-Damgård.
Iterated Hash Attack: Break collision resistance of H iterating H* like CBC-MAC.

FAQs

Q: Why can't hash functions be injective?
A: Because they map infinite inputs to finite outputs, ensuring collisions exist by the pigeonhole principle.

Q: What's the purpose of the IV in Merkle-Damgård?
A: The IV ensures unpredictability and foils hard-coded collisions.

Q: How does length-extension attack work?
A: Knowing H(x) allows predicting H(x‖z) for some z, enabling MAC forgeries.

👉 Learn more about secure hash functions
👉 Explore advanced cryptographic constructions