Introduction to Hash Functions
A hash function is any function that maps arbitrary-length input to fixed-length output, denoted as H: {0,1} → {0,1}ⁿ. Think of H(m) as a "fingerprint" of m. While different messages ideally would have unique fingerprints, this isn't possible—there are infinitely many messages but only 2ⁿ possible outputs of H*.
Key Properties
- Collisions Exist: By the pigeonhole principle, collisions (x ≠ x' with H(x) = H(x')) must exist.
- Computational Hardness: A hash function H is collision-resistant if no polynomial-time algorithm can find a collision in H.
12.1: Defining Security
Formal Definition
A hash function H is collision-resistant if no polynomial-time algorithm can output a collision under H. However, this definition is impossible to achieve because collisions always exist mathematically.
Solution: Hash Function Families
To address this, we introduce hash function families—a set ℋ of functions where each H ∈ ℋ is chosen randomly. This random choice ensures collisions are hard to find.
Security Libraries
Two libraries model collision resistance:
- ℒℋcr-real: Outputs H(x).
- ℒℋcr-fake: Outputs H(x) but self-destructs if a collision is detected.
These libraries are indistinguishable if collisions are hard to find.
Variants of Collision Resistance
- Target Collision Resistance: Given H and H(x), finding x' with H(x) = H(x') is hard.
- Second Preimage Resistance: Given H and x, finding x' ≠ x with H(x) = H(x') is hard.
Both are weaker than plain collision resistance.
12.2: Hash-Then-MAC
A secure MAC can be constructed by hashing the message first:
Construction 12.2: Hash-Then-MAC
Given:
- A MAC scheme M with message space T = {0,1}ⁿ.
- A hash family ℋ with output length n.
Define the MAC scheme:
- KeyGen: Output k ← M.KeyGen.
- MAC: Output t ← M.MAC(k, H(m)).
- Verify: Output M.Verify(k, H(m), t).
Security Claim
If ℋ is collision-resistant and M is a secure MAC, then HtM is a secure MAC.
Proof Outline
- Replace M with an idealized MAC.
- Replace ℋ with an idealized collision-resistant hash.
- Show that verification fails unless (m, t) was generated by GETMAC.
12.3: Merkle-Damgård Construction
A method to build hash functions from compression functions:
Construction 12.4: Merkle-Damgård
Given a compression function h: {0,1}ⁿ⁺ᵗ → {0,1}ⁿ, define MDʰ: {0,1}* → {0,1}ⁿ as:
- Pad x to a multiple of t bits.
- Append a block encoding the length of x.
- Iteratively apply h with an initialization vector (IV).
Security Claim
If ℋ is collision-resistant, then MDℋ is collision-resistant.
Proof
- Any collision in MDʰ can be traced back to a collision in h.
12.4: Length-Extension Attacks
A vulnerability in Merkle-Damgård hash functions:
Attack Scenario
Given H(k‖m), an attacker can predict H(k‖m‖z) for some z without knowing k.
Implications
- MAC Vulnerability: If H(k‖m) is used as a MAC, an attacker can forge MACs for extended messages.
Exercises
- Visual Hash Collisions: Generate two files with opposite meanings but agreeing MD5 hashes in first/last 16 bits.
- Modified Merkle-Damgård: Analyze collisions in a flawed hash construction.
- Hash Function Design: Critique a hash function that outputs x if x is n-bits.
- Nested Hash Functions: Show collisions in H⁽ᵗ⁾ imply collisions in H.
- Ordered Hash Collisions: Find collisions in a hash preserving dictionary order.
- Linear Hash Collisions: Find collisions in a hash with H(x⊕y) = H(x)⊕H(y).
- Generalized Merkle-Damgård: Extend the construction for arbitrary input lengths.
- PRF with Hash: Prove F'((k,H), x) = F(k, H(x)) is a secure PRF.
Exotic Merkle-Damgård Attacks:
- Find 4 colliding messages with ∼2⋅2ⁿ/² calls.
- Construct 2ᵈ colliding messages with O(d⋅2ⁿ/²) work.
- Show H(x) = H₁(x)‖H₂(x) is insecure if H₁ is Merkle-Damgård.
- Iterated Hash Attack: Break collision resistance of H iterating H* like CBC-MAC.
FAQs
Q: Why can't hash functions be injective?
A: Because they map infinite inputs to finite outputs, ensuring collisions exist by the pigeonhole principle.
Q: What's the purpose of the IV in Merkle-Damgård?
A: The IV ensures unpredictability and foils hard-coded collisions.
Q: How does length-extension attack work?
A: Knowing H(x) allows predicting H(x‖z) for some z, enabling MAC forgeries.
👉 Learn more about secure hash functions
👉 Explore advanced cryptographic constructions