Every time you run git diff, review a pull request, or compare two documents, a diff algorithm is computing the minimum set of changes to transform one text into another. This guide explains how the major diff algorithms work and when each produces the best results.
The Core Problem: Longest Common Subsequence
The foundation of all diff algorithms is finding the Longest Common Subsequence (LCS) between two sequences. The LCS is the longest sequence of elements that appear in both inputs in the same order (but not necessarily contiguously). Everything not in the LCS represents an addition or deletion.
Myers Diff Algorithm
Eugene Myers published his diff algorithm in 1986 and it remains the most widely used — it powers git diff, GNU diff, and most diff tools. Key properties:
- Finds the shortest edit script (minimum number of insertions + deletions)
- Time complexity: O(ND) where N is the sum of input lengths and D is the edit distance
- Fast for inputs with small differences (most real-world diffs)
- May produce unintuitive results when inputs are very different
Patience Diff
Patience diff (Bram Cohen, 2005) uses a different strategy: it first finds unique lines that appear exactly once in each input, matches those as anchors, then recurses between anchors. Benefits:
- Produces more intuitive diffs for code — function and class boundaries are naturally respected
- Better at handling moved blocks of code
- Slightly slower than Myers for large files
- Available in Git via
git diff --patience
Algorithm Comparison
| Algorithm | Speed | Output Quality (Code) | Output Quality (Prose) | Used By |
|---|---|---|---|---|
| Myers | Fast (O(ND)) | Good | Good | git diff (default), GNU diff |
| Patience | Moderate | Excellent | Good | git diff --patience |
| Histogram | Fast | Very good | Good | git diff --histogram (JGit) |
| Hunt-Szymanski | Fast for similar inputs | Good | Good | Some Unix diff implementations |
Diff Output Formats
- Unified diff: Shows context lines with +/- markers. The standard for patches and code review.
- Side-by-side: Old and new text displayed in parallel columns. Best for visual comparison.
- Inline (word-level): Changes highlighted within lines. Best for prose where changes are within sentences.
Practical Applications
- Version control (Git): Computing and displaying code changes between commits
- Code review: Highlighting changes in pull requests
- Document comparison: Legal, contract, and policy document review
- Merge conflict resolution: Three-way diff between common ancestor, yours, and theirs
- Plagiarism detection: Finding similarity between documents
Compare text instantly with the WizlyTools Text Diff Tool — side-by-side comparison with line-level and word-level highlighting.