Skip to content

Levenshtein distance

Type of Edit distance, which takes into account only insert, delete and substitute operations.

distLev(a,b)={a,if b=0b,if a=0distLev(tail(a),tail(b)),if head(a)=head(b)1+min{distLev(tail(a),b)distLev(a,tail(b))distLev(tail(a),tail(b))otherwisedist_{Lev}(a,b) = \begin{cases} |a|,& \text{if } |b| = 0\\ |b|,& \text{if } |a| = 0\\ dist_{Lev}(tail(a), tail(b)),& \text{if } head(a) = head(b)\\ 1 + min \begin{cases} dist_{Lev}(tail(a), b) \\ dist_{Lev}(a, tail(b)) \\ dist_{Lev}(tail(a), tail(b)) \end{cases} & \text{otherwise} \end{cases}

Weighted Levenshtein distance

Another variation is to add weight (or score) to edit operations:

  • We can modify wiegth of delete and insert, but they need to be the same to preserve symmetry property
  • We can modify weight of substitution, but it makes sense to choose values less than to 2 times weight of insert/delete. Otherwise algorithm can as well count it as 1 deletion, 1 insertion.

The more chances of replacing one letter with another, the less weight, for example:

  • if letters are close on keyboard
  • if letters sound similar, like s and z in organisation and organization.
  • if letters look similar, like letter o and 0 (zero). This is usefull for OCR

See: weighted-levenshtein.

Algorithms

AlgorithmTime complexitySpace complexity
Wagner-Fischer1O(mn)O(mn)O(mn)O(mn)
Space-improved version of Wagner-Fischer2O(mn)O(mn)O(m)O(m)
Masek-Paterson3O(nmax(1,m/log(n)))O(n \max( 1, m/\log(n)))

Vizualization

Aproximate algorithms

AlgorithmTime complexitySpace complexityApproximation
Andoni-Nosatzki: 4O(n1+ϵ),ϵ>0O(n^{1+\epsilon}), \epsilon>0
Ukkonen 5O(n+d2)O(n + d^2)
Myers 6 7O(n+d2)O(n + d^2)O(n)O(n)
BJK 8O(n3/7)O(n^{3/7})

Footnotes

  1. Wagner, Robert A., and Michael J. Fischer. “The string-to-string correction problem.” Journal of the ACM 21.1 (1974): 168-173

  2. string2string: A Modern Python Library for String-to-String Algorithms

  3. Masek, William Joseph and Mike Paterson. “A Faster Algorithm Computing String Edit Distances.” J. Comput. Syst. Sci. 20 (1980): 18-31.

  4. Andoni, Alexandr and Negev Shekel Nosatzki. “Edit Distance in Near-Linear Time: it’s a Constant Factor.” 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS) (2020): 990-1001.

  5. Ukkonen, Esko. “Algorithms for Approximate String Matching.” Inf. Control. 64 (1985): 100-118.

  6. Eugene W. Myers. 2023. An O(ND)O(ND) difference algorithm and its variations. Algorithmica 1, 1–4 (Nov 1986), 251–266. https://doi.org/10.1007/BF01840446

  7. Landau, Gad M. et al. “Incremental String Comparison.” SIAM J. Comput. 27 (1998): 557-582.

  8. Bar-Yossef, Ziv et al. “Approximating edit distance efficiently.” 45th Annual IEEE Symposium on Foundations of Computer Science (2004): 550-559.