Levenshtein distance
Type of Edit distance, which takes into account only insert
, delete
and substitute
operations.
Weighted Levenshtein distance
Another variation is to add weight (or score) to edit operations:
- We can modify wiegth of delete and insert, but they need to be the same to preserve symmetry property
- We can modify weight of substitution, but it makes sense to choose values less than to 2 times weight of insert/delete. Otherwise algorithm can as well count it as 1 deletion, 1 insertion.
The more chances of replacing one letter with another, the less weight, for example:
- if letters are close on keyboard
- if letters sound similar, like s and z in organisation and organization.
- if letters look similar, like letter o and 0 (zero). This is usefull for OCR
See: weighted-levenshtein ↗.
Algorithms
Algorithm | Time complexity | Space complexity |
---|---|---|
Wagner-Fischer1 | ||
Space-improved version of Wagner-Fischer2 | ||
Masek-Paterson3 |
Vizualization
- Dynamic programming solution aka Wagner-Fischer:
Aproximate algorithms
Algorithm | Time complexity | Space complexity | Approximation |
---|---|---|---|
Andoni-Nosatzki: 4 | |||
Ukkonen 5 | |||
Myers 6 7 | |||
BJK 8 |
Footnotes
-
Wagner, Robert A., and Michael J. Fischer. “The string-to-string correction problem.” Journal of the ACM 21.1 (1974): 168-173 ↩
-
string2string: A Modern Python Library for String-to-String Algorithms ↗ ↩
-
Masek, William Joseph and Mike Paterson. “A Faster Algorithm Computing String Edit Distances.” ↗ J. Comput. Syst. Sci. 20 (1980): 18-31. ↩
-
Andoni, Alexandr and Negev Shekel Nosatzki. “Edit Distance in Near-Linear Time: it’s a Constant Factor.” ↗ 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS) (2020): 990-1001. ↩
-
Ukkonen, Esko. “Algorithms for Approximate String Matching.” ↗ Inf. Control. 64 (1985): 100-118. ↩
-
Eugene W. Myers. 2023. An difference algorithm and its variations ↗. Algorithmica 1, 1–4 (Nov 1986), 251–266. https://doi.org/10.1007/BF01840446 ↗ ↩
-
Landau, Gad M. et al. “Incremental String Comparison.” ↗ SIAM J. Comput. 27 (1998): 557-582. ↩
-
Bar-Yossef, Ziv et al. “Approximating edit distance efficiently.” ↗ 45th Annual IEEE Symposium on Foundations of Computer Science (2004): 550-559. ↩