Marzal-Vidal edit distance
Marzal-Vidal measure is normalized Levenshtein distance.
In order to explain Marzal-Vidal edit distance we need to introduce notion of “edit path”. Let’s encode operations like that:
s
stands forsubstitute
(c
in original paper): the relevant letter in the source string is replaced with another letteri
stands forinsert
(v
in original paper): a new letter is added to the destination stringd
stands fordelete
(x
in original paper): the current letter from the source string is deleted and not copied to the destination stringn
stands forno-change
: the current letter is copied as is from the source string to the destination string
Let’s see how s1
(acbb
) transformed into s2
(cc
):
1 | 2 | 3 | 4 | or | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|---|---|---|---|
s1 | a | c | b | b | a | c | b | b | |
edit path | d | n | s | d | s | n | d | d | |
replace chars | c | c | |||||||
s2 | c | c | c | c |
Edit path is dnsd
. If we assume that cost (weight) of each edit operation is 1
then Levenshtein distance is 1 + 0 + 1 + 1 = 3
.
And Marzal-Vidal edit distance is:
e.g. for given example .
If all edit operations cost equals to 1
this distance is a metric.
Open question 🤔
I wonder what is the correct answer in this case:
(without alignment).
(with LCS alignment)
Reading
- Fisman, Dana, Joshua Grogin, Oded Margalit and Gera Weiss. “The Normalized Edit Distance with Uniform Operation Costs is a Metric.” CPM (2022) ↗
- A. Marzal and E. Vidal, “Computation of normalized edit distance and applications,” ↗ in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 9, pp. 926-932, Sept. 1993, doi: 10.1109/34.232078.