# Marzal-Vidal edit distance

Marzal-Vidal measure is normalized Levenshtein distance.

In order to explain Marzal-Vidal edit distance we need to introduce notion of “edit path”. Let’s encode operations like that:

`s`

stands for`substitute`

(`c`

in original paper): the relevant letter in the source string is replaced with another letter`i`

stands for`insert`

(`v`

in original paper): a new letter is added to the destination string`d`

stands for`delete`

(`x`

in original paper): the current letter from the source string is deleted and not copied to the destination string`n`

stands for`no-change`

: the current letter is copied as is from the source string to the destination string

Let’s see how `s1`

(`acbb`

) transformed into `s2`

(`cc`

):

1 | 2 | 3 | 4 | or | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|---|---|---|---|

s1 | a | c | b | b | a | c | b | b | |

edit path | d | n | s | d | s | n | d | d | |

replace chars | c | c | |||||||

s2 | c | c | c | c |

Edit path is `dnsd`

. If we assume that cost (weight) of each edit operation is `1`

then Levenshtein distance is `1 + 0 + 1 + 1 = 3`

.

And Marzal-Vidal edit distance is:

$dis_{MV}(x,y) = \frac{dis_{Lev}(x,y)}{|path_{Lev}(x,y)|}$e.g. for given example $\frac{3}{4}$.

If all edit operations cost equals to `1`

this distance is a metric.

## Open question 🤔

I wonder what is the correct answer in this case:

$dis_{MV}(x,y) = \frac{5}{5} = 1$ (without alignment).

$dis_{MV}(x,y) = \frac{8}{9} \approx 0.88$ (with LCS alignment)

## Reading

- Fisman, Dana, Joshua Grogin, Oded Margalit and Gera Weiss. “The Normalized Edit Distance with Uniform Operation Costs is a Metric.”
*CPM*(2022) - A. Marzal and E. Vidal, “Computation of normalized edit distance and applications,”
in
*IEEE Transactions on Pattern Analysis and Machine Intelligence*, vol. 15, no. 9, pp. 926-932, Sept. 1993, doi: 10.1109/34.232078.