Skip to content

Marzal-Vidal edit distance

Marzal-Vidal measure is normalized Levenshtein distance.

In order to explain Marzal-Vidal edit distance we need to introduce notion of “edit path”. Let’s encode operations like that:

  • s stands for substitute (c in original paper): the relevant letter in the source string is replaced with another letter
  • i stands for insert (v in original paper): a new letter is added to the destination string
  • d stands for delete (x in original paper): the current letter from the source string is deleted and not copied to the destination string
  • n stands for no-change: the current letter is copied as is from the source string to the destination string

Let’s see how s1 (acbb) transformed into s2 (cc):

1234or1234
s1acbbacbb
edit pathdnsdsndd
replace charscc
s2cccc

Edit path is dnsd. If we assume that cost (weight) of each edit operation is 1 then Levenshtein distance is 1 + 0 + 1 + 1 = 3.

And Marzal-Vidal edit distance is:

disMV(x,y)=disLev(x,y)pathLev(x,y)dis_{MV}(x,y) = \frac{dis_{Lev}(x,y)}{|path_{Lev}(x,y)|}

e.g. for given example 34\frac{3}{4}.

If all edit operations cost equals to 1 this distance is a metric.

Open question 🤔

I wonder what is the correct answer in this case:

abcde
sssss
efghi

disMV(x,y)=55=1dis_{MV}(x,y) = \frac{5}{5} = 1 (without alignment).

abcde
ddddniiii
efghi

disMV(x,y)=890.88dis_{MV}(x,y) = \frac{8}{9} \approx 0.88 (with LCS alignment)

Reading