Levy et al. (normalisation)
$$dis_{norm}(x, y) = \frac{dis(x, y)}{\max(|x|, |y|)}$$
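As an illustration, the formula can be sketched in Python. This is a minimal sketch and not a reference implementation: it assumes `dis` is the unit-cost Levenshtein distance, and the names `levenshtein` and `normalise_by_max_length` are hypothetical helpers introduced here. Returning 0.0 for two empty strings is also an assumption, since the formula is undefined when both lengths are zero.

```python
from typing import Callable


def levenshtein(x: str, y: str) -> int:
    """Unit-cost Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def normalise_by_max_length(dis: Callable[[str, str], int], x: str, y: str) -> float:
    """dis(x, y) / max(|x|, |y|), as in the formula above."""
    longest = max(len(x), len(y))
    if longest == 0:
        # Assumption: two empty strings are treated as identical.
        return 0.0
    return dis(x, y) / longest


score = normalise_by_max_length(levenshtein, "kitten", "sitting")  # 3 / 7
```

Because the Levenshtein distance never exceeds the length of the longer string, the score stays within [0, 1] for this choice of `dis`.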
Warning
This normalisation can’t be used for LCS distance, because the result can exceed 1. For example, for two distinct single-character strings a and b:
$$dis_{norm}(a, b) = \frac{dis_{LCS}(a, b)}{\max(|a|, |b|)} = 2$$
Use Higuera, Mico (normalisation) instead.
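The value 2 in the warning can be reproduced with a short Python sketch. It assumes the usual insert/delete-only definition of LCS distance, |x| + |y| - 2·|LCS(x, y)|; the helper names `lcs_length` and `lcs_distance` are hypothetical and introduced only for this illustration.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence of x and y."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)           # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a character of x or y
        prev = curr
    return prev[-1]


def lcs_distance(x: str, y: str) -> int:
    """Assumed definition: number of insertions and deletions turning x into y."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


# Two different one-character strings: one deletion plus one insertion.
ratio = lcs_distance("a", "b") / max(len("a"), len("b"))
print(ratio)  # 2.0, which is outside [0, 1]
```

Dividing by the longer length bounds a distance only when that distance cannot exceed the longer length, which holds for Levenshtein but not for LCS distance.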
Reading
Levy et al. (2006)