Fuzzy text search
Otsuka-Ochiai coefficient
$$sim_{OO}(x,y) = \frac{|x \cap y|}{\sqrt{|x|\cdot|y|}}$$
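It is the set-based counterpart of Cosine similarity: applied to binary presence/absence vectors, cosine similarity reduces to this coefficient.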
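To make the formula concrete, here is a minimal Python sketch. It assumes strings are represented as sets of character bigrams; the helper `qgrams`, the function name `otsuka_ochiai`, and the convention of returning 0.0 when either set is empty are illustrative choices, not something the definition above prescribes.

```python
from math import sqrt


def qgrams(s: str, q: int = 2) -> set[str]:
    """Return the set of overlapping q-grams of s (hypothetical helper)."""
    if len(s) < q:
        # Assumption: a string shorter than q is treated as a single token.
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def otsuka_ochiai(x: set, y: set) -> float:
    """Compute |x ∩ y| / sqrt(|x| * |y|) for two sets."""
    if not x or not y:
        # Assumption: the coefficient is undefined for empty sets; return 0.0.
        return 0.0
    return len(x & y) / sqrt(len(x) * len(y))


# "night" -> {ni, ig, gh, ht}, "nacht" -> {na, ac, ch, ht}; they share only "ht".
print(otsuka_ochiai(qgrams("night"), qgrams("nacht")))  # 0.25
```

Two identical non-empty sets score 1.0 and disjoint sets score 0.0, so the value always lies in [0, 1].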
Reading
Jiang, Yu, Guoliang Li, Jianhua Feng and Wen-Syan Li. “String Similarity Joins: An Experimental Evaluation.” Proc. VLDB Endow. 7 (2014): 625-636.