Skip to content

Metric measure

The fact that a measure is a metric allows optimizations in many applications. For example, many efficient algorithms for searching shortest paths in graphs, such as Dijkstra’s algorithm, make use of the fact that the underlying measure is a metric.

d:U×UR+{0}d: U \times U \rightarrow R^+ \cup \{0\} is called metric if the following holds:

  • d(x,y)0d(x,y) \geq 0 non-negativity
  • d(x,y)=d(y,x)d(x,y) = d(y,x) symmetry
  • x=yd(x,y)=0x = y \Leftrightarrow d(x,y) = 0 identity
  • d(x,z)d(x,y)+d(y,z)d(x,z) \leq d(x,y) + d(y,z) triangle inequality

Metric measure sometimes called (mathematical) distance. But not every string measure which is historically called “distance” is a metric, so it’s better to use word metric to avoid confusion.

Minkowski distance

Minkowski distance is an example of metric measure (not typically used as string measure though, given here as an example)

D(X,Y)=(i=1nxiyip)1pD\left(X,Y\right) = \left(\sum_{i=1}^n |x_i-y_i|^p\right)^{\frac{1}{p}}

where X=(x1,x2,,xn) and Y=(y1,y2,,yn)RnX = (x_1,x_2,\ldots,x_n) \text{ and } Y = (y_1,y_2,\ldots,y_n) \in R^n. For p1p \geq 1 , the Minkowski distance is a metric.

  • p=1p = 1 Manhattan distance
  • p=2p = 2 Euclidean distance
  • pp \rightarrow \infty Chebyshev distance

Indexing metric spaces

VP-treeMVP-treeBall partitioningSpace-partitioningOnion-treeMM-treeαIM-treeEGNATHyper-plane partitioningGH-treeGNATMetric-space Indexing TreeSlim-treeM-treeNon-Space partitioning

Use two fingers to pan and zoom

from Exploring intersection trees for indexing metric spaces

Data StructureSpace ComplexityConstruction ComplexityClaimed Query ComplexityExtra CPU Query Time
BKTnn ptrsnlognn \log nnαn^α
FQTnnlognn \ldots n \log n ptrsnlognn \log nnαn^α
FHQTnnhn \ldots nh ptrsnhnhlognb\log n^bnαn^α
FQAnhbnhb bitsnhnhlognblog n^bnαlognn^α \log n
VPTnn ptrsnlognn \log nlognc\log n^c
MVPTnn ptrsnlognn \log nlognc\log n^c
VPFnn ptrsn2αn^{2−α}n1αlogncn^{1−α} \log n^c
BSTnn ptrsnlognn\log nnot analyzed
GHTnn ptrsnlognn\log nnot analyzed
GNATnm2nm^2 dstsnmlogmnnm \log_m nnot analyzed
VTnn ptrsnlognn\log nnot analyzed
MTnn ptrsn(mm2)logmnn(m \ldots m^2) \log_m nnot analyzed
SATnn ptrsnlogn/loglognn \log n / \log \log nn1Θ(1/loglogn)n^{1− \Theta(1 / \log \log n)}
AESAn2n^2 dstsn2n^2O(1)dO(1)^dnn2n\ldots n^2
LAESAknkn dstsknknk+O(1)dk + O(1)^dlognkn\log n \ldots kn
LCnn ptrsn2/mn^2/mnot analyzedn/mn/m

from Searching in Metric Spaces

  • BKT - Burkhard–Keller Tree
  • FQT - A further development over BKTs is the fixed queries tree
  • FHQT - fixedheight FQT
  • FQA - fixed queries array
  • VPT - vantage-point trees
  • MVPT - multi-vantage-point tree
  • VPF - excluded middle vantage point forest
  • BST - bisector trees
  • GHT - generalized-hyperplane tree
  • GNAT - The GHT is extended to an m-ary tree. Geometric near-neighbor access tree
  • VT - Voronoi tree
  • MT - M-tree
  • SAT - spatial approximation tree
  • AESA - approximating eliminating search algorithm
  • LAESA - linear AESA
  • LC - list of clusters
CategoryIndexStorageDistance Domain
Pivot-based tablesAESA, LAESA, EPT, CPTMain-memoryContinuous
Pivot-based treesBKT, FQT, FQAMain-memoryDiscrete
VPT, MVPTMain-memoryContinuous
Pivot-based external indexesPM-tree, Omni-family, M-index, SPB-treeDiskContinuous

from Pivot-based Metric Indexing

Other