The Jaccard index can be generalized to multisets (also called bags), which are sets in which elements may appear more than once.
Two multisets x and y defined over the same support can be represented as vectors x = [x_1, x_2, …, x_N] and y = [y_1, y_2, …, y_N], where N is the number of distinct elements in the union of the two multisets, and x_i (respectively y_i) is the multiplicity of element i in x (respectively y). The Jaccard index for multisets then becomes:

$$J(x, y) = \frac{\sum_{i=1}^{N} \min(x_i, y_i)}{\sum_{i=1}^{N} \max(x_i, y_i)}$$
As an example, let’s consider x={{a,a,a,b,b}} and y={{a,a,b,c,c,d}}. If we have the set of possible elements organized into the indexing vector p=[a,b,c,d], we will obtain x=[3,2,0,0] and y=[2,1,2,1]. Observe that the order of elements in p is immaterial to our analysis.
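The example above can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes the commonly used min/max form of the multiset Jaccard index (the sum of element-wise minima divided by the sum of element-wise maxima). The function name `multiset_jaccard` and the convention of returning 1.0 for two empty multisets are illustrative choices, not taken from the text.

```python
from collections import Counter


def multiset_jaccard(x, y):
    """Multiset Jaccard index, assuming the min/max form:
    sum_i min(x_i, y_i) / sum_i max(x_i, y_i)."""
    cx, cy = Counter(x), Counter(y)
    # Counter '&' keeps the element-wise minimum of the multiplicities.
    intersection = sum((cx & cy).values())
    # Counter '|' keeps the element-wise maximum of the multiplicities.
    union = sum((cx | cy).values())
    # Assumed convention: two empty multisets are treated as identical.
    return intersection / union if union else 1.0


x = ["a", "a", "a", "b", "b"]
y = ["a", "a", "b", "c", "c", "d"]

# Build the indexing vector p and the multiplicity vectors from the text.
p = sorted(set(x) | set(y))
cx, cy = Counter(x), Counter(y)
x_vec = [cx[e] for e in p]
y_vec = [cy[e] for e in p]

print(p)                       # ['a', 'b', 'c', 'd']
print(x_vec, y_vec)            # [3, 2, 0, 0] [2, 1, 2, 1]
print(multiset_jaccard(x, y))  # 0.375, i.e. (2+1+0+0) / (3+2+2+1) = 3/8
```

Because the sums run over all elements, reordering p permutes both vectors in the same way and leaves the result unchanged, which matches the remark that the order of elements in p is immaterial.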