Number of nearly orthogonal vectors in high-dimensional spaces

fmirus · April 6, 2018, 4:07pm

In the book “how to build a brain”, there is this figure about availability of vectors in (high-dimensional) conceptual spaces

However, I could not find an explicit formula or proof for the number of nearly orthogonal vectors available in high-dimensional spaces in the book. I assume, this formula should depend on the dimension $D$ of the vector space and the allowed angle $\epsilon$ between vectors. Furthermore, I assume that it is somehow related to the fact, that the dot-product of two random unit vectors is $\beta$ distributed with zero mean and variance $\sigma = \frac{1}{\sqrt{D}}$. Could you please point me to a paper/reference with an explicit formula (and proof)?

jgosmann · April 6, 2018, 9:31pm

http://ieeexplore.ieee.org/document/6771362/?arnumber=6771362&tag=1

arvoelke · April 7, 2018, 2:38am

There’s also nengo.dists.CosineSimilarity whose analytical CDF gives the exact probability of two vectors being at least some angle apart. The general problem of knowing the optimal number of such vectors is an unsolved problem related to sphere packing, and so equations such as the one above only provide a rough bound.

jgosmann · April 7, 2018, 3:11am

As @arvoelke stated, this is given by nengo.dists.CosineSimilarity which is closely related to the Beta distribution (but not exactly the same). [This paper] gives the the formula for the PDF of the distribution of the angle between random unit vectors. By performing a change in variables, one can obtain the PDF for the dot product (i.e. nengo.dists.CosineSimilarity). This paper and this paper (see supplementary material S1 and S2) provide generalized PDF from which the same distribution for the dot product can be derived.

I think it can be shown that the CosineSimilarity distribution approaches a normal distribution with the mean and variance you stated, if d \rightarrow \infty. But I don’t have any derivation of that fact handy. However, I have some notes showing that the expected value of the absolute value of the dot product is approximately \frac{1}{\sqrt{\pi (d - 1) / 2}} which might relate to the scaling of the variance with \frac{1}{\sqrt{d}}, but doesn’t give you the exact scaling.