Can semantic pointers handle rotation of images?


It has been shown that if you have a sequence of semantic pointers, you can find the operation (or transform) that leads from one pointer to the next. You can average all of these transforms together and then apply the average to the last item in the sequence to predict what comes next.
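To make that concrete, here is a minimal sketch of the transform-averaging idea using NumPy, with circular convolution as the binding operation (as in semantic pointer / holographic reduced representation work). The specific sequence here is synthetic: it is generated by repeatedly binding with one fixed unitary vector `t`, which is an assumption made just so the recovered transform has a known target. The helper names (`cconv`, `approx_inverse`, `cosine`) are mine, not from any library.

```python
import numpy as np

def cconv(a, b):
    # Circular convolution via FFT: the binding operation for semantic pointers.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def approx_inverse(a):
    # Involution (keep first element, reverse the rest): the approximate
    # inverse of a vector under circular convolution.
    return np.concatenate(([a[0]], a[1:][::-1]))

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
d = 256

# A "unitary" transform vector (unit-magnitude Fourier coefficients), so that
# binding with it is norm-preserving and exactly invertible.
v = rng.standard_normal(d)
F = np.fft.fft(v)
t = np.real(np.fft.ifft(F / np.abs(F)))

# Build a sequence where each pointer is the previous one bound with t.
s = rng.standard_normal(d)
seq = [s / np.linalg.norm(s)]
for _ in range(3):
    seq.append(cconv(t, seq[-1]))

# Estimate the transform between each consecutive pair, then average.
transforms = [cconv(seq[i + 1], approx_inverse(seq[i]))
              for i in range(len(seq) - 1)]
avg_t = np.mean(transforms, axis=0)

# Apply the averaged transform to the last item to predict the next one.
prediction = cconv(avg_t, seq[-1])
```

Because the approximate inverse is noisy for non-unitary pointers, the prediction is only similar to the true next item (high cosine similarity), not identical to it — which is the usual situation with semantic pointer cleanup.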

I was wondering if this works with rotation. When people try to decide whether a rotated image matches an original, their response time increases with the angle of rotation, as if they were running a rotation simulation in their minds. Can a sequence of rotated images be predicted in Semantic Pointer theory, or by Spaun? In other words, if you rotate an item by 30 degrees, and then by 60 degrees, can you predict what it will look like at 90 degrees?


Short answer

Probably? No one’s tried this before.

Long answer

That feels like a somewhat non-linear transform that might be hard to learn using just PES. However, @Eric has been working on learning conv nets in a biologically plausible manner, and there appears to be some prior art for doing image rotation with conv nets on natural images. Or maybe the transform is just going to be some sort of rotation matrix? It should definitely be possible, but it depends on what vector representation you've chosen for the image, which in turn depends on what psychological phenomena you're trying to replicate.

For example, which features do humans usually forget or approximate when doing mental rotation? Do they fixate on certain landmarks? How accurate are they usually?


Actually, someone has tried this before. Olivia Perryman (@olivia) worked on this at the Nengo Summer School 2016. To summarize her results, it is possible to do this in Nengo. She used MNIST digits, and the rotated digits were reasonably accurate, at least to a level that they were still recognizable.

I don’t think her model was using semantic pointers, though. She had a network with a single hidden layer of neurons, and was solving for the rotation transform at the neuron level. This led to some problems scaling up, since the neurons were fully connected, so the size (number of elements) of the transform matrix was the square of the number of neurons. One extension of her work would be to use a visual model that produces semantic pointers (like the one used in Spaun) and solve for the transform in the semantic pointer space. I have no idea whether this would work or not. At the very least, you might have to make sure your network is trained on all rotations of digits.
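Here is a minimal sketch of the "solve for the transform in a vector space" idea, with assumptions clearly different from her actual model: random 8×8 images stand in for MNIST digits, the transform is solved in raw pixel space rather than over neural activities or semantic pointers, and the rotation is 90 degrees because that is exactly linear on the pixel grid (arbitrary angles require interpolation, e.g. `scipy.ndimage.rotate`, and would only be recovered approximately).

```python
import numpy as np

rng = np.random.default_rng(42)
side = 8
d = side * side   # dimensionality of a flattened image
n_train = 200     # need more examples than dimensions for a unique solution

# Random "images" as stand-ins for MNIST digits.
X = rng.standard_normal((n_train, d))

# Target: each image rotated 90 degrees, which is an exactly linear
# (permutation) operation on the pixels.
Y = np.stack([np.rot90(x.reshape(side, side)).ravel() for x in X])

# Solve for the d x d rotation transform by least squares: Y ≈ X @ T.
T, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The learned transform generalizes to an unseen image.
x_test = rng.standard_normal(d)
predicted = (x_test @ T).reshape(side, side)
expected = np.rot90(x_test.reshape(side, side))
```

Note that `T` has d² elements, which is exactly the scaling problem described above: solving for the transform between fully connected neuron populations grows with the square of the number of neurons, whereas solving in a lower-dimensional semantic pointer space would keep the matrix small.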

Here’s a repository of her code:
I’m not sure what state it’s in (it’s over a year old now, so it might not run with current Nengo). But looking through the code could give you an idea of places to start, if this is something you’re interested in modelling.