Hi @trix. There are definitely some strong similarities between PES and skipped random backpropagation (SRBP), and the two might actually be closer than I originally thought.
In the case of PES, you have something like this:

$$\Delta W_{ij} \propto a_i \sum_k E_{kj} e_k$$

where $W_{ij}$ is the weight from presynaptic neuron $i$ to postsynaptic neuron $j$, $a_i$ is the activity of neuron $i$, and $E_{kj}$ is the random encoder from input $k$ to neuron $j$, which we use to map the error $e_k$ into the neuron space.
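In code, that update is just an outer product. Here's a minimal NumPy sketch; the function name, shapes, and learning rate are mine for illustration (this isn't Nengo's API), and the sign depends on how you define the error:

```python
import numpy as np

def pes_update(W, a, E, e, lr=1e-4):
    """One PES-style update (illustrative sketch, not Nengo's implementation).

    W: (n_pre, n_post) connection weights
    a: (n_pre,) presynaptic activities
    E: (n_err, n_post) random encoders
    e: (n_err,) error signal
    """
    err_neurons = e @ E  # map the error into neuron space: sum_k E_kj e_k
    return W + lr * np.outer(a, err_neurons)  # dW_ij ~ a_i * sum_k E_kj e_k
```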
In the case of SRBP, you have:

$$\Delta W_{ij} \propto a_i \, f'(a'_j) \sum_k B_{kj} e_k$$

which is the same as above, except we have a third factor: the derivative of the postsynaptic nonlinearity $f$, evaluated at the current activation level $a'_j$. (Here, $B_{kj}$ is the random backwards projection, which, as you pointed out, functions almost identically to the random encoders.)
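The SRBP version only adds that derivative gating. Assuming the same illustrative names as above, plus a callable `d_act` for the derivative of the postsynaptic nonlinearity:

```python
def srbp_update(W, a, B, e, z_post, d_act, lr=1e-4):
    """One SRBP-style update (illustrative sketch).

    B: (n_err, n_post) fixed random backward projection
    z_post: (n_post,) postsynaptic activation levels a'_j
    d_act: derivative f' of the postsynaptic nonlinearity
    """
    err_neurons = (e @ B) * d_act(z_post)  # projected error, gated by f'(a'_j)
    return W + lr * np.outer(a, err_neurons)
```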
So really the only difference is the inclusion of this derivative factor in the case of SRBP. I've found that it is important for the stability of the algorithm: without the derivative, much smaller learning rates are needed to maintain stability, which hampers learning, and even then I don't think stability is guaranteed (though I'm not sure it's ever guaranteed with RBP).
The other difference is the way in which these learning rules tend to be used. I'm not sure if anyone has used PES for deep networks before, whereas that's the whole point of SRBP. This is what makes your observation so insightful, since it draws a connection between two things that I had previously thought of as separate. PES also has strong theoretical motivations; we use the random encoders in the learning rule because they allow us to map the error from the state space of the population to the neuron space. That is, in a classification problem, we think of our postsynaptic population as "representing" the class of the stimulus, and the encoders as the mapping from a one-hot representation of the class to the neuron activities. This is the key to the derivation of the PES rule.
When we have multiple layers, this theoretical connection begins to break down, which is why I think no one has thought to use PES in a multi-layer situation as one would SRBP. It makes sense to think of the final layer as representing the stimulus class, but could this intuition apply to earlier layers, too? I’m not sure what exactly that would entail, but it seems interesting to think about.
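To make the multi-layer picture concrete, here's a toy sketch of how SRBP broadcasts the output error straight to every layer through fixed random matrices. Everything here (the two-layer ReLU net, the sizes, the `B` matrices) is illustrative, and whether the PES "representation" story justifies the earlier-layer updates is exactly the open question:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x): return np.maximum(x, 0.0)
def d_relu(x): return (x > 0).astype(float)

n_in, n_hid, n_out = 4, 8, 3
W1 = rng.normal(0, 0.1, (n_in, n_hid))
W2 = rng.normal(0, 0.1, (n_hid, n_out))
B1 = rng.normal(0, 0.1, (n_out, n_hid))  # fixed random skip projection to layer 1
B2 = np.eye(n_out)                       # at the top layer the projection is trivial

def srbp_step(x, target, lr=1e-2):
    global W1, W2
    z1 = x @ W1; h = relu(z1)
    z2 = h @ W2; y = relu(z2)
    e = target - y  # output error, broadcast directly to every layer below
    # The same local rule at each layer: pre-activity, projected error, derivative.
    W2 += lr * np.outer(h, (e @ B2) * d_relu(z2))
    W1 += lr * np.outer(x, (e @ B1) * d_relu(z1))
    return y
```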
I hope that makes sense, and I’m interested in hearing more of your thoughts on this. I’m still trying to figure out what this possible connection means for both PES and SRBP. Thanks for pointing it out!