If I understand the code (that you linked) correctly, q_node
is not computing the next state’s Q value. It’s calculating the Q value of the previous state and previous action (i.e., Q(s’, a’)). There is even a comment about it in the code:
nengo.Connection(q_node, learning_conn.learning_rule, transform=-1, synapse=fast_tau)  # 0.9*Q(s',a') + r
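To make the sign and synapse bookkeeping concrete, here is a minimal sketch of how a TD-style error can be fed into a PES learning rule in current Nengo. The names q_node, learning_conn, fast_tau, and the 0.9 discount echo the snippet above, but everything else (the state and reward placeholders, slow_tau, the ensemble size, the learning rate, and splitting the 0.9*Q(s',a') + r target across two connections) is my own assumption, not the linked code:

```python
import nengo

fast_tau = 0.01   # "recent" filtered Q value
slow_tau = 0.05   # "older" filtered Q value
gamma = 0.9       # discount factor (the 0.9 in the comment above)

model = nengo.Network()
with model:
    # Placeholder state input; a real model would drive this from the environment
    state_input = nengo.Node([0.0, 0.0])
    state = nengo.Ensemble(n_neurons=200, dimensions=2)
    nengo.Connection(state_input, state)

    # Decoded Q estimate; the decoders of this connection are what PES adapts
    q_node = nengo.Node(size_in=1)
    learning_conn = nengo.Connection(
        state, q_node,
        function=lambda x: 0.0,  # start from a zero Q estimate
        learning_rule_type=nengo.PES(learning_rate=1e-4),
    )

    # Placeholder reward signal from the environment
    reward = nengo.Node(lambda t: 0.0)

    # PES drives the decoded output to reduce the error signal, with
    # error = actual - target, so here:
    #   error = Q(s, a) - (r + gamma * Q(s', a'))
    nengo.Connection(q_node, learning_conn.learning_rule,
                     transform=-gamma, synapse=fast_tau)  # -gamma * Q(s', a')
    nengo.Connection(reward, learning_conn.learning_rule,
                     transform=-1, synapse=fast_tau)      # -r
    nengo.Connection(q_node, learning_conn.learning_rule,
                     transform=1, synapse=slow_tau)       # +Q(s, a), delayed
```

The slow synapse on the +Q term is one common way to approximate "the Q value from the previous step" in a continuous-time model: the slowly filtered copy lags behind the quickly filtered one, so their difference behaves like a temporal-difference error. Treat this as an illustration of the wiring pattern, not as the implementation in the code you linked.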
@drasmuss (Daniel Rasmussen) spearheaded the development of reinforcement learning implementations in Nengo while he was pursuing his PhD. I'd recommend starting with his work. His PhD thesis is available here, and it includes some code, but it's written for an older version of Nengo and will need to be updated. He also has a more easily digestible paper here that summarizes some of the work he did for his PhD.
There’s also a discussion here about building custom environments for Q learning agents, if you are interested in that.