The Basal Ganglia’s output looks quite different, and the Thalamus’ output is much less accurate in my code. It should return approximately 1 for the selected action, but I can’t distinguish the best action, whereas in the summer school example the best action is very clear.
Even a clear winner doesn’t come close to 1 in my code (it is actually ~0.31):