Question on Action Selection

Hi all. I have a question regarding the action selection part of SPA.

Basically, I’m trying to create a model that plays tic-tac-toe, using a set of rules for action selection via the basal ganglia and thalamus.

While experimenting, I noticed something that I cannot quite wrap my head around:

As seen in the image above, a relatively high utility value (in light blue, above) did not lead to a correspondingly distinct peak in the action (second peak, below). Can someone help me understand why this happens? I am under the impression that the basal ganglia-thalamus complex will always select the action with the highest utility, so a distinct peak in the utility plot should lead to a correspondingly distinct peak in the action plot.

The action and utility are plotted in the same manner as the SPA sequence example:

Thank you!

I remember this happening when I was working on my counting model. I think the cause was either that the gap between the highest utility and the others was too small, or that the highest utility did not stay on top for long enough. @xchoo can you confirm why this happens?
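
To make the "not lasting long enough" guess a bit more concrete, here is a toy sketch in plain Python. It is not Nengo's basal ganglia or thalamus model. It only assumes that a signal passes through a first-order low-pass filter (like a synapse) before it is read out, and the 10 ms time constant and the pulse durations are values I picked for illustration. The helper names `lowpass` and `pulse` are made up for this sketch.

```python
import math


def lowpass(signal, dt=0.001, tau=0.01):
    """First-order low-pass filter with time constant tau (assumed value)."""
    decay = math.exp(-dt / tau)
    y = 0.0
    out = []
    for x in signal:
        # Exact update for an input held constant over one time step.
        y = decay * y + (1.0 - decay) * x
        out.append(y)
    return out


def pulse(duration, total=0.2, start=0.05, dt=0.001, height=1.0):
    """Rectangular pulse of the given duration (seconds), zero elsewhere."""
    n = round(total / dt)
    i0 = round(start / dt)
    i1 = i0 + round(duration / dt)
    return [height if i0 <= i < i1 else 0.0 for i in range(n)]


# Same pulse height, different durations (hypothetical values).
short_peak = max(lowpass(pulse(0.005)))  # 5 ms pulse
long_peak = max(lowpass(pulse(0.050)))   # 50 ms pulse

print(f"peak after  5 ms pulse: {short_peak:.2f}")
print(f"peak after 50 ms pulse: {long_peak:.2f}")
```

In this toy, the short pulse never gets close to its full height after filtering, while the long one nearly does. If something similar is going on in the model, a utility that is only briefly the highest could produce a much weaker bump in the action plot than its height in the utility plot suggests. Whether that is what actually happens here still needs confirming.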