PES in reinforcement learning

bagjohn · October 14, 2017, 6:33pm

How is it possible to use learning in order to maximize or minimize a
parameter? The value of this parameter can be fed back to the learning
connection, as an error signal would be, although I don’t know if we can
call it like this, as it is best described as a reward signal.
The best I could come up with is the following:
Define a best reward value for the parameter e.g. r_best=1 which is the
best that could ever be achieved in a timestep.
Use that as target value and define the error as actual - target, where
actual is the r_current value.
That results in an error signal always negative, being more negative when
the parameter’s r_current value is negative and less negative when it is
positive (and 0 when r_current =r_best=1). But that doesn’t seem to work
(maybe because it drives errors only one-way?)

Any ideas?

Thanks

Panos