Actor-critic learning algorithm