Q-learning

Q-learning

Q-learning is a model-free reinforcement learning technique that works by learning an action-value function that gives the expected utility of taking a given action in a given state and following a fixed policy thereafter. One of the strengths of Q-learning is that it is able to compare the expected utility of the available actions without requiring a model of the environment. A recent variation called delayed Q-learning has shown substantial improvements, bringing Probably approximately correct learning (PAC) bounds to Markov decision processes.

Read more about Q-learning.