Reinforcement Learning

Exploration

The reinforcement learning problem as described requires clever exploration mechanisms. Randomly selecting actions is known to give rise to very poor performance. The case of (small) finite MDPs is relatively well understood by now. However, because no known algorithms provably scale well with the number of states (or scale to problems with infinite state spaces), in practice people resort to simple exploration methods. One such method is ε-greedy: with probability 1 − ε the agent chooses the action that it believes has the best long-term effect, and otherwise (with probability ε) it chooses an action uniformly at random. Here, 0 < ε < 1 is a tuning parameter, which is sometimes changed, either according to a fixed schedule (making the agent explore less as time goes by) or adaptively based on some heuristics (Tokic & Palm, 2011).
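
As an illustration of the ε-greedy rule described above, the following is a minimal Python sketch, not a reference implementation. The names `epsilon_greedy` and `decayed_epsilon`, the list of estimated action values `q_values`, and the linear decay with its default constants are all assumptions made for this example; the text only says that ε may follow "a fixed schedule" and does not specify one.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index chosen by the epsilon-greedy rule.

    q_values: the agent's current estimates of each action's long-term
              value (hypothetical input; how they are learned is not shown).
    epsilon:  exploration probability in [0, 1].
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value
    # (ties are broken in favour of the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(step, start=1.0, end=0.05, decay_steps=1000):
    """Assumed linear schedule: epsilon moves from `start` to `end`
    over `decay_steps` steps and then stays at `end`."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


if __name__ == "__main__":
    estimates = [0.1, 0.9, 0.3]
    for step in (0, 500, 1000):
        eps = decayed_epsilon(step)
        action = epsilon_greedy(estimates, eps)
        print(f"step={step} epsilon={eps:.3f} action={action}")
```

With a schedule like this the agent explores almost always at first and acts almost greedily later. An adaptive scheme would replace `decayed_epsilon` with a rule that depends on what the agent has observed so far.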
