
Use Reinforcement Learning with Amazon SageMaker

Reinforcement learning (RL) combines fields such as computer science, neuroscience, and psychology to determine how to map situations to actions to maximize a numerical reward signal. This notion of a reward signal in RL stems from neuroscience research into how the human brain makes decisions about which actions maximize reward and minimize punishment. In most situations, humans are not given explicit instructions on which actions to take, but instead must learn both which actions yield the most immediate rewards, and how those actions influence future situations and consequences.

The problem of RL is formalized using Markov decision processes (MDPs) that originate from dynamical systems theory. MDPs aim to capture the high-level details of a real problem that a learning agent encounters over some period of time while attempting to achieve an ultimate goal. The learning agent should be able to determine the current state of its environment and identify the possible actions that affect that state. Furthermore, the learning agent's goals should correlate strongly with the state of the environment. A solution to a problem formulated in this way is known as a reinforcement learning method.
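To make the agent-environment loop concrete, the following is a minimal sketch of tabular Q-learning on a hypothetical five-state corridor MDP. The environment, reward structure, and hyperparameters here are illustrative examples, not part of SageMaker RL itself.

```python
import random

# Hypothetical 5-state corridor: the agent starts in state 0 and receives a
# reward of +1 only when it reaches state 4 (the goal), which ends the episode.
NUM_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left or move right

def step(state, action):
    """Apply an action, returning (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

# Tabular Q-learning: learn action values purely from interaction.
q = [[0.0, 0.0] for _ in range(NUM_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

for _ in range(500):                   # episodes
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
        a = random.randrange(2) if random.random() < epsilon else q[state].index(max(q[state]))
        next_state, reward, done = step(state, ACTIONS[a])
        # Q-learning update: move the estimate toward reward + discounted future value.
        target = reward + gamma * max(q[next_state]) * (not done)
        q[state][a] += alpha * (target - q[state][a])
        state = next_state

print(q)  # the "move right" action value should dominate in every state
```

After enough episodes, the learned values encode a policy (always move right) that the agent discovered only from the reward signal, without ever being told which action was correct.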

What are the differences between reinforcement, supervised, and unsupervised learning paradigms?

Machine learning can be divided into three distinct learning paradigms: supervised, unsupervised, and reinforcement.

In supervised learning, an external supervisor provides a training set of labeled examples. Each example describes a situation and carries a label identifying the category to which that situation belongs. The goal of supervised learning is to generalize so that it predicts correctly in situations that are not present in the training data.
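As a point of contrast, here is a minimal supervised-learning sketch, assuming scikit-learn is available; the features and labels are made up for illustration.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled training set: each row describes a situation (two
# features) and the matching entry in y is the category label supplied by the
# external supervisor.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = ["low", "low", "high", "high"]

model = LogisticRegression()
model.fit(X, y)  # learn from the labeled examples

# The goal is generalization: predict the category of a situation that was
# never in the training data.
print(model.predict([[0.85, 0.75]]))  # expected: ["high"]
```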

In contrast, RL deals with interactive problems, making it infeasible to gather all possible examples of situations with correct labels that an agent might encounter. This type of learning is most promising when an agent is able to accurately learn from its own experience and adjust accordingly.

In unsupervised learning, an agent learns by uncovering structure within unlabeled data. While an RL agent might benefit from uncovering structure based on its experiences, the sole purpose of RL is to maximize a reward signal.
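For completeness, a minimal unsupervised-learning sketch follows, again assuming scikit-learn and using made-up data. No labels and no reward signal are involved; the algorithm only groups similar situations together.

```python
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: no supervisor provides categories; the algorithm
# uncovers structure (here, two clusters) on its own.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # e.g. [0, 0, 1, 1]: group membership, not a reward signal
```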