- Consider the available actions, their possible outcomes, and the reward each outcome yields
- Select the action that maximizes expected reward
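- From utility theory, the expected utility of an action $A$ given evidence $E$
can be calculated as
- $EU(A \mid E) = \sum_i P(\mathrm{Result}_i(A) \mid E, \mathrm{Do}(A)) \, U(\mathrm{Result}_i(A))$, where $\mathrm{Result}_i(A)$ ranges over the possible outcomes of $A$ and $U$ is the utility of an outcome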
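
As a rough illustration of selecting the action with maximal expected reward, the sketch below computes each action's expected utility as the probability-weighted sum of its outcome utilities and picks the best one. The action names, probabilities, and utilities are made-up values, and `expected_utility` and `best_action` are hypothetical helper names introduced only for this example; the outcome probabilities are assumed to be already conditioned on the evidence.

```python
from typing import Dict, List, Tuple

# Assumed representation: each action maps to a list of
# (probability of outcome given evidence and action, utility of outcome).
Outcomes = List[Tuple[float, float]]


def expected_utility(outcomes: Outcomes) -> float:
    """Probability-weighted sum of outcome utilities for one action."""
    return sum(p * u for p, u in outcomes)


def best_action(model: Dict[str, Outcomes]) -> str:
    """Return the action whose expected utility is largest."""
    return max(model, key=lambda a: expected_utility(model[a]))


# Illustrative (invented) numbers: a certain payoff versus a gamble.
model = {
    "safe": [(1.0, 5.0)],
    "risky": [(0.5, 12.0), (0.5, 0.0)],
}

for action, outcomes in model.items():
    print(action, expected_utility(outcomes))
print("chosen:", best_action(model))
```

With these numbers the gamble has the higher expected utility, so it is the action selected. In a learning setting the probabilities and utilities would not be given up front but estimated from experience.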