- Actions have two purposes:
  - Gaining rewards
  - Gaining information that may lead to better rewards later
- If the agent always takes the action with the best approximate Q value, it may miss better parts of the state space
- The space may be dynamic, so earlier estimates can become outdated
- Solution: select actions probabilistically to balance exploration and exploitation
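- Select the action $a$ that maximizes an exploration function $f(Q(s,a), N(s,a))$ of the estimated value and the number of times $a$ has been tried in $s$ [Russell and Norvig]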
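- Select action $a_i$ with probability $P(a_i \mid s) = k^{Q(s,a_i)} / \sum_j k^{Q(s,a_j)}$, where $k > 0$ controls how strongly high-Q actions are favored [Mitchell]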
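The two selection rules cited above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the textbooks' own code: the probabilistic rule weights each action by k raised to its Q estimate (the form given in Mitchell), and the optimistic rule returns a fixed bonus for actions tried fewer than a threshold number of times (one simple exploration function of the kind Russell and Norvig describe). The function names and the default values `k=2.0`, `optimistic_reward=10.0`, and `min_visits=5` are hypothetical choices made for this example.

```python
import random


def selection_probabilities(q_values, k=2.0):
    """Return P(a_i | s) proportional to k ** Q(s, a_i) for each action."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Shift by the largest Q before exponentiating. The shift cancels in the
    # ratio, so the probabilities are unchanged, but it avoids overflow.
    q_max = max(q_values)
    weights = [k ** (q - q_max) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action_probabilistic(q_values, k=2.0, rng=random):
    """Sample an action index according to the probabilities above."""
    probs = selection_probabilities(q_values, k)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def exploration_value(q, n, optimistic_reward=10.0, min_visits=5):
    """Assumed exploration function f(q, n): optimistic until tried enough."""
    return optimistic_reward if n < min_visits else q


def select_action_optimistic(q_values, visit_counts):
    """Pick the action index that maximizes f(Q(s, a), N(s, a))."""
    scores = [exploration_value(q, n) for q, n in zip(q_values, visit_counts)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    q = [0.0, 1.0, 3.0]  # made-up Q estimates for three actions in one state
    print(selection_probabilities(q, k=2.0))
    print(select_action_probabilistic(q, k=2.0))
    print(select_action_optimistic(q, visit_counts=[7, 2, 9]))
```

With `k = 1` every action is equally likely (pure exploration); as `k` grows, the choice concentrates on the action with the highest Q estimate (pure exploitation). In the optimistic rule, the threshold `min_visits` plays a similar role by setting how long an untried action stays attractive.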