- Learning an optimal strategy for maximizing future reward
- Agent has little prior knowledge and no immediate feedback
- Credit assignment for individual actions is difficult when reward arrives only in the future
- Two basic agent designs
  - Agent learns utility function on states
  - Agent learns action-value function
- Can handle deterministic or probabilistic state transitions
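
The two agent designs can be contrasted with a minimal sketch. Everything in it is an illustrative assumption rather than part of the lecture: the five-state chain world, the constants `GAMMA` and `ALPHA`, the `slip` parameter, and the choice of TD(0) and Q-learning as stand-ins for "learns utility function on states" and "learns action-value function". Reward is given only on reaching the goal state, so earlier actions receive credit only through the bootstrapped updates.

```python
import random

# Hypothetical toy world: states 0..4 on a line, state 4 is the terminal goal.
N_STATES = 5
ACTIONS = (-1, +1)        # move left, move right
GAMMA, ALPHA = 0.9, 0.1   # assumed discount factor and learning rate


def step(state, action, slip=0.0, rng=random):
    """Advance one step; with probability `slip` the action is reversed.

    slip=0.0 gives deterministic transitions, slip>0 probabilistic ones.
    """
    if rng.random() < slip:
        action = -action
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0  # reward only at the goal
    return nxt, reward


def learn_utilities(episodes=2000, slip=0.0, seed=0):
    """Design 1: estimate U(s) with TD(0) under a fixed random policy."""
    rng = random.Random(seed)
    U = [0.0] * N_STATES
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            a = rng.choice(ACTIONS)
            s2, r = step(s, a, slip, rng)
            U[s] += ALPHA * (r + GAMMA * U[s2] - U[s])
            s = s2
    return U


def learn_q(episodes=2000, slip=0.0, epsilon=0.2, seed=0):
    """Design 2: estimate Q(s, a) with epsilon-greedy Q-learning."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[s][i] pairs with ACTIONS[i]
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            if rng.random() < epsilon:
                i = rng.randrange(2)               # explore
            else:
                i = 0 if Q[s][0] > Q[s][1] else 1  # exploit, ties go right
            s2, r = step(s, ACTIONS[i], slip, rng)
            Q[s][i] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][i])
            s = s2
    return Q
```

Calling `learn_utilities()` returns one number per state, which by itself does not say which action to take without a model of the transitions; `learn_q()` returns one number per state-action pair, so the greedy action can be read off directly. Passing a nonzero `slip` to either function exercises the probabilistic-transition case with the same update rules.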