Passive Learning Agent
Learns the utility U of each state
Reward is accumulated over the entire sequence of states
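
As a rough illustration of the two points above, the sketch below accumulates reward over each training sequence and averages the result per state to get an estimate of U. It assumes undiscounted rewards and a plain average, which is only one possible estimator. The state names, reward values, and the helper names reward_to_go and estimate_utilities are hypothetical and do not come from the slide.

```python
from collections import defaultdict


def reward_to_go(rewards):
    """For each position i, return the sum of rewards from i to the end."""
    totals = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        totals.append(running)
    totals.reverse()
    return totals


def estimate_utilities(sequences):
    """Average the observed reward-to-go of each state over all sequences.

    Each sequence is a list of (state, reward) pairs (assumed format).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq in sequences:
        states = [s for s, _ in seq]
        rewards = [r for _, r in seq]
        for s, g in zip(states, reward_to_go(rewards)):
            sums[s] += g
            counts[s] += 1
    return {s: sums[s] / counts[s] for s in sums}


# Hypothetical training sequences: reward arrives only in the final state.
seqs = [
    [("a", 0.0), ("b", 0.0), ("goal", 1.0)],
    [("a", 0.0), ("c", 0.0), ("pit", -1.0)],
]
U = estimate_utilities(seqs)
for state, value in sorted(U.items()):
    print(state, value)
```

Here state "a" appears in both sequences, so its estimate is the average of the two accumulated rewards, while "b" and "c" each take the value of the single sequence that visited them.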