Next: Passive Learning Agent
Up: l9
Previous: Passive Learning in a
Assume
- finite set of states
- set of actions
- at each discrete time agent observes state
and chooses
action
- then receives immediate reward
- and state changes to
- Markov assumption: Resulting state
depending on
and
- Reward and next state depend only on current state and action
is probability of reaching state
after executing
action
in state
can be estimated from observed state transition frequencies