
Temporal Difference Learning

Instead of solving the equations for all states, incrementally update the utilities of visited states

TD reduces discrepancies between the utility estimates of successive states

If previous state has utility -100 and current state has utility +100, increase previous state utility to lessen discrepancy

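Temporal difference (TD)
- When a transition from state $i$ to state $j$ is observed, update $U(i) \leftarrow U(i) + \alpha\,(R(i) + U(j) - U(i))$
  - $\alpha$ = learning rate
  - $\alpha(N[i]) \sim 1/N[i]$, where $N[i]$ is the number of visits to state $i$
- Slower convergence than solving the equations for all states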
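
The update can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: it assumes the undiscounted rule $U(i) \leftarrow U(i) + \alpha\,(R(i) + U(j) - U(i))$, and the names `td_update`, `reward_i`, and the dictionary-based tables `U` and `N` are hypothetical choices made for the example. The usage lines revisit the -100 / +100 case, with the reward set to zero as an assumption.

```python
from collections import defaultdict


def td_update(U, N, i, j, reward_i, alpha=None):
    """Apply one TD update to U[i] after observing the transition i -> j.

    U maps states to utility estimates, N maps states to visit counts.
    If alpha is None, the step size decays as 1 / N[i] (an assumed schedule
    matching alpha(N[i]) ~ 1/N[i]); otherwise the given constant is used.
    """
    N[i] += 1
    step = alpha if alpha is not None else 1.0 / N[i]
    # Move U[i] toward the one-step target reward_i + U[j].
    U[i] += step * (reward_i + U[j] - U[i])
    return U[i]


# Previous state at -100, current state at +100, zero reward (assumed).
U = defaultdict(float, {"prev": -100.0, "curr": 100.0})
N = defaultdict(int)
new_value = td_update(U, N, "prev", "curr", reward_i=0.0, alpha=0.25)
print(new_value)  # -50.0: the previous state's utility moved toward +100
```

Only the visited state `i` is touched on each observation, so a single update is cheap; the price is that many observed transitions are needed before the estimates settle.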