next up previous
Next: Learning an Action-Value Function Up: l9 Previous: Temporal Difference Learning

Active Learning in Unknown Environment