- Assigns value to action-state pairs, not just states
- These values are called Q-values
$Q(a, i)$ = value of doing action $a$ in state $i$
- Do not need transition model
- Learned directly from explicit reward feedback
- Provide condition-action rules (in each state, choose the action with the highest Q-value)
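- Relation between utility values and Q-values: $U(i) = \max_a Q(a, i)$, i.e. the utility of state $i$ is the Q-value of the best action $a$ available in it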
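- TD-based Q-learning
- When a transition from state $i$ to state $j$ is observed after doing action $a$:
- $Q(a, i) \leftarrow Q(a, i) + \alpha \left( R(i) + \max_{a'} Q(a', j) - Q(a, i) \right)$, where $\alpha$ is the learning rate and $R(i)$ is the reward received in state $i$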
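
The TD-based Q-learning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the notes' own code: it assumes a tabular Q stored in a dictionary keyed by (action, state), an undiscounted update with learning rate alpha, and a reward attached to the state being left. The names td_q_update, utility, greedy_action, ACTIONS, and the states "s1" and "s2" are all hypothetical.

```python
from collections import defaultdict

# Hypothetical action set; any finite collection of actions would do.
ACTIONS = ["left", "right"]


def td_q_update(Q, i, a, j, reward, actions, alpha=0.5):
    """Apply one TD update to Q[(a, i)] after observing i -> j under action a.

    Assumes the undiscounted form:
        Q(a, i) <- Q(a, i) + alpha * (R(i) + max_a' Q(a', j) - Q(a, i))
    No transition model is used, only the observed successor state j.
    """
    best_next = max(Q[(b, j)] for b in actions)
    Q[(a, i)] += alpha * (reward + best_next - Q[(a, i)])
    return Q[(a, i)]


def utility(Q, i, actions):
    """Utility of state i recovered from Q-values: U(i) = max_a Q(a, i)."""
    return max(Q[(a, i)] for a in actions)


def greedy_action(Q, i, actions):
    """Condition-action rule: in state i, pick the action with the highest Q-value."""
    return max(actions, key=lambda a: Q[(a, i)])


# Unseen (action, state) pairs default to a Q-value of 0.0.
Q = defaultdict(float)

# Assumed starting point: doing "right" in s2 is already known to be worth 1.0.
Q[("right", "s2")] = 1.0

# Observe the transition s1 -> s2 after doing "right" in s1, with reward 0.0 in s1.
new_q = td_q_update(Q, "s1", "right", "s2", reward=0.0, actions=ACTIONS, alpha=0.5)

print(new_q)                              # 0.5
print(utility(Q, "s1", ACTIONS))          # 0.5
print(greedy_action(Q, "s1", ACTIONS))    # right
```

With these numbers the update moves Q("right", s1) halfway from 0 toward the target 0.0 + 1.0, giving 0.5. Repeating the same observed transition would move the value closer to 1.0 each time, without the agent ever estimating transition probabilities.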