RL Roomba

You have invested in a Roomba but you are not satisfied by just automating your vacuum cleaning. Indeed, you want to put to use your newfound reinforcement learning skills to increase the efficiency of the vacuuming process. You have access to the Roomba’s speed and sensor information, including whether the bumper sensors on the front have collided with something. Considering the following simplified state/action space. [Note that this is probably an insufficient state space to learn anything useful. It has been simplified to this in order to make it easy for you to write down the reward function within the time constraints of the lab session]: States: Roomba is `[stopped, moving slow, moving fast]`, oriented in an angle in `[0, 360)` and the bumper sensor has either `[detected bump, no bump]`. This gives us a total of 6 states, each a tuple of `(speed, sensor)`. Actions: `[maintain, accelerate forward, accelerate backward, stop]` ** 4A) ** Specify a reward function that incentivizes moving fast in any direction and disincentivizes bumping into things. Think of bumping into things as roughly twice as bad as moving slowly. ** 4B) ** In words (one or two bullet points suffice; no need to write a function mapping!), what kind of policy do you think the agent will learn? ** 4C) ** Someone tried exactly this! [Read about their experience here](https://twitter.com/smingleigh/status/1060325665671692288) and reflect. Would your reward function cause this kind of behavior, as well? Is this intended? ** 4D) ** If your reward function doesn’t cause that kind of behavior – great! Are there other “loopholes” you might find in it? (Don’t spend more than 2-3 minutes looking for loopholes.) If it does, how might you fix it? Food for thought In this case, a Roomba started moving backward which is at worst inconvenient and at best funny. This is a really common problem in reinforcement learning; see more examples [here](https://docs.google.com/spreadsheets/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml). Think about what issues might arise if the problem were more serious or higher impact than gaming.

Discussion Guide

  1. Some guiding questions to consider here are whether accelerating forward or maintaining the same speed is a good idea when bumping into something.
  2. We’d expect the policy to make the Roomba move faster as long as it hasn’t bumped into things and back off when it has bumped into something.
  3. Note that this is linked to the fact that the accelerate backward row has a positive reward in almost every state! Backing away is generally profitable, and because there are no bump sensors on the back, it is beneficial to move backward, so that even if something is bumped, the sensors aren’t triggered.
  4. One potential loophole to check for is reward function in the (stopped, bump) state – if it is positive, the Roomba might just bump into something, stop, and then keep stopping.