Reading 12: Input
Our Hall of Fame or Shame candidate for today is the command ribbon, which was introduced in Microsoft Office 2007. The ribbon was a radically different user interface for Office, merging the menubar and toolbars together into a single common widget. Clicking on one of the tabs (“Home”, “Insert”, “Page Layout”, etc.) switches to a different ribbon of widgets underneath.
Let’s talk about:
- external consistency
- what steps did the Office 2007 designers take to preserve some consistency with previous versions of Office?
- what pre-existing UI widgets does the ribbon resemble, metaphorically?
- how did the Office 2007 designers decide which commands to put on each tab of the ribbon?
- how does this design improve feedback?
Today’s reading finishes our look into the mechanics of implementing user interfaces by examining input in more detail. We’ll look mainly at keyboard and mouse input, but also at multitouch interfaces like those on modern smartphones and tablets. This reading has two key ideas for thinking about input. First, state machines are a great way to think about and implement tricky input handling (like direct manipulation operations). Second, events propagate through the view tree, and understanding this process helps you make good design choices about where to attach the listeners that handle them.
Input Events
There are two major categories of input events: raw and translated. A raw event comes from a state transition in the input hardware. Mouse movements, mouse button down and up, and keyboard key down and up are the raw events seen in almost every capable GUI system. A toolkit that does not provide separate events for down and up is poorly designed, and makes it difficult or impossible to implement input effects like drag-and-drop or game controls. And yet some toolkits like that did exist at one time, particularly in the bad old days of handheld and mobile phone programming.
For many GUI components, the raw events are too low-level and must be translated into higher-level events. For example, a mouse button press and release is translated into a mouse click event, assuming the mouse didn’t move much between press and release. If it did move, these events are interpreted as a drag rather than a click, so no click event is produced.
Key down and up events are translated into character-typed events, which take modifiers (Shift/Ctrl/Alt) and input methods (e.g., for entering Chinese characters on a standard keyboard) into account to produce a Unicode character rather than a physical keyboard key. In addition, if you hold a key down, multiple character-typed events may be generated by an autorepeat mechanism (usually built into the operating system or GUI toolkit). When a mouse movement causes the mouse to enter or leave a component’s bounding box, entry and exit events are generated, so that the component can give feedback: e.g., visually highlighting a button, or changing the mouse cursor to a text I-bar or a pointing finger.
Here’s our first example of using state machines for input handling. Inside the GUI toolkit, a state machine handles the translation of raw events into higher-level events. Here’s how the click event is generated: after a mousedown and a mouseup, as long as the mouse hasn’t moved (much) between those two events. Question for you: what is the threshold in your favorite GUI toolkit? If it’s measured in pixels, how large is it? Does the mouse exiting the bounding box of the graphical object trigger the threshold regardless of pixel distance? In this case, the raw events (down, up, move) are still delivered to your application, along with the translated event (click). This means that if your application handles both the raw events and the translated events, it has to be prepared to receive both. This often comes up with double-click, for example: your application will see two click events before it sees the double-click event. As a result, you can’t make click do something incompatible with double-click. Occasionally, however, low-level events are consumed in the process of translating them into higher-level events. It’s a difference you have to pay attention to in your particular toolkit.
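The click-translation state machine can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not any real toolkit's code: the `RawEvent` type, the `ClickTranslator` class, its three state names, and the 4-pixel `CLICK_SLOP_PX` threshold are all hypothetical. As in the toolkits described above, raw events are passed through unchanged, so the application sees both the raw events and the translated click.

```python
import math
from dataclasses import dataclass

# Hypothetical movement tolerance; real toolkits choose their own value.
CLICK_SLOP_PX = 4


@dataclass
class RawEvent:
    kind: str  # "down", "up", or "move"
    x: int
    y: int


class ClickTranslator:
    """Translate raw down/up/move events into higher-level "click" events.

    States: IDLE (no button held), PRESSED (button held, still a click
    candidate), DRAGGING (moved too far, so no click will be produced).
    """

    def __init__(self):
        self.state = "IDLE"
        self.down_x = 0
        self.down_y = 0

    def handle(self, ev):
        out = [ev.kind]  # the raw event is still delivered to the application
        if self.state == "IDLE" and ev.kind == "down":
            self.state = "PRESSED"
            self.down_x, self.down_y = ev.x, ev.y
        elif self.state == "PRESSED" and ev.kind == "move":
            moved = math.hypot(ev.x - self.down_x, ev.y - self.down_y)
            if moved > CLICK_SLOP_PX:
                self.state = "DRAGGING"
        elif self.state == "PRESSED" and ev.kind == "up":
            self.state = "IDLE"
            out.append("click")  # translated event follows the raw "up"
        elif self.state == "DRAGGING" and ev.kind == "up":
            self.state = "IDLE"  # a drag ended: no click
        return out


t = ClickTranslator()
seq = [RawEvent("down", 10, 10), RawEvent("move", 11, 10), RawEvent("up", 11, 10)]
print([name for ev in seq for name in t.handle(ev)])  # ['down', 'move', 'up', 'click']
```

Moving more than the threshold before releasing (say from (0, 0) to (20, 0)) yields only `['down', 'move', 'up']`, which is the drag case.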
The keyboard focus is also part of the state of the input system, but it isn’t in the input hardware. Instead, the keyboard focus is a particular object in the view tree that currently receives keyboard events. On some X Window System window managers, you can configure the keyboard focus to follow the mouse pointer: whatever view object contains the mouse pointer has the keyboard focus as well. On most windowing systems (like Windows and Mac), however, the more common way to change the focus is a mouse down on the view that should receive it.
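The two focus policies can be contrasted in a small sketch. The `FocusManager` class, its `policy` values, and the use of plain strings as views are hypothetical simplifications; the point is only that keyboard events are routed to the focused view, and the policy decides which mouse event moves the focus.

```python
class FocusManager:
    """Track which view currently has the keyboard focus.

    policy="click":  focus moves on mouse down (typical on Windows and Mac).
    policy="follow": focus follows the mouse pointer (an option on some
                     X window managers).
    """

    def __init__(self, policy="click"):
        self.policy = policy
        self.focus = None

    def on_mouse_move(self, view_under_pointer):
        if self.policy == "follow":
            self.focus = view_under_pointer

    def on_mouse_down(self, view_under_pointer):
        if self.policy == "click":
            self.focus = view_under_pointer

    def route_key(self, key):
        # Keyboard events go to the focused view, not the view under the mouse.
        return (self.focus, key)
```

Under the click policy, moving the mouse over a button after clicking a text box leaves typed keys going to the text box; under the follow policy they would go to the button.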
Input events carry with them some or all of these properties, which represent the state of the input hardware immediately after the event occurred. On most systems, all events include the modifier key state, since some mouse gestures are modified by Shift, Control, and Alt. Some systems include the mouse position and button state on all events; some put it only on mouse-related events. The timestamp indicates when the input was received, so that the system can time features like autorepeat and double-clicking. It is essential that the timestamp be a property of the event, rather than just read from the clock when the event is handled. Events are stored in a queue, and an event may languish in the queue for an uncertain interval until the application actually handles it, so the time must be captured as close as possible to the event’s actual occurrence (the press or release) and stored in the event object itself.
Keyboard events can be trickier to handle than mouse events, because identifying the key involved in the event is not always easy. Particularly for cross-platform toolkits (HTML, Flash, Java), there may be a variety of keyboard hardware with different sets of keys, and in HTML/JavaScript, different browsers may behave differently. A further complication is that translated key events (the “character typed” event) do not represent a keystroke (like Shift or PgUp or the A key), but rather a character (like “a” or “A” or “%”). Keystrokes are identified by physical keys on the keyboard; characters are identified by values in a character set (like Unicode or ASCII). In jQuery, do not treat keydown/keyup and keypress as interchangeable; their names may be similar, but the parameters of the events are different.
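To see why the timestamp belongs on the event, here is a sketch of double-click detection that compares event timestamps rather than reading the clock at handling time. `ClickEvent`, `DoubleClickDetector`, and the 500 ms `DOUBLE_CLICK_MS` interval are hypothetical; real toolkits typically take the interval from user settings and also check that the two clicks are close in position, which this sketch omits.

```python
from dataclasses import dataclass

# Hypothetical interval; real systems usually take this from user settings.
DOUBLE_CLICK_MS = 500


@dataclass
class ClickEvent:
    timestamp_ms: int  # stamped when the input occurred, not when it is handled


class DoubleClickDetector:
    def __init__(self):
        self.last_click = None

    def handle(self, ev):
        """Return the events to deliver for one click: always "click",
        plus "double-click" if it closely follows the previous click."""
        out = ["click"]
        prev = self.last_click
        # Compare event timestamps. Reading the clock here instead would
        # also count however long each event waited in the queue.
        if prev is not None and ev.timestamp_ms - prev.timestamp_ms <= DOUBLE_CLICK_MS:
            out.append("double-click")
            self.last_click = None  # a third click starts a new sequence
        else:
            self.last_click = ev
        return out
```

Two clicks stamped 300 ms apart are recognized as a double-click even if the application only handles them seconds later. Note that the single clicks are still delivered first, matching the click, click, double-click sequence described earlier.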
User input tends to be bursty - many seconds may go by while the user is thinking, followed by a flurry of events. The event queue provides a buffer between the user and the application, so that the application doesn’t have to keep up with each event in a burst. Recall that perceptual fusion means that the system has 100 milliseconds in which to respond. Edge events (button down and up events) are all kept in the queue unchanged. But multiple events that describe a continuing state - in particular, mouse movements - may be coalesced into a single event with the latest known state. Most of the time, this is the right thing to do. For example, if you’re dragging a big object across the screen, and the application can’t repaint the object fast enough to keep up with your mouse, you don’t want the mouse movements to accumulate in the queue, because then the object will lag behind the mouse pointer, diligently (and foolishly) following the same path your mouse did. Sometimes, however, coalescing hurts. If you’re sketching a freehand stroke with the mouse, and some of the mouse movements are coalesced, then the stroke may have straight segments at places where there should be a smooth curve. If something running in the background causes occasional long delays, then coalescing may hurt even if your application can usually keep up with the mouse.
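Move coalescing can be sketched as a queue that overwrites a trailing mouse-move instead of appending a new one, while always keeping edge events. `Event`, `CoalescingQueue`, and its `coalesce` flag are hypothetical names; the flag stands in for the case above where a freehand-sketching application would rather see every movement.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    kind: str  # e.g. "down", "up", "move", "key"
    x: int = 0
    y: int = 0


class CoalescingQueue:
    """Event queue that coalesces consecutive mouse moves.

    Edge events (down/up/key) are always kept. A "move" that arrives while
    the newest queued event is also a "move" replaces it, so only the latest
    known pointer position waits in the queue.
    """

    def __init__(self, coalesce=True):
        self.events = deque()
        self.coalesce = coalesce  # a freehand-drawing app might turn this off

    def post(self, ev):
        if (self.coalesce and ev.kind == "move"
                and self.events and self.events[-1].kind == "move"):
            self.events[-1] = ev  # keep only the latest state
        else:
            self.events.append(ev)

    def take(self):
        return self.events.popleft() if self.events else None
```

With coalescing on, the burst down, move, move, move, up leaves only three events queued (down, the last move, up); with it off, every intermediate point of a stroke survives.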
reading exercises
Which of the following user interface techniques rely on translated events? (choose all good answers)
Event Dispatch and Propagation
The event loop reads events from the queue and dispatches them to the appropriate components in the view tree. On some systems (notably Microsoft Windows), the event loop also includes a call to a function that translates raw events into higher-level ones. On most systems, however, translation happens when the raw event is added to the queue, not when it is removed. Every GUI program has an event loop in it somewhere. Some toolkits require the application programmer to write this loop (e.g., Win32); other toolkits have it built in (e.g., Java Swing).
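A minimal event loop with translation at enqueue time might look like this. `EventSystem`, `post_raw`, `run_once`, and the string-valued events are hypothetical; the `translate` and `dispatch` functions are supplied by the caller. For comparison, a Win32 loop instead calls `TranslateMessage` between getting and dispatching each message.

```python
from collections import deque


class EventSystem:
    def __init__(self, translate, dispatch):
        self.queue = deque()
        self.translate = translate  # raw event -> list of higher-level events
        self.dispatch = dispatch    # delivers one event to the view tree

    def post_raw(self, raw):
        # Translation happens when the raw event is enqueued (the common case).
        self.queue.append(raw)
        self.queue.extend(self.translate(raw))

    def run_once(self):
        # One pass of the loop. A real loop blocks waiting for more input
        # instead of returning when the queue is empty.
        while self.queue:
            self.dispatch(self.queue.popleft())
```

For example, if `translate` turns a hypothetical `"keydown:A"` into `["char:a"]`, posting key down and key up and then running the loop dispatches `"keydown:A"`, `"char:a"`, `"keyup:A"` in that order.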
Here are some examples of how mouse events are dispatched and propagated. The window shown here has the view tree shown below it, in which each graph node is represented by a Node component with two children, a Circle (displaying a filled white circle with a black outline) and a text Label (displaying a text string, such as “A” or “B”).
First consider the green mouse cursor; suppose it just arrived at this point. Then a mouse-move event is created and dispatched to the topmost component whose bounding box contains that point, which is Label A. If Label A doesn’t handle the mouse-move event, then the event is propagated up to Node A; if that doesn’t handle the event either, it’s propagated to Window, and then discarded. Notice that Circle A never sees the event, because event propagation goes up the tree, not down through z-order layers.
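The dispatch just described (hit-test for the topmost component, then propagate up the tree until something handles the event) can be sketched as follows. The `View` class, its `handles` set, and the functions `hit_test` and `dispatch` are hypothetical; for simplicity, all bounds are in window coordinates and children lie within their parents.

```python
class View:
    def __init__(self, name, x, y, w, h, children=(), handles=()):
        self.name = name
        self.x, self.y, self.w, self.h = x, y, w, h
        self.children = list(children)  # later children are drawn on top
        self.handles = set(handles)     # event kinds this view consumes
        self.parent = None
        for c in self.children:
            c.parent = self

    def contains(self, px, py):
        # Default rectangular bounding-box hit test.
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def hit_test(view, px, py):
    """Return the topmost view whose contains() accepts the point, or None."""
    if not view.contains(px, py):
        return None
    for child in reversed(view.children):  # topmost (last drawn) first
        hit = hit_test(child, px, py)
        if hit is not None:
            return hit
    return view


def dispatch(root, kind, px, py):
    """Deliver to the hit target, then propagate up to its ancestors.

    Returns the names of the views that saw the event."""
    seen = []
    view = hit_test(root, px, py)
    while view is not None:
        seen.append(view.name)
        if kind in view.handles:
            break  # handled: stop propagating
        view = view.parent
    return seen
```

With a tree of Window > Node A > (Circle A, Label A), a move over the label is seen by Label A, Node A, and Window in turn; Circle A never sees it, because propagation follows the parent chain rather than the z-order.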
Now consider the blue mouse cursor. What component will be the initial target for a mouse-move event at this point? The answer depends on how hit testing is done by the toolkit. Some toolkits support only rectangular bounding-box hit testing, in which case Edge A-C (whose bounding box contains the mouse point) will be the event target. Other toolkits allow hit testing to be overridden and controlled by components themselves, so that Edge A-C could test whether the point actually falls on (or within some small threshold of) the actual line it draws. Java Swing supports this by letting a component override Component.contains(). If Edge A-C rejects the point, then the next component in z-order whose bounding box contains the mouse position is the window itself, so the event would be dispatched directly to the window.
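Here is a sketch of an edge whose hit test follows the drawn line rather than its bounding box, in the spirit of overriding `contains()`. The `Edge` class, its `bounding_box_contains` helper, and the 3-pixel `tolerance` are hypothetical; the test computes the distance from the point to the nearest point on the segment.

```python
import math


class Edge:
    """A line-segment view that accepts a point only if it is near the line."""

    def __init__(self, x1, y1, x2, y2, tolerance=3.0):
        self.x1, self.y1, self.x2, self.y2 = x1, y1, x2, y2
        self.tolerance = tolerance  # hypothetical hit threshold in pixels

    def bounding_box_contains(self, px, py):
        return (min(self.x1, self.x2) <= px <= max(self.x1, self.x2)
                and min(self.y1, self.y2) <= py <= max(self.y1, self.y2))

    def contains(self, px, py):
        # Project the point onto the segment, clamped to the endpoints.
        dx, dy = self.x2 - self.x1, self.y2 - self.y1
        length_sq = dx * dx + dy * dy
        if length_sq == 0:
            t = 0.0
        else:
            t = ((px - self.x1) * dx + (py - self.y1) * dy) / length_sq
            t = max(0.0, min(1.0, t))
        cx, cy = self.x1 + t * dx, self.y1 + t * dy
        return math.hypot(px - cx, py - cy) <= self.tolerance
```

For a diagonal edge from (0, 0) to (100, 100), the point (90, 10) is inside the bounding box but about 57 pixels from the line, so `contains` rejects it and the event can fall through to whatever lies beneath.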
The preceding examples describe how virtually all desktop toolkits do event dispatch and propagation. Alas, the Web is not so simple. On the Web, the view tree is a tree of HTML elements, and early versions of Netscape propagated events down this tree, not up. Netscape would first determine the target of the event, using mouse position or keyboard focus, as we explained earlier. But instead of sending the event directly to the target, it would first try sending it to the root of the tree, and so forth down the ancestor chain until it reached the target. Only if none of its ancestors wanted the event would the target actually receive it. Internet Explorer’s model was exactly the opposite: like conventional desktop event propagation, IE propagated events upwards. If the target had no registered handler for the event (and no default behavior either, like a button or hyperlink has for click events), then the event would propagate upwards through the tree.
The W3C, in its effort to standardize the Web, combined the two models, so that events first propagate downwards to the target (a phase called “event capturing”, not to be confused with mouse capture), and then back upwards again (“event bubbling”). You can register event handlers for either or both phases if you want. Modern standards-compliant browsers, like Firefox and Opera, support this model; so does Adobe Flex.
One advantage of this two-phase event propagation model is that it gives you a lot more flexibility as a programmer to override the behavior of other components. By attaching a capturing listener high up in the component hierarchy, you can handle the events yourself and, by stopping their propagation, prevent other components from even seeing them. For example, if you want to implement an “edit mode” for your UI, in which the user can click and drag around standard widgets like buttons and textboxes, you can do that easily with a single capturing listener attached to the top of your UI tree.
In the traditional desktop event propagation model, it would be harder to prevent the buttons and textboxes from trying to interpret the click and drag events themselves, and you would have to add listeners to every single widget.
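The two-phase model, and the "edit mode" trick, can be sketched in Python. `Node`, `Event`, `stop_propagation`, and `dispatch` are hypothetical stand-ins loosely modeled on the DOM's `addEventListener` capture option and `stopPropagation()`. As a simplification, the target's capture listeners run in the capture pass and its bubble listeners in the bubble pass; as in the DOM, stopping propagation still lets the remaining listeners on the current node run.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.capture_listeners = []
        self.bubble_listeners = []


class Event:
    def __init__(self, kind):
        self.kind = kind
        self.stopped = False

    def stop_propagation(self):
        self.stopped = True


def dispatch(target, event):
    """Capture from the root down to the target, then bubble back up."""
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = node.parent
    path.reverse()  # root ... target

    for node in path:  # capture phase: downward
        for listener in node.capture_listeners:
            listener(node, event)
        if event.stopped:
            return
    for node in reversed(path):  # bubble phase: upward
        for listener in node.bubble_listeners:
            listener(node, event)
        if event.stopped:
            return
```

Adding one capturing listener at the root that records the event and calls `stop_propagation()` is enough to keep a button deeper in the tree from ever seeing the mouse press, which is the single-listener edit mode described above.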
Multitouch interfaces like the Apple iPhone introduce a few wrinkles into the event dispatch story. Instead of having a single mouse position where the event occurs, a multitouch interface may have multiple points (fingers) touching the screen at once. Which of these points is used to decide which component gets the event? Here’s how the iPhone does it. Each time a finger touches down on the screen, the location of the new touch-down is used to dispatch the touch-down event. All events carry along information about all the fingers that are currently touching the screen, so that the component can recognize multitouch gestures like pinching fingers together or rotating them. (This is in fact a straightforward extension of keyboard and mouse events: most input events carry along information about which keyboard modifiers are currently held down, and often the current mouse position and mouse button state as well.)
Two kinds of event capture are used in the iPhone. First, after a touch-down event is dispatched to the component that the finger touched first, that component automatically captures the events for all future moves of that finger, even if it strays outside the bounds of the component, until the finger finally leaves the screen (touch-up). This is similar to the automatic mouse capture used by Java Swing when the mouse is dragged. Second, a component can also turn on its “exclusive touch” property, which means that if the first touch on the screen (after a period of no fingers touching) is dispatched to that component, then all future touch events are captured by that component, until all fingers are released again. (Apple, Event Handling, iPhone Application Programming Guide, 2007).
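The two capture rules can be sketched as a small dispatcher that follows the description above, not Apple's actual implementation. `TouchView`, `TouchDispatcher`, and its methods are hypothetical, and `hit_test` is a caller-supplied function from a point to a view.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class TouchView:
    name: str
    exclusive_touch: bool = False


class TouchDispatcher:
    """Route touch events per finger, with per-finger and exclusive capture."""

    def __init__(self, hit_test):
        self.hit_test = hit_test
        self.owner = {}        # finger id -> view that captured that finger
        self.exclusive = None  # view holding exclusive capture, if any

    def touch_down(self, finger, x, y):
        if self.exclusive is not None:
            view = self.exclusive
        else:
            view = self.hit_test(x, y)  # dispatch by the new touch's location
            # Exclusive capture starts only on the first touch after none.
            if not self.owner and view.exclusive_touch:
                self.exclusive = view
        self.owner[finger] = view
        return view

    def touch_move(self, finger, x, y):
        # Captured: the finger's current position does not matter.
        return self.owner[finger]

    def touch_up(self, finger):
        view = self.owner.pop(finger)
        if not self.owner:
            self.exclusive = None  # all fingers released
        return view
```

A finger that lands on one view and slides onto another keeps reporting to the first view; once an exclusive-touch view gets the first touch, later fingers go to it wherever they land, until every finger lifts.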
reading exercises
Suppose you want to block all mouse input to an interface. Which of the techniques below could help you do that, assuming your UI toolkit supports them? (choose all good answers)
State Machines
Now let’s look at how components that handle input are typically structured. A controller in a direct manipulation interface is a state machine. Here’s an example of the state machine for a push button’s controller. Idle is the normal state of the button when the user isn’t directing any input at it. The button enters the Hover state when the mouse enters it. It might display some feedback to reinforce that it affords clickability. If the mouse button is then pressed, the button enters the Armed state, to indicate that it’s being pushed down. The user can cancel the button press by moving the mouse away from it, which goes into the Disarmed state. Or the user can release the mouse button while still inside the component, which invokes the button’s action and returns to the Hover state. Transitions between states occur when a certain input event arrives, or sometimes when a timer times out. Each state may need different feedback displayed by the view. Changes to the model or the view occur on transitions, not states: e.g., a push button is actually invoked by the release of the mouse button.
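The push button's controller can be written directly as a state machine. `PushButtonController` and the per-state `FEEDBACK` strings are hypothetical. The Disarmed-to-Armed (re-enter while still pressed) and Disarmed-to-Idle (release outside) transitions are assumptions filling in cases the text does not spell out. Note that the action runs on a transition, releasing while Armed, not in a state.

```python
# Hypothetical per-state feedback; real buttons vary by toolkit and theme.
FEEDBACK = {
    "Idle": "normal",
    "Hover": "highlighted",
    "Armed": "pushed in",
    "Disarmed": "raised (press will be canceled)",
}


class PushButtonController:
    def __init__(self, on_click):
        self.state = "Idle"
        self.on_click = on_click

    def feedback(self):
        return FEEDBACK[self.state]

    def handle(self, event):
        s = self.state
        if s == "Idle" and event == "enter":
            self.state = "Hover"
        elif s == "Hover" and event == "exit":
            self.state = "Idle"
        elif s == "Hover" and event == "press":
            self.state = "Armed"
        elif s == "Armed" and event == "exit":
            self.state = "Disarmed"
        elif s == "Armed" and event == "release":
            self.on_click()  # the action happens on this transition
            self.state = "Hover"
        elif s == "Disarmed" and event == "enter":
            self.state = "Armed"  # assumed: moving back in re-arms
        elif s == "Disarmed" and event == "release":
            self.state = "Idle"   # assumed: releasing outside cancels
        return self.state
```

The sequence enter, press, release invokes the action once and leaves the button in Hover; press, exit, release cancels without invoking it.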
Here’s a state machine suitable for drag & drop. Notice how each state of the machine produces different visual feedback, in this case the shape of the cursor. The push button’s state machine has the same property. This is a common case in input implementation, since different states of an input controller often represent different modes from the user’s point of view, and distinguishing those modes with visual feedback helps reduce mode errors. Visual feedback can also happen on the transitions, but it may have to be animated to be effective, because the transitions, like pressing or releasing a button, are very brief.
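A drag-and-drop controller with per-state cursor feedback might be sketched like this. The states, cursor names, `DragDropController`, and the `accepts_drop` callback are hypothetical and may not match the figure's diagram; for simplicity, every mouse down is assumed to land on a draggable item, and any movement starts the drag.

```python
# Hypothetical states and cursors; the figure's own diagram may differ.
CURSOR = {
    "Idle": "arrow",
    "Pressed": "arrow",         # button down on an item, not yet moved
    "DragOverTarget": "copy",   # over a place that accepts the drop
    "DragNoTarget": "no-drop",  # over a place that rejects it
}


class DragDropController:
    def __init__(self, accepts_drop):
        self.state = "Idle"
        self.accepts_drop = accepts_drop  # hypothetical: (x, y) -> bool
        self.dropped_at = None

    def cursor(self):
        return CURSOR[self.state]

    def handle(self, kind, x, y):
        if self.state == "Idle" and kind == "down":
            self.state = "Pressed"
        elif self.state != "Idle" and kind == "move":
            self.state = "DragOverTarget" if self.accepts_drop(x, y) else "DragNoTarget"
        elif kind == "up":
            if self.state == "DragOverTarget":
                self.dropped_at = (x, y)  # the model changes on the transition
            self.state = "Idle"
        return self.cursor()
```

Because the cursor is a function of the state alone, each mode the user is in (pressed, dragging over a valid target, dragging over an invalid one) is visibly distinct.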
State machines are also useful for modeling and tracking low-level interaction with the pointing device itself: the mouse or touchscreen. The top state machine in this figure shows the states of a mouse or touchpad. Lifting the mouse off the table, or lifting your finger off a touchpad, is called clutching. Why do you need to clutch with a mouse or touchpad? The bottom state machine shows a touchscreen, which has only two states. What kinds of affordances are harder to provide on a touchscreen, because it lacks the tracking state?
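The two device state machines can be written as transition tables. This sketch assumes Buxton's three-state model of input (out of range, tracking, dragging), whose names may not match the figure's labels exactly; the event names and the `run` and `reachable_states` helpers are hypothetical.

```python
# Mouse or touchpad: three states, including tracking.
MOUSE = {
    ("OutOfRange", "put_down"): "Tracking",  # end of a clutch
    ("Tracking", "lift"): "OutOfRange",      # clutching
    ("Tracking", "button_down"): "Dragging",
    ("Dragging", "button_up"): "Tracking",
}

# Touchscreen: touching the screen goes straight to dragging.
TOUCHSCREEN = {
    ("OutOfRange", "touch"): "Dragging",
    ("Dragging", "release"): "OutOfRange",
}


def run(machine, start, events):
    """Feed events through a transition table; unknown events are ignored."""
    state = start
    for ev in events:
        state = machine.get((state, ev), state)
    return state


def reachable_states(machine):
    return {s for (s, _) in machine} | set(machine.values())
```

Comparing `reachable_states(MOUSE)` with `reachable_states(TOUCHSCREEN)` shows directly that the touchscreen table has no Tracking state.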
reading exercises
Which of the following are true of the states of an input-processing state machine? (choose all good answers)