Reading 07: Output
Today’s hall of fame or shame candidate is the Domino’s Pizza build-your-own-pizza process. You can try it yourself by going to the Domino’s website and clicking Order to start an order (you’ll have to fill in an address to get to the part we care about, the pizza-building UI).
Some aspects to think about:
- learnability
- visibility
- efficiency
Today’s reading resumes our look into the mechanics of implementing user interfaces, by considering output in more detail.
Output Representations
One goal for these implementation classes is not to teach any one particular GUI system or toolkit, but to give a survey of the issues involved in GUI programming and the range of solutions adopted by various systems.
There are basically three ways to represent the output of a graphical user interface.
Objects are the same as the view tree we discussed previously. Parts of the display are represented by view objects arranged in a spatial hierarchy, with automatic redraw propagating down the hierarchy. There have been many names for this idea over the years; the GUI community hasn’t managed to settle on a single preferred term.
Strokes draw output by making procedure calls to high-level drawing primitives, like drawLine, drawRectangle, drawArc, and drawText. SVG is an example of a stroke output format.

Pixels regard the screen as an array of pixels and deal with those pixels directly.
All three output representations appear in virtually every modern GUI application. The object representation always appears at the very top level, for windows, and often for graphical objects within the windows as well. At some point, we reach the leaves of the view hierarchy, and the leaf views draw themselves with stroke calls. A graphics package then converts those strokes into pixels displayed on the screen. For performance reasons, an object may short-circuit the stroke package and draw pixels on the screen directly. On Windows, for example, video players do this using the DirectX interface to have direct control over a particular screen rectangle.
reading exercises
Which representation does each of the following technologies use?
pure HTML (no Javascript, CSS, or other files)
Postscript laser printer
Laser cutter
LCD screen
Since virtually every GUI uses all three representations, the design question becomes: at which points in your application do you want to step down into a lower-level kind of output? Here’s an example. Suppose you want to build a view that displays a graph of nodes and edges.
One way to do it would be to represent each node and each edge in the graph by an object (as in the tree on the left). Each node in turn might have two child objects, a circle and a text label. Eventually, you’ll get down to the primitive objects available in your GUI toolkit. Most GUI toolkits provide a text label; most don’t provide a primitive circle. (One notable exception is SVG, which has object equivalents for all the common drawing primitives.) This would be a pure object representation, at least from your application’s point of view - stroke output and pixel output would still happen, but inside primitive objects that you took from the library.
Alternatively, the top-level window might have no child objects. Instead, it would draw the entire graph by a sequence of stroke calls: drawCircle for the node outlines, drawText for the labels, drawLine for the edges. This would be a pure stroke representation.
Finally, your graph view might bypass stroke drawing and set pixels in the window directly. The text labels might be assembled by copying character images to the screen. This pure pixel representation is rarely used nowadays, because it’s the most work for the programmer, but it used to be the only way to program graphics.
Hybrid representations for the graph view are certainly possible, in which some parts of the output use one representation, and others use another. The graph view might use objects for nodes, but draw the edges itself as strokes. It might draw all the lines itself, but use label objects for the text.
As we said earlier, almost every GUI program uses all three representations. At the highest level, a typical program presents itself in a window, which is an object. At the lowest level, the window appears on the screen as a rectangle of pixels. So a series of steps has to occur that translates that window object (and all its descendants in the view tree) into pixels.
The step from objects down to strokes is usually called drawing. We’ll look at that first.
The step from strokes down to pixels is called rasterization (or scan conversion). The specific algorithms that rasterize various shapes are beyond the scope of this course (see 6.837 Computer Graphics instead). But we’ll talk about some of the effects of rasterization, and what you need to know as a UI programmer to control those effects.
Drawing
Here’s how drawing works in the object representation. Drawing is a top-down process: starting from the root of the view tree, each object draws itself, then draws each of its children recursively. The process is optimized by passing a clipping region to each object, indicating the area of the screen that needs to be drawn. Children that do not intersect the clipping region are simply skipped, not drawn. In the example above, nodes B and C would not need to be drawn. When an object partially intersects the clipping region, it must be drawn - but any strokes or pixels it draws when the clipping region is in effect will be masked against the clip region, so that only pixels falling inside the region actually make it onto the screen.
For the root, the clipping region might be the entire screen. As drawing descends the tree, however, the clipping region is intersected with each object’s bounding box. So the clipping region for an object deep in the tree is the intersection of the bounding boxes of its ancestors.
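The descent above can be sketched in a few lines of JavaScript. This is an illustrative sketch, not any real toolkit’s code: the view objects, their `bounds` and `children` fields, and the `drawn` list (standing in for actual drawing) are all hypothetical.

```javascript
// Sketch of top-down drawing with clip-region intersection.
// Views are plain objects: { name, bounds: {x, y, w, h}, children: [...] }.

function intersect(a, b) {
  const x1 = Math.max(a.x, b.x), y1 = Math.max(a.y, b.y);
  const x2 = Math.min(a.x + a.w, b.x + b.w);
  const y2 = Math.min(a.y + a.h, b.y + b.h);
  return (x2 > x1 && y2 > y1) ? { x: x1, y: y1, w: x2 - x1, h: y2 - y1 } : null;
}

function drawTree(view, clip, drawn) {
  const region = intersect(clip, view.bounds);
  if (!region) return;            // outside the clip region: skip entirely
  drawn.push(view.name);          // "draw self", masked against `region`
  for (const child of view.children || []) drawTree(child, region, drawn);
}
```

Note how each child receives the *intersected* region, so a view deep in the tree can never draw outside its ancestors’ bounding boxes.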
For high performance, the clipping region is normally rectangular, using bounding boxes rather than the graphical object’s actual shape. But it doesn’t have to be that way. A clipping region can be an arbitrary shape on the screen. This can be very useful for visual effects: e.g., setting a string of text as your clipping region, and then painting an image through it like a stencil. Postscript was the first stroke abstraction to allow this kind of nonrectangular clip region. Now many graphics toolkits support nonrectangular clip regions. For example, on Microsoft Windows and X Windows, you can create nonrectangular windows, which clip their children into a nonrectangular region.
Here’s an example of the redraw algorithm running on the graph window (starting with the clipping region shown on the last slide).
- First the clip region is intersected with the whole window’s bounding box, and the window is told to draw itself within that intersection. The window draws its titlebar and its gray background. The window background effectively erases the previous contents of the window.
- The window’s clip region is now intersected with its first child’s bounding box (Node A), and Node A is told to draw itself within that. In this particular example (where nodes are represented by circle and label objects), Node A doesn’t do any of its own drawing; all the drawing will be handled by its children.
- Now Node A’s circle child is told to draw itself. In this case, the circle has the same bounding box as Node A itself, so it receives the same clip region that Node A did. It draws a white circle.
- Now Node A’s label child is told to draw itself, again using the same clip region because it has the same bounding box. It draws text on top of the circle just drawn.
- Popping back up the tree, the next child of the window, Edge A-B, is told to draw itself, using the clip region that intersects its own bounding box with the window’s clip region. Only part of the edge falls in this clip region, so the edge only draws part of itself.
- The algorithm continues through the rest of the tree, either drawing children or skipping them depending on whether they intersect the clip region. (Would Node B be drawn? Would Edge A-C be drawn? Would Node C be drawn?)
Note that the initial clip region passed to the redraw algorithm will be different every time the algorithm is invoked. Clip regions generally come from damage rectangles, which will be explained in a moment.
reading exercises
In the example above, which other objects will be redrawn? (choose all good answers)
When the bounding boxes of two objects overlap, like the circle and label objects in the previous example, the redraw algorithm induces an ordering on the objects that makes them appear layered, one on top of the other. For this reason, 2D graphical user interfaces are sometimes called 2½D. They aren’t fully 3D, in which objects have x, y, and z coordinates; instead the z dimension is merely an ordering, called z order.
Z order is a side-effect of the order that the objects are drawn when the redraw algorithm passes over the tree. Since drawing happens top-down, parents are generally drawn underneath children (although parents get control back after their children finish drawing, so a parent can draw some more on top of all its children if it wants). Older siblings (with lower indexes in their parent’s array of children) are generally drawn underneath younger ones. Java Swing is a curious exception to this - its redraw algorithm draws the highest-index child first, so the youngest sibling ends up on the bottom of the z order.
Z order can be affected by rearranging the tree, e.g. moving children to a different index position within their parent, or promoting them up the tree if necessary. This is often important for operations like drag-and-drop, since we generally want the object being dragged to appear on top of other objects.
Some GUI toolkits allow you to change the z-order of an element without moving its position in the tree. In HTML, the CSS z-index property lets you do that. There’s a nice page that lets you explore how the z-index property works.
When a graphical object needs to change its appearance, it doesn’t repaint itself directly. It can’t, because the drawing process has to occur top-down through the view tree: the object’s ancestors and older siblings need to have a chance to paint themselves underneath it. (So, in Java, even though a graphical object can call its own paint() method directly, you generally shouldn’t do it!)
Instead, the object asks the graphics system to repaint it at some time in the future. This request includes a damaged region, which is the part of the screen that needs to be repainted. Often, this is just the entire bounding box of the object; but complex objects might figure out which part of the screen corresponds to the part of the model that changed, so that only that part is damaged.
The repaint request is then queued for later. Multiple pending repaint requests from different objects are consolidated into a single damaged region, which is often represented just as a rectangle - the bounding box of all the damaged regions requested by individual objects. That means that undamaged screen area is being considered damaged, but there’s a tradeoff between the complexity of the damaged region representation and the cost of repainting.
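The consolidation step is simple to sketch: the pending damage rectangles are merged into their common bounding box. (An illustrative sketch, not any toolkit’s actual code.)

```javascript
// Sketch: consolidating pending damage rectangles into one bounding box.
// Rectangles are { x, y, w, h }; returns null if nothing is damaged.
function unionRects(rects) {
  if (rects.length === 0) return null;
  const x1 = Math.min(...rects.map(r => r.x));
  const y1 = Math.min(...rects.map(r => r.y));
  const x2 = Math.max(...rects.map(r => r.x + r.w));
  const y2 = Math.max(...rects.map(r => r.y + r.h));
  return { x: x1, y: y1, w: x2 - x1, h: y2 - y1 };
}
```

As the reading notes, the bounding box of two far-apart damage rectangles covers a lot of undamaged screen in between; that waste is the price of keeping the damaged-region representation simple.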
Eventually - usually after the system has handled all the input events (mouse and keyboard) waiting on the queue – the repaint request is finally satisfied, by setting the clipping region to the damaged region and redrawing the view tree from the root.
There’s an unfortunate side-effect of the automatic damage/redraw algorithm. If we draw a view tree directly to the screen, then moving an object can make the screen appear to flash - objects flickering while they move, and nearby objects flickering as well.
When an object moves, it needs to be erased from its original position and drawn in its new position. The erasure is done by redrawing all the objects in the view hierarchy that intersect this damaged region; typically the drawing of the window background is what does the actual erasure. If the drawing is done directly on the screen, this means that all the objects in the damaged region temporarily disappear, before being redrawn. Depending on how screen refreshes are timed with respect to the drawing, and how long it takes to draw a complicated object or multiple layers of the hierarchy, these partial redraws may be briefly visible on the monitor, causing a perceptible flicker.
Double-buffering solves this flickering problem. An identical copy of the screen contents is kept in a memory buffer. (In practice, this may be only the part of the screen belonging to some subtree of the view hierarchy that cares about double-buffering.) This memory buffer is used as the drawing surface for the automatic damage/redraw algorithm.
After drawing is complete, the damaged region is just copied to the screen as a block of pixels. Double-buffering reduces flickering for two reasons: first, because the pixel copy is generally faster than redrawing the view hierarchy, so there’s less chance that a screen refresh will catch it half-done; and second, because unmoving objects that happen to be caught, as innocent victims, in the damaged region are never erased from the screen, only from the memory buffer.
It’s a waste for every individual view to double-buffer itself. If any of your ancestors is double-buffered, then you’ll derive the benefit of it. So double-buffering is usually applied to top-level windows.
Why is it called double-buffering? Because it used to be implemented by two interchangeable buffers in video memory. While one buffer was showing, you’d draw the next frame of animation into the other buffer. Then you’d just tell the video hardware to switch which buffer it was showing, a very fast operation that required no copying and was done during a screen’s vertical refresh interval so it produced no flicker at all.
Strokes
In our description of the redraw algorithm, we said a graphical object “draws itself,” meaning that it produces strokes to show itself on the screen. How that is actually done depends on the GUI toolkit you’re using.
In Java Swing (and many other desktop GUI toolkits, like Win32 and Cocoa), every object has a drawing method. In Swing, this method is paint(). The redraw algorithm operates by recursively calling paint() down the view hierarchy.
Objects can override the paint() method to change how they draw themselves. In fact, Swing breaks the paint() method down into several overridable template methods, like paintComponent() and paintChildren(), to make it easier to affect different parts of the redraw process. More about Swing’s painting process can be found in “Painting in AWT and Swing” by Amy Fowler.
In Adobe Flex, there’s no drawing method available to override - the redraw algorithm is hidden from the programmer, much like the event loop is hidden by these toolkits. Instead, you make a sequence of stroke calls into the object, and the object records this sequence of calls. Subsequently, whenever the object needs to redraw itself, it just plays back the recorded sequence of stroke calls. This approach is sometimes called retained graphics. SVG is an example of a retained graphics mode, while Canvas uses immediate mode. Interacting with graphical objects (for example hit testing or collision detection) is often easier with retained mode graphics.
A key difference between these approaches is when stroke calls can be made. With the drawing method approach, drawing should only be done while the drawing method is active. Drawing done at a different time (like during an event handler) will not interact correctly with the redraw algorithm; it won’t respect z order, and it will be ephemeral, overwritten and destroyed the next time the redraw algorithm touches that object. With the retained graphics approach, however, the stroke calls can be recorded at any time, and the toolkit automatically handles playing them back at the right point in the redraw.
The retained graphics approach tends to be less error prone for a programmer; drawing at the wrong time is a common mistake for beginning Swing programmers.
A potential downside of the retained graphics approach is performance. The recorded strokes must be stored in memory. Although this recording is not as heavyweight as a view tree (since it doesn’t have to handle input or layout, or even necessarily be represented as objects), you probably wouldn’t want to do it with millions of stroke calls. So if you had an enormous view (like a map) being displayed inside a scrolling pane (so that only a small part of it was visible on screen), you wouldn’t want to stroke the entire map. The drawing method approach gives more control over this; since you have access to the clip region in the drawing method, you can choose not to render strokes that would be clipped. To do the equivalent thing with retained graphics would put more burden on the programmer to determine the visible rectangle and rerecord the stroke calls every time this rectangle changed.
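To make the recording-and-playback idea concrete, here is a minimal JavaScript sketch. The names (`RetainedShape`, `replay`, the command tuples) are made up for illustration; they are not Flex’s or any toolkit’s actual API.

```javascript
// Sketch of retained-mode graphics: stroke calls are recorded once,
// then replayed whenever the object must redraw itself.
class RetainedShape {
  constructor() { this.commands = []; }
  // Recording: these can be called at any time, e.g. in an event handler.
  drawLine(x1, y1, x2, y2) { this.commands.push(['drawLine', x1, y1, x2, y2]); }
  drawText(x, y, s)        { this.commands.push(['drawText', x, y, s]); }
  // Playback: called by the toolkit during redraw; `surface` is anything
  // with matching drawLine/drawText methods (e.g. a real graphics context).
  replay(surface) {
    for (const [op, ...args] of this.commands) surface[op](...args);
  }
}
```

Because the toolkit controls when `replay` runs, the recorded strokes automatically land at the right point in z order, which is exactly why drawing "at the wrong time" is not a hazard in retained mode.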
Now let’s look at the drawing capabilities provided by the stroke representation.
Every toolkit’s stroke library has some notion of a drawing surface. The screen is only one possible place where drawing might go. Another common drawing surface is a memory buffer, which is an array of pixels just like the screen. Unlike the screen, however, a memory buffer can have arbitrary dimensions. The ability to draw to a memory buffer is essential for double-buffering. Another target is a printer driver, which forwards the drawing instructions on to a printer. Although most printers use a pixel representation internally (when the ink actually hits the paper), the driver often uses a stroke representation to communicate with the printer, for compact transmission. Postscript, for example, uses strokes.
Most stroke libraries also include some kind of a graphics context, an object that bundles up drawing parameters like color, line properties (width, end cap, join style), fill properties (pattern), and font.
The stroke library may also provide a current coordinate system, which can be translated, scaled, and rotated around the drawing surface. We’ve already discussed the clipping region, which acts like a stencil for the drawing. Finally, a stroke library must provide a set of drawing primitives, function calls that actually produce graphical output.
Many systems combine all these responsibilities into a single object. The CanvasRenderingContext2D object used by HTML <canvas> is a good example of this approach. In other toolkits, the drawing surface and graphics context are independent objects that are passed along with drawing calls.
When state like the graphics context, coordinate system, and clipping region is embedded in the drawing surface, the surface must provide some way to save and restore the context. A key reason for this is so that parent views can pass the drawing surface down to a child’s draw method without fear that the child will change the graphics context. For HTML <canvas>, for example, the context can be saved by the save() method, which pushes the current drawing settings onto a stack, and popped back from the stack by restore().
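The save/restore stack behaves like this minimal sketch (a made-up `Context` class mimicking the behavior of canvas’s save() and restore(), tracking only two of the many real context properties):

```javascript
// Sketch of graphics-context save/restore, in the style of HTML <canvas>.
class Context {
  constructor() {
    this.state = { color: 'black', lineWidth: 1 }; // current drawing settings
    this.stack = [];                               // saved settings
  }
  save()    { this.stack.push({ ...this.state }); }      // push a copy
  restore() { if (this.stack.length) this.state = this.stack.pop(); }
}
```

A parent can thus call `save()` before handing the context to a child, and `restore()` afterward, guaranteeing its own settings survive whatever the child did.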
It’s beyond the scope of this reading to talk about algorithms for converting a stroke into pixels. But you should be aware of some important techniques for making strokes look good.
One of these techniques is antialiasing, which is a way to make an edge look smoother. Instead of making a binary decision between whether to color a pixel black or white, antialiasing uses a shade of gray whose value varies depending on how much of the pixel is covered by the edge. In practice, the edge is between two arbitrary colors, not just black and white, so antialiasing chooses a point on the gradient between those two colors. The overall effect is a fuzzier but smoother edge.
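The core of antialiasing is just coverage-weighted blending, sketched below. The `blend` function is illustrative; real rasterizers differ in how they estimate coverage and in the color space they blend in.

```javascript
// Sketch of antialiasing as coverage-weighted blending: the pixel's final
// color is a point on the gradient between the edge color and background.
function blend(edge, background, coverage) {
  // edge/background are [r, g, b] in 0..255; coverage is the fraction of
  // the pixel covered by the shape, between 0 (none) and 1 (fully covered).
  return edge.map((c, i) => Math.round(c * coverage + background[i] * (1 - coverage)));
}
```

A half-covered pixel on a black/white edge comes out mid-gray, which is exactly the "shade of gray whose value varies" described above.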
Subpixel rendering takes this a step further. Every pixel on an LCD screen consists of three discrete subpixels side-by-side: red, green, and blue. So we can get a horizontal resolution which is three times the nominal pixel resolution of the screen, simply by choosing the colors of the pixels along the edge so that the appropriate subpixels are light or dark. It only works on LCD screens, not CRTs, because CRT pixels are often arranged in triangles, and because CRTs are analog, so the blue in a single “pixel” usually consists of a bunch of blue phosphor dots interspersed with green and red phosphor dots. You also have to be careful to smooth out the edge to avoid color fringing effects on perfectly vertical edges. And it works best for high-contrast edges, like an edge between black and white. Subpixel rendering is ideal for text rendering, since text is usually small, high-contrast, and benefits the most from a boost in horizontal resolution. Windows XP includes ClearType, an implementation of subpixel rendering for Windows fonts. For more about subpixel rendering, see Steve Gibson, “Sub-Pixel Font Rendering Technology”.
Pixels
Finally, let’s talk in more detail about what a pixel image looks like.
Put simply, it’s a rectangular array of pixels - but pixels themselves are not always so simple. A pixel itself has a depth, encoding its color, so the pixel representation is really three dimensional. Depth is often expressed in bits per pixel (bpp). The simplest kind of pixel representation has 1 bit per pixel; this is suitable for representing black and white images. It’s also used for bitmasks, where the single-bit pixels are interpreted as boolean values (pixel present or pixel missing). Bitmasks are useful for clipping - you can think of a bitmask as a stencil.
Another kind of pixel representation uses each pixel value as an index into a palette, which is just a list of colors. In the 4-bpp representation, for example, each of the 16 possible pixel values represents a different color. This kind of representation, often called Indexed Color, was useful when memory was scarce; you still see it in the GIF file format, but otherwise it isn’t used much today.
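Decoding an indexed-color image is just a table lookup, sketched here (illustrative function names, not any image library’s API):

```javascript
// Sketch of indexed color: each pixel stores a small palette index
// rather than a full color value.
function expandIndexed(pixels, palette) {
  // pixels: array of palette indexes; palette: array of [r, g, b] colors.
  return pixels.map(i => palette[i]);
}
```

With 4 bpp the palette has at most 16 entries, so each pixel costs 4 bits instead of the 24 or 32 bits of a direct-color pixel.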
The most common pixel representation is often called “true color” or “direct color.” In this representation, each pixel represents a color directly. The color value is usually split up into multiple components: red, green, and blue. (Color components are also called channels or bands; the red channel of an image, for example, is a rectangular array of the red values of its pixels.)
A pixel representation can be arranged in memory (or a file) in various ways: packed tightly together to save memory, or spread out loosely for faster access; with color components interleaved or separated; and scanned from the top (so that the top-left pixel appears first) or the bottom (the bottom-left pixel appearing first).
Many pixel representations have a fourth channel in addition to red, green, and blue: the pixel’s alpha value, which represents its degree of transparency.
The primary operation on the pixel representation is copying a block of pixels from one place to another - often called bitblt (pronounced “bit blit”). This is used for drawing pictures and icons on the screen, for example. It’s also used for double-buffering - after the offscreen buffer is updated, its contents are transferred to the screen by a bitblt.
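In its simplest replace-the-destination form, bitblt is a pair of nested loops over a rectangle, sketched below with flat row-major arrays standing in for pixel buffers (an illustrative sketch; real implementations copy whole rows at a time and handle overlapping regions carefully):

```javascript
// Sketch of bitblt: copy a w-by-h block of pixels from src to dst.
// Buffers are flat arrays in row-major order; srcW/dstW are buffer widths.
function bitblt(src, srcW, sx, sy, dst, dstW, dx, dy, w, h) {
  for (let row = 0; row < h; row++) {
    for (let col = 0; col < w; col++) {
      dst[(dy + row) * dstW + (dx + col)] = src[(sy + row) * srcW + (sx + col)];
    }
  }
}
```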
Bitblt is also used for screen-to-screen transfers. To do fast scrolling, for example, you can bitblt the part of the window that doesn’t change upwards or downwards, to save the cost of redrawing it. (For example, look at Swing’s JViewport.BLIT_SCROLL_MODE.)
It’s also used for sophisticated drawing effects. You can use bitblt to combine two images together, or to combine an image with a mask, in order to clip it or composite them together.
Bitblt isn’t always just a simple array copy operation that replaces destination pixels with source pixels. There are various different rules for combining the destination pixels with the source pixels. These rules are called alpha compositing, where the alpha channel of an object is used to paint it on top of the background. We may talk about them in a later reading.
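As a preview, the most common compositing rule is “source over”: the source pixel is painted on top of the destination, weighted by the source’s alpha. The sketch below uses components in the 0..1 range for clarity; real implementations typically work on 8-bit, often premultiplied, values.

```javascript
// Sketch of "source over" alpha compositing (Porter-Duff over operator).
// src and dst are { r, g, b, a } with all components in 0..1.
function over(src, dst) {
  const a = src.a + dst.a * (1 - src.a);   // resulting alpha
  if (a === 0) return { r: 0, g: 0, b: 0, a: 0 };
  const mix = k => (src[k] * src.a + dst[k] * dst.a * (1 - src.a)) / a;
  return { r: mix('r'), g: mix('g'), b: mix('b'), a };
}
```

Half-transparent red over opaque white, for instance, yields an opaque pink.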
Here are a few common image file formats. It’s important to understand when to use each format. For user interface graphics, like icons, JPG generally should not be used, because it’s lossy compression - it doesn’t reproduce the original image exactly. When every pixel matters, as it does in an icon, you don’t want lossy compression. JPG also can’t represent transparent pixels, so a JPG image always appears rectangular in your interface.
For different reasons, GIF is increasingly unsuitable for interface graphics. Provided you start with an image which is already reduced to 8-bit color (256 colors or less) and only binary transparency (on/off), GIF isn’t lossy. However, its color space is very limited. GIF images use 8-bit color, which means that there can be at most 256 different colors in the image. That’s fine for some low-color icons, but not for graphics with gradients or blurs. GIF has limited support for transparency - pixels can either be opaque (alpha 1) or transparent (alpha 0), but not translucent (alpha between 0 and 1). So you can’t have fuzzy edges in a GIF file, that blend smoothly into the background. GIF files can also represent simple animations.
PNG is the best current format for interface graphics. It supports a variety of color depths, and can have a full alpha channel for transparency and translucency.
If you want to take a screenshot, PNG is the best format to store it.
Animation Principles
Let’s turn to another kind of output: animation, or time-changing output.
Some might say, based on bad experiences with the Web, that animation has no place in a usable interface. Indeed, the <blink> tag originally introduced by the Netscape browser was an abomination. And many advertisements on the Web use animation to grab your attention, which distracts you from what you’re really trying to do and makes you annoyed. So animation has gotten a bad rap in UI.
Others complain that animation is just eye candy – it makes interfaces prettier, but that’s all.
But neither of those viewpoints is completely fair. Used judiciously, animation can make an important contribution to the usability of an interface.
One important use of animation is to enhance feedback, by drawing attention to and explaining changes in the display that would otherwise be hard to follow and understand. An example is a change that was not made by the user – due to actions by other users in a multi-user application, or actions by the software itself. A move made on a checkerboard by another player is an example of a change that might be animated.
Another change is a major transition in the viewpoint of the display, because a sudden drastic shift can disorient the user. What happened? Which direction did I go? How would I get back to where I was before? Scrolling around a large space, or zooming in or out, are examples of these transitions. Problems tend to occur not when the transition is under direct manipulation control (e.g. dragging a scrollbar thumb), but rather when it happens abruptly due to a button press (Page Up) or other action (Zoom pulldown menu, hyperlink press, another user or computer action, etc.) Animating the transition as if the user had done it with fast direct manipulation helps the user stay oriented.
Animation can also be used for progress feedback, to show that something is still happening. Here, the animation basically reassures the user that the program hasn’t crashed. The loading indicator on a web browser is an example of this kind of animation; it’s probably the most trivial kind to do.
Fortunately, it turns out that in many cases, you don’t need to do anything special to obtain the benefits of animation.
Many event-driven parts of a GUI are already incremental. If the user is dragging the scrollbar thumb, then they’re basically animating the interface themselves – you don’t need to do anything special. (Although we’ll discuss something called motion blur in a moment that may be worth adding to enhance the visual effect.) Similarly, backend events (like bytes read from a file or files copied) may be able to make progress feedback appear animated, simply by coming often enough so that the progress bar fills in smoothly. But if the backend events are bursty or have long gaps between them, you may need to supplement with animation. That’s why web browsers have an animated throbber – because sometimes network connections just sit there generating no backend events to drive the progress.
If feedback is very brief, or a transition very short in distance, then animation is likewise unnecessary. If an object moves on the screen by a distance less than its width, then it’s probably not worth animating that transition. Animations shorter than 100 msec will probably be too fast to be noticed.
Let’s talk about a few good design rules. First, the frame rate should be at least 20 frames per second – i.e., the animation should make an incremental change at least 20 times per second. If animation is the main purpose of the program, then it should be willing to throw CPU cycles into even higher frame rates to improve smoothness and realism. As a guideline, film and TV signals are typically at least 24-30 fps. Video games typically expect a minimum of 60fps for smooth animation.
But for feedback, 20 fps is plenty. Feedback animation should be secondary to the real work of the program, and shouldn’t dominate CPU time.
Keep in mind that there’s a difference between the frame rate of an animation and the refresh rate of the whole display system. The refresh rate of a display system is the rate at which the screen is reilluminated. For CRTs, this was the rate that the electron guns swept across the entire screen illuminating phosphors. Since the phosphors fade if not repeatedly hit by the electron gun, a slow refresh rate will cause the screen to darken between sweeps – producing a noticeable flicker that the eye can perceive. The shutter on a film projector has a similar problem, since it makes the movie screen flash between bright and dark. So it actually opens and closes faster than 24 times a second – typically 48 Hz or 72 Hz – in order to make the flicker less perceptible. As a result, each frame of a film is illuminated 2 or 3 times before the projector stutters ahead to the next frame.
So a 20fps animation may be convincing, but if it’s displayed on a CRT using a 20Hz refresh rate, it will have a very noticeable flicker.
Big jumps are disruptive, so pay attention when you’re using low frame rates with high object speeds. In the real world, an object doesn’t disappear from one place and reappear in another – it passes through the intervening points, and even if it moves too fast for us to focus on it, it leaves an impression of its path in our visual system – a smear. That smear is called motion blur. If an object is moving so fast that it moves more than its own width between frames, leaving a visible gap between subsequent images of the object, then you should consider filling in the gap with simulated motion blur. Two common ways to do it: (1) a smear of the object’s color, and (2) simply drawing multiple overlapping images of the object. Another solution is to crank up the frame rate, if that’s possible.
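The second technique, drawing multiple overlapping images, amounts to interpolating extra positions between the object’s old and new location whenever the jump exceeds its width. A sketch (illustrative names and 1D motion only):

```javascript
// Sketch of simulated motion blur by overlapping copies: returns the x
// positions at which to draw the object (the last one is its new position).
function blurPositions(prevX, newX, width) {
  const dist = Math.abs(newX - prevX);
  const copies = Math.max(1, Math.ceil(dist / width)); // 1 copy if gap is small
  const positions = [];
  for (let i = 1; i <= copies; i++) {
    positions.push(prevX + (newX - prevX) * (i / copies));
  }
  return positions; // draw the object (perhaps faded) at each position
}
```

An object of width 20 jumping 100 pixels would be drawn 5 times, so no gap between successive images exceeds its own width.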
There are several useful ways to use animation to enhance the illusion of direct manipulation (Chang & Ungar, “Animation: From Cartoons to the User Interface”, UIST ’93), which were originally drawn from the experience of Disney cartoonists (J. Lasseter, “Principles of Traditional Animation applied to 3D Computer Animation”, SIGGRAPH ’87).
The principle of solidity says that the animated behavior of an object should give clues about its squishiness – so a ball, when it strikes the ground, should flatten out into an ellipse before rebounding (example). Most objects in GUIs are rigid, so the solidity principle is mostly about preventing high-speed GUI objects from appearing to teleport across the screen (using motion blur), and having them fade into and out of view rather than appearing and disappearing abruptly.
Anticipation means that an object winds up a bit (moving backwards to get more leverage) before starting a motion. The wind-up draws the user’s attention, and resembles what animate creatures in the real world do when they move.
Slow-in, slow-out describes how a realistic animation should be paced – rather than keeping a constant speed throughout, the object should accelerate up to a cruising speed, and then decelerate to a stop.
Finally, follow-through says that objects should wiggle back and forth a bit when they finish a motion, to expend the remaining kinetic energy of the motion.
For examples of all these effects, see Daniel Bodinof, “Principles of Animation“.
In general, feedback animations should be kept short. Many users will wait for an animation to stop before continuing, so there’s a tradeoff between the duration of the animation and the efficiency of the interface. Don’t block user control – allow the user to start their next action while the animation is still going on.
Finally, use animation judiciously. As we know from ads on web sites, constant motion is distracting and agitating.
Animation Implementation
Now let’s focus on how to implement animation, starting with animations in the pixel representation.
Frame animation is probably the easiest kind of animation. It consists of a sequence of images, each a little different from the previous, that are displayed at regular intervals. Most of the moving pictures you know – movies, television, digital video – work this way. Animated GIFs are an easy and widely supported way to put frame animation into a GUI. GIFs have only 8-bit color resolution, unfortunately; if you need richer color, look at APNG (Animated PNG), which is supported by most browsers. An alternative format, MNG, was technically more capable but also more complex, and lacks support. HTML and SVG can also be animated, with CSS Animations. Still another choice is to do it yourself, by displaying a sequence of PNGs.
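The do-it-yourself approach might look like the following sketch, which cycles an img element through a list of frame URLs. The function names and the idea of swapping img.src are illustrative assumptions, not a particular toolkit's API.

```javascript
// Advance a frame index, wrapping around so the animation loops.
function nextFrame(frame, numFrames) {
  return (frame + 1) % numFrames;
}

// Hypothetical frame animation: swap an <img> element's source
// through a sequence of PNG frames at a fixed rate.
function startFrameAnimation(img, frameUrls, fps = 10) {
  let frame = 0;
  return setInterval(() => {
    frame = nextFrame(frame, frameUrls.length);
    img.src = frameUrls[frame];
  }, 1000 / fps);  // e.g. 100 msec between frames at 10 fps
}
```

The returned timer id can be passed to clearInterval to stop the animation.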
Suppose you can’t use animated GIFs. How do you animate changes to a pixel or stroke representation?
When animation is merely used as a feedback or help effect in an application that otherwise isn’t concerned with animation, the best solution is the timer and redraw approach. The basic idea is to use a timer (such as setTimeout or setInterval in JavaScript), which delivers periodic events to the GUI event queue. Set the timer interval to the desired frame rate (e.g. 50 msec for 20 fps). Every time the timer fires, use the current clock time to determine where to redraw the objects. Stop the timer when the animation is done.
This technique integrates very well with an existing GUI application, because the toolkit’s automatic redraw and input handling continue to work even while the animation runs.
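The timer-and-redraw idea might be sketched like this in JavaScript. The animate function and its onFrame callback (which would do the actual redrawing) are hypothetical names for illustration.

```javascript
// Sketch of timer-and-redraw: run an animation for `durationMs`
// milliseconds at roughly `fps` frames per second, then stop.
function animate(durationMs, onFrame, fps = 20) {
  const start = Date.now();
  const timer = setInterval(() => {
    // Use the current clock time, not a frame counter, so the
    // animation stays on schedule even if some ticks arrive late.
    const elapsed = Date.now() - start;
    if (elapsed >= durationMs) {
      onFrame(1);             // draw the final frame
      clearInterval(timer);   // stop the timer when the animation is done
    } else {
      onFrame(elapsed / durationMs);  // fraction complete, in [0,1)
    }
  }, 1000 / fps);
}
```

A caller might use the fraction to position an object, e.g. animate(500, f => ball.style.left = (f * 200) + "px"), where ball is some element being moved.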
The object approach to output enables a more modular approach to animation. Rather than putting animation code into the redraw, we can simply update the changing properties of the component (position, size, etc.) on every timer tick. This technique is called property animation.
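A minimal property-animation sketch, assuming a component object whose properties the toolkit watches and redraws automatically (the function name and linear interpolation are assumptions for illustration):

```javascript
// Hypothetical property animation: on every timer tick, update one
// property of the component and rely on the toolkit to redraw it.
function animateProperty(obj, prop, from, to, durationMs, fps = 20) {
  const start = Date.now();
  const timer = setInterval(() => {
    const f = Math.min((Date.now() - start) / durationMs, 1);
    obj[prop] = from + (to - from) * f;  // linear interpolation
    if (f >= 1) clearInterval(timer);    // done: stop the timer
  }, 1000 / fps);
}
```

Note that no drawing code appears here at all – the animation is expressed purely as a change to the component's state, which is what makes the approach modular.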
CSS animations help you animate between different CSS values. You can set up an animation using a @keyframes rule, which takes a set of keyframes and their target values.
For example, if you specify “opacity: 0.0”, then the animation will change the opacity property steadily from its initial value (which might be 1.0 if the object is currently opaque) until it reaches the target value.
To apply that animation to an element, you use the animation property, with a duration (in seconds or milliseconds) and an animation name. There are many more optional parameters that this property accepts, which you can look up in the documentation.
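As a concrete sketch, here is a fade-out written as a @keyframes rule plus an animation property. The rule name fade-out and the class name dismissed are made up for this example.

```css
/* Hypothetical fade-out: animate opacity from its current value to 0 */
@keyframes fade-out {
  to { opacity: 0.0; }
}

/* Elements with this class run the fade-out over 300 milliseconds */
.dismissed {
  animation: fade-out 300ms;
}
```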
Easing and path are relevant to animating both stroke and object approaches.
Easing concerns how the animation evolves over time. It’s a function that maps clock time to an abstract parameter, usually in the range [0,1], representing the completeness of the animation (0% to 100%). Easing is how you can implement the slow-in/slow-out cartooning principle. Slow-in/slow-out is done with a sigmoid (S-shaped) function. The arctangent is a good, easy sigmoid; so is 1/(1+e^-x). Note that you have to tweak the domain and range of these functions so that the desired time domain (0 to the duration of the animation) maps to the desired s-parameter range (typically 0..1). Normally arctangent maps the domain [-inf, +inf] to the range [–PI/2, PI/2].
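As one concrete way to do that tweaking, here is a slow-in/slow-out easing built from the logistic sigmoid 1/(1+e^-x), rescaled so that time 0 maps to exactly 0 and the full duration maps to exactly 1. The function name and the steepness parameter are arbitrary choices for this sketch.

```javascript
// Sigmoid easing: map clock time t in [0, durationMs] to a
// completeness parameter in [0, 1], slow at both ends.
function sigmoidEase(t, durationMs, steepness = 6) {
  const logistic = (x) => 1 / (1 + Math.exp(-x));
  // Map the time domain [0, duration] onto [-steepness, +steepness].
  const x = (2 * t / durationMs - 1) * steepness;
  // The logistic never quite reaches 0 or 1, so rescale its output
  // range so the endpoints map exactly to 0 and 1.
  const lo = logistic(-steepness), hi = logistic(steepness);
  return (logistic(x) - lo) / (hi - lo);
}
```

Larger steepness values make the middle of the animation faster and the ends slower; steepness near zero approaches linear easing.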
Path describes how the animated property moves through its value space. For position, this is easy – it’s the curve of points that the position traces out. If you want to add some visual appeal to a moving object, make it move through an arc. A quadratic Bezier curve has this effect and is trivial to implement – you just need a control point between the start point and end point. The control point pulls the curve away from the straight line between the endpoints – to be precise, the curve is tangent at each endpoint to the line that connects the endpoint with the control point.
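The quadratic Bezier can be evaluated directly from its defining formula, B(s) = (1-s)²·P0 + 2(1-s)s·C + s²·P1, where s is the eased completeness parameter in [0,1]. In this sketch, points are plain {x, y} objects (an assumption, not any toolkit's type):

```javascript
// Point on a quadratic Bezier curve from p0 to p1 with control point c,
// at parameter s in [0, 1].
function quadBezier(p0, c, p1, s) {
  const u = 1 - s;
  return {
    x: u * u * p0.x + 2 * u * s * c.x + s * s * p1.x,
    y: u * u * p0.y + 2 * u * s * c.y + s * s * p1.y,
  };
}
```

Feeding the eased parameter into quadBezier on each timer tick moves the object along the arc with the desired pacing.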
For multidimensional properties like color, you’ll want to think about what color space the path should go through, and whether you want the path to be simply a line in that color space or something more complicated. We’ll talk more about different color spaces in a future reading.
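The simplest path is a straight line in RGB space, interpolating each channel independently. This is only a sketch of that one choice; paths through other color spaces often look better, as discussed later.

```javascript
// Straight-line path through RGB space: interpolate each channel
// independently at parameter s in [0, 1].
function lerpColor(c0, c1, s) {
  const lerp = (a, b) => Math.round(a + (b - a) * s);
  return { r: lerp(c0.r, c1.r), g: lerp(c0.g, c1.g), b: lerp(c0.b, c1.b) };
}
```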
By default, CSS uses a slow-in, slow-out easing (the ease timing function), but you can switch to linear easing by using the linear keyword or provide your own custom cubic-bezier() function.
You can use this app to generate cubic-bezier() values.
See a great demo of various easing functions that graphs their actual behavior.
Unfortunately, easing functions that need more than 2 control points cannot be represented by the CSS cubic-bezier() function. You can either emulate them by using multiple keyframes, or you can use JavaScript. A popular animation library for more complex effects than CSS animations can provide is Greensock.
Final Words
A final word about debugging the output of a graphical user interface, which can sometimes be tricky. A common problem is that you try to draw something, but it never appears on the screen. Here are some possible reasons why.
Wrong place: what’s the origin of the coordinate system? What’s the scale? Where is the object located in its parent?
Wrong size: if an object has zero width and zero height, it will be completely invisible no matter what it tries to draw – everything will be clipped. Zero width and zero height tend to be the defaults for primitive objects! If you make a div or a span with nothing in it, it’ll have zero width and height. You have to give it content, or manually set its size, to make it a more reasonable size. Check whether the object (and its ancestors) have nonzero sizes.
Wrong color: is the drawing using the same color as the background? Is its alpha zero, so that it’s completely transparent?
Wrong z-order: is something else drawing on top?
Layout: Objects remember where they were put, and draw themselves there. They also support automatic layout. With strokes or pixels, you have to figure out (at drawing time) where each piece goes, and put it there.
Input: Objects participate in event dispatch and propagation, and the system automatically does hit-testing (determining whether the mouse is over the object when an event occurs) for objects, but not for strokes. If a graph node is an object, then it can receive its own click and drag events. If you stroked the node instead, then you have to write code to determine which node was clicked or dragged.
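If the nodes are stroked, that hit-testing code is yours to write. For circular nodes it might look like the following sketch, where the node fields x, y, r are assumptions about your own data structure:

```javascript
// Manual hit test for stroked circular nodes: return the topmost
// node containing the point (mx, my), or null if none does.
// Iterate back-to-front so the last-drawn (topmost) node wins.
function hitTest(nodes, mx, my) {
  for (let i = nodes.length - 1; i >= 0; i--) {
    const n = nodes[i];
    const dx = mx - n.x, dy = my - n.y;
    if (dx * dx + dy * dy <= n.r * n.r) return n;  // inside the circle
  }
  return null;
}
```

An object representation gives you this dispatch for free; with strokes, every mouse event handler has to start with a call like this.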
Redraw: An automatic redraw algorithm means that objects redraw themselves automatically when they have to. Furthermore, the redraw algorithm is efficient: it only redraws objects whose extents intersect the damaged region. The stroke or pixel representations would have to do this test by hand. In practice, most stroked objects don’t bother, simply redrawing everything whenever some part of the view needs to be redrawn.
Drawing order: It’s easy for a parent to draw before (underneath) or after (on top of) all of its children. But it’s not easy to interleave parent drawing with child drawing. So if you’re using a hybrid representation, with some parts of your view represented as objects and others as strokes, then the objects and strokes generally fall in two separate layers, and you can’t have any complicated layering relationships between strokes and objects.
Heavyweight objects: Objects may be big – even an object with no fields costs about 20 bytes in Java. As we’ve seen, the view tree is overloaded not just with drawing functions but also with event dispatch, automatic redraw, and automatic layout, and the properties and state used by those processes further bulk up the class.
Views derived from large amounts of data - say, a 100,000-node graph - generally can’t use an object for every individual data item. The “flyweight” pattern can help, by storing redundant information in the object’s context (i.e., its parent) rather than in each object, but few toolkits support flyweight objects. (See “Glyphs: Flyweight Objects for User Interfaces” by Paul R. Calder and Mark A. Linton. UIST ‘90.)
Device dependence: The stroke representation is largely device independent. In fact, it’s useful not just for displaying to screens, but also to printers, which have dramatically different resolution. The pixel representation, on the other hand, is extremely device dependent. A directly-mapped pixel image won’t look the same on a screen with a different resolution.