Gimp tile cache patches

Overview

GIMP's tile cache suffers from a few minor flaws with major practical implications:

  1. a cache flush strategy that starves read caching to nil
  2. an idle swapper (work-behind cache flusher) with performance strangled to the point of irrlevancy
  3. the idle-swapper's low performace is a blessing in disguise; it inadvertantly works during periods of heavy on-demand swapping, throwing even more writes/seeks into heavy disk activity. That is, it does not run only when GIMP is idle and tends to magnify delays while GIMP is working.

In addition to fixes offered for these implementation flaws, I've also constructed patches to add additional tile-cache profiling output and internal tile-cache consistency checking. Automated test scripts are also provided to collect profiling data.

Bug analysis

Idle swapper

The idle swapper relies on gtk idle processing to determine when to work. Unfortunately, an idle UI does not necessarily imply that GIMP is also idle. The idle processor will also wake up and work during periods of heavy computation and cache usage. When the cache is under heavy pressure (performing large numbers of flushes and swaps), idle swapper activity simply adds to the disk load as it tosses its own disk requests into the queue. It will also usually be writing or accessing non-sequential data, that is, generating more than its share of seek delays.

Perhaps for this reason, the idle swapper's performance is utterly strangled. It will attempt to flush at most only four tiles a second, that is, 64kB/s for an RGBA image. This figure is so low that any performance enahnacement provided by the idle swapper is lost in the collection noise (assuming that the idle swapper is not, in fact, causing a net performance penalty!)

Cache strategy

GIMP currently uses a tile cache flush strategy that will never flush a 'dirty' tile [that is, a tile with changes waiting to be written to disk] if the cache contains at least a single 'clean' tile. Unfortunately, the cache will only flush a dirty tile to disk or remove it from the cache in one of three cases:

  1. The idle swapper flushes the tile to disk.
  2. Maximum 'undo' depth is exceeded, resulting in old tiles being destroyed.
  3. There are no clean tiles in the cache at all.

The idle swapper cannot keep up with the rate at which dirty tiles can be generated either by interactive drawing or monolithic transforms. Maximum undo depth may be shallower than the cache depth, but does not account for temporary working space. In practice, it will not reliably provide any breathing room.

The last case means that, in practice, the tile cache becomes completely full of 'dirty' destination tiles [because they are never flushed] and a single tile cache slot, if any, is used for 'source' tiles read in fresh from disk. Because it will be the only clean tile in the cache, it will be flushed as soon as the next clean tile is read. Once the cache fills, nearly all source tiles will be read from the swap-file every time they are needed. The profiling data below supports this scenario:

This data was collected using the script 'tilecache-test1' from the test scripts below. We can see that once the tile-cache is no longer large enough to hold the entirety of the source image, the destination image and all intermediate tile state, performance decays rapidly with the bulk of tile-cache activity consisting of repeatedly reading the same tiles in from disk over and over again. For example, the profiling output from the first iteration, using an 8M tile cache:

      Peak Tile usage: 1090 Tile structs

      Total tiles swapped out to disk: 7079
      Unique tiles swapped out to disk: 6070
      Total tiles swapped in from disk: 2440622
      Unique tiles swapped in from disk: 4049
      Tiles swapped out by idle swapper: 86
      Total seeks during swapping: 1836013
      Total time spent in swap: 32.284776 seconds

      Total zorched tiles: 2447682
      Total zorched tiles swapped out: 6993
      Total zorched tiles swapped back in: 2440622
      Total zorched tiles wasted after swapping out: 2504
      Total interactive swap/cache delay: 36.798448 seconds
      

Patches/Fixes

These patches are intended to be applied in the following order.

Tile data pointer optimization

patch: 0001-Minor-change-to-TILE_DATA_POINTER-that-restricts-TIL.patch

This is a very minor 'sneak-it-in' sort of performance tweak, broken out as a seperate patch so that it's more visible. It eliminates two integer divisions (well-- it eliminates the still needlessly complex assembly resulting from optimizing out the two integer divisions) during tile data pointer lookups. It's a few-percent of 'free' performance for some transforms (such as in the new resampler that's in progress).

It has one caveat: it restricts choice of TILE_WIDTH and TILE_HEIGHT to powers of two.

Additional profiling output

patch: 0002-Add-additional-profiling-to-tile-usage-in-order-to-a.patch

This patch adds profiling collection and output to the tile cache, as well as fixing up the bitrotted tile profiling code that already existed but no longer compiles. To use it, either add '-DTILE_PROFILING' to the environment CFLAGS before running configure, or uncomment the #define at the top of app/base/tile-private.h

Output field explanation

Peak Tile usage
Displays the peak number of 'allocated' tiles, that is, tiles that actually had their tile data in core at the same time. This includes both locked tiles and tiles in the cache. This is not the total number of tiles created, and does not include tiles that have not yet been loaded/filled or tiles that have been swapped out to disk.
Total tiles swapped out to disk
Total number of swaps out to disk. The same tile being swapped multiple times is counted each time.
Unique tiles swapped out to disk
Total unique tiles swapped out to disk. The same tile being swapped to disk multiple times is only counted once.
Total tiles swapped in from disk
Total number of swaps in from disk. The same tile being swapped multiple times is counted each time.
Unique tiles swapped in from disk
Total unique tiles swapped in from disk. The same tile being read from disk multiple times is only counted once.
Tiles swapped out by idle swapper
Total tiles swapped to disk by the idle swapper. The same tile being swapped multiple times is counted each time.
Total seeks during swapping
The total number of non-sequential reads or writes, that is, reads or writes that first required a seek() operation.
Total time spent in swap
Total wall-clock time elapsed combining all seek(), read() and write() calls in the tile swapper. Counts both tile cache and idle swapper.
Total zorched tiles
Total tiles flushed (clean and dirty) from the cache to make space for a new tile. The same tile flushed multiple times is counted each time.
Total zorched tiles swapped out
Total flushed 'dirty' tiles, that is, tiles that required swapping out to write changes.
Total zorched tiles swapped back in
Total tiles flushed from the cache (both clean and dirty) that then had to be read back in from disk later.
Total zorched tiles wasted after swapping out
Total dirty tiles that were written to disk, then destroyed without ever being read back in again (wasted write operation).
Total interactive swap/cache delay
Total wall-clock time elapsed during on-demand tile flushing, that is, tile flushed triggered by a user request. All this delay is 'felt' by the user as latency before the operation completes. Includes all time spent flushing including *alloc() and free() calls, not just time spent in filesystem calls.

Implement LRU (Least Recently Used) cache flush strategy

patch: 0003-Replace-two-list-flush-clean-first-cache-strategy-wi.patch

Replaces the 'flush clean first' strategy with a simple 'flush least recently used' strategy, that is, a simple flush FIFO that always flushes from the tail, and places any newly released tile at the head. Although LRU is not a particularly clever caching strategy, it is significantly better than 'clean-first'. The measured cache performance improvemnt [when the cache is under heavy pressure] was over 7x for the interactive line-drawing test and greater than 100x for the monolithic transform test.

Profiling output from the patched tile cache [same script test as quoted above] shows that the read starvation is corrected:

      Peak Tile usage: 1089 Tile structs

      Total tiles swapped out to disk: 7626
      Unique tiles swapped out to disk: 6088
      Total tiles swapped in from disk: 10540
      Unique tiles swapped in from disk: 4353
      Tiles swapped out by idle swapper: 1452
      Total seeks during swapping: 9679
      Total time spent in swap: 0.246506 seconds

      Total zorched tiles: 17084
      Total zorched tiles swapped out: 6174
      Total zorched tiles swapped back in: 10540
      Total zorched tiles wasted after swapping out: 1505
      Total interactive swap/cache delay: 0.273391 seconds
    

Correct idle swapper startup and flush rate

0004-Correct-startup-flaw-in-idle-swapper-start-Don-t-wat.patch

This patch makes three changes to the idle swapper:

  1. Startup becomes two-step: the idle loop is used to check once-per-second if there was any cache activity or swapping in the previous second. If the cache/swap has been completely idle for at least a second, the idle swapper runs. Any non-idle-related swapping or cache activity immediately kicks the idle swapper back into the 'check for activity once a second' idle loop.
  2. The idle swapper idles/runs until there are no dirty tiles in the cache. There is no reason to only flush half of the cache as was being done before; as the idle swapper is only running during periods of complete idleness, and suspends immediately without latency upon user interaction, there is no downside to flushing completely.
  3. The idle swapper flush rate is increased to flushing ten tiles at a time every fifty milliseconds for a more respectable idle flush rate of approximately 3MB/s (for RGBA images).

Test Scripts

test scripts: gimp-tilecache-scripts.tgz

browse contents: gimp-tests/

I've written two automated test scripts to profile the tile cache changes. The first script runs three monolithic transforms in order (scale to 1500x1500, rotate, then perspective) with 8M, 16M, 32M, 64M and 128M title cache sizes. The second draws 1000 random lines with the paintbrush, ten connected lines at a time, in a growing pattern for each of the same tile cache sizes.

The scripts require the batch mode fix to work, a reasonably modern perl install, and a gimp compiled with TILE_PROFILING defined. Simply unpack the tests and run, eg:

gimp-tests/tilecache-test1 /path/to/gimp base-name-for-logfile

...and results of the test run will be put in gimp-tests/results.

Happy hacking!


Up to patch index

Monty