Compressed Virtual Memory
for a Handheld Computer

Kenneth Lu

6.033 Design Paper 1
March 16, 2000
Saltzer, TR2

Abstract

        Handheld computers are becoming comparable to desktop computers in performance and feature set, but battery life remains a major issue. Since disk accesses due to virtual memory page faults can be a major source of energy drain, this paper describes a scheme that compresses part of main memory in order to reduce disk accesses. The system is relatively simple and efficient, but it is hampered by its reliance on low page fault rates. As long as miss rates are kept low, however, this compressed memory system can improve energy efficiency by an order of magnitude.

1 Introduction

        Handheld computers today are nearly comparable in performance and feature set to desktop computers. In fact, they are powerful enough that users are now able to run memory-intensive multimedia applications on them. Unfortunately, battery life is an area in which handheld computers are seriously restricted. Disk accesses, in particular, use a great deal of power. When the user is running many memory-intensive programs, the computer may begin to encounter frequent page faults and waste a great deal of energy accessing the disk.

        The compressed virtual memory technique outlined below is intended to alleviate some of the burden of page faults to disk by compressing part of main memory. This increases the amount of virtual memory which is stored in RAM, thus reducing the number of page faults to disk and prolonging battery life.

2 System Specifications

        The virtual memory system suggested in this paper is designed to support a handheld computer with the following hardware specifications: The processor is rated for a maximum clock rate of 200 MHz and can be reduced to 100 MHz or 33 MHz. The CPU runs at 0.4 W at peak processing, and power consumption scales down linearly with decreasing clock speed. The processor executes one instruction every clock cycle and is well-pipelined. The computer has 32 MB of actual RAM, and a 3.2 GB hard drive. The hard drive consumes 4.7 W for 20 ms when powered up from sleep, 2.3 W for 10 ms to perform a typical read or write, 0.95 W while powered up and idle, and 0.1 W when sleeping.

        We will first consider a system where 16 MB of the actual RAM is used as uncompressed memory, and the other 16 MB is used as compressed memory. The hard drive is used for the remainder of the address space. Virtual memory pages are 8 KB in size. The page compression algorithm is able to compress pages with a 4:1 ratio on average. Note that this makes the 16 MB of compressed memory appear to be, on average, 64 MB, giving us a total of 16 + 64 = 80 MB of non-disk memory. Compression and decompression of data occurs at an average of 4 instructions per byte, thus needing 4 · 8192 = 32768 instructions to compress or decompress a page.

        The virtual memory system uses the standard Least Recently Used (LRU) page replacement algorithm both for storing uncompressed memory pages in compressed memory and for storing pages onto disk. The system's miss rates per memory access for particular amounts of available memory are provided in Appendix A. There is reason to believe, however, that this table is not entirely accurate. This issue will be discussed in detail in section 5.

[Figure 1]

3 Paging Structure

3.1 Virtual Memory High-Level Structure and Multi-level Paging

        Since we have less than 4 GB of total storage space, a 32-bit address will be sufficient for our purposes. In order to address every byte in each of the 8 KB pages, the page offset will need to be 13 bits in size (8192 = 2^13). That leaves 19 bits for the virtual page number. 2^19 is a very large number of Page Table Entries (PTEs), however, and we probably do not want them to permanently take up a predetermined block of memory. Thus, we will want to implement a 2-level virtual memory system. We will use a typical 4-byte (32-bit) PTE, so 2^(13-2) = 2^11 PTEs fit within each page, and 2^8 such pages cover all 2^19 entries. There will thus be a Level One page table with 2^8 entries, each of which points to a page full of Level Two PTEs. The 19 virtual page number bits of the virtual address will be divided into an 8-bit Level One offset and an 11-bit Level Two offset. (See Figure 1.)
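
        To make the address decomposition concrete, the following C sketch extracts the three fields from a 32-bit virtual address. It is purely illustrative; representing the Level One table as an array of pointers to Level Two pages is an assumption made here, not part of the design above.

    #include <stdint.h>

    /* 32-bit virtual address layout (see Figure 1):
     *   bits 31..24  Level One offset  (8 bits)
     *   bits 23..13  Level Two offset  (11 bits)
     *   bits 12..0   page offset       (13 bits, 8 KB pages)
     */
    #define L1_INDEX(va)     (((va) >> 24) & 0xFF)
    #define L2_INDEX(va)     (((va) >> 13) & 0x7FF)
    #define PAGE_OFFSET(va)  ((va) & 0x1FFF)

    /* Walk the two-level table to find the PTE for a virtual address. */
    static uint32_t lookup_pte(uint32_t *l1_table[256], uint32_t va)
    {
        uint32_t *l2_page = l1_table[L1_INDEX(va)];  /* one page = 2048 Level Two PTEs */
        return l2_page[L2_INDEX(va)];
    }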

        The advantages of using a 2-level page table are two-fold. First, 2^19 four-byte page table entries would take up 2^21 bytes, or 2 MB, of precious uncompressed memory; with a 2-level page table, only a single kilobyte is permanently taken up by the Level One page table. Second, the Level Two tables can be paged out to disk (on which 2 MB is relatively insignificant) when they are not being used.

[Figure 2]

3.2 Compressed Page PTEs

        Compressed pages take up 2 KB on average, but can vary in size. We will assume that the compression algorithm is smart enough to simply store a page uncompressed if compression would actually make it larger than 8 KB, and that the decompression algorithm will be smart enough to recognize such uncompressed pages. Thus, a compressed page will never take up more than 8 KB. All compressed pages will be aligned on 16-byte boundaries. That is, the first byte of a compressed page will always have an address which is a multiple of 16 bytes. The reason for this will become clear shortly.

        All page table entries begin with a disk bit (which states whether the page is actually in memory or whether it's on disk), a dirty bit used by the page fault handler to note whether a page needs to be saved to disk [1], and a free bit for the page fault handler or the virtual memory manager to use, or for future use. For compressed page PTEs, the disk bit is off.

        The next 9 bits store the size of the compressed page. Since a compressed page can be of variable length, the virtual memory manager needs to know this size. 9 bits is enough to address any 16-byte block within 8 KB (2^(13-4) = 2^9). The final 20 bits store the location of the compressed page within the actual 16 MB; 20 bits is enough to address any 16-byte block within 16 MB (2^(24-4) = 2^20). (See Figure 2.) The reason for using 16-byte blocks should now be clear: the numbers conveniently allow a common block size for addressing page locations and addressing page sizes. Common alignment allows the memory manager to ensure that if only part of a 16-byte block is used, that entire block can be fetched, because new pages will never start half-way into a block.
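
        As an illustration, a C sketch of packing and unpacking a compressed-page PTE under this layout follows. The exact bit ordering (disk bit in the most significant position, then the dirty and free bits, then the 9-bit size, then the 20-bit location) is one plausible reading of Figure 2 and should be treated as an assumption.

    #include <stdint.h>

    /* Assumed bit layout of a compressed-page PTE (one reading of Figure 2):
     *   bit 31       disk bit (0 = page is in memory)
     *   bit 30       dirty bit
     *   bit 29       free/reserved bit
     *   bits 28..20  compressed size, in 16-byte blocks (1..511)
     *   bits 19..0   location within the 16 MB compressed region, in 16-byte blocks
     * The sketch assumes compression always saves at least one 16-byte block,
     * so the size in blocks fits in 9 bits and is never zero. */
    static uint32_t make_compressed_pte(uint32_t size_bytes,
                                        uint32_t location_bytes, int dirty)
    {
        uint32_t size_blocks = (size_bytes + 15) / 16;  /* round up to whole blocks */
        uint32_t loc_blocks  = location_bytes / 16;     /* location is 16-byte aligned */
        return ((uint32_t)(dirty != 0) << 30)
             | ((size_blocks & 0x1FF) << 20)
             | (loc_blocks & 0xFFFFF);                  /* disk bit remains 0 */
    }

    static uint32_t compressed_size_bytes(uint32_t pte) { return ((pte >> 20) & 0x1FF) * 16; }
    static uint32_t compressed_location(uint32_t pte)   { return (pte & 0xFFFFF) * 16; }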

        The obvious alternative to this 16-byte alignment is to use 24 bits for location and 13 bits for size. This would total 37 bits, and would require a page table entry larger than 32 bits. Because typical compressed pages are on the order of 2 KB, wasting an average of 8 bytes (half of 16) per page is quite insignificant. On average, there will be 2^(24-11) = 2^13 = 8192 compressed pages stored in memory. If 8 bytes of each page are wasted, we are wasting a total of 64 KB of space. That is an entirely reasonable compromise.

[Figure 3]

3.3 Uncompressed Page PTEs

        The disk bit for uncompressed page PTEs is also off. Again, the next two bits are a dirty bit and a reserved bit. In order to indicate that a PTE refers to a physical page number in uncompressed memory, no additional bits are necessary; we can simply overload the 9-bit compressed page size field and specify that if this field is zero, the PTE is for uncompressed memory. This works because a compressed page can never take up zero space, so that value is not needed for compressed pages. Since the uncompressed entry only needs to refer to an 8 KB physical page in 16 MB of uncompressed memory, it only needs 11 bits for its physical page number (2^(24-13) = 2^11). We thus leave the next 9 bits blank, and use only the final 11 bits. (See Figure 3.)

[Figure 4]

3.4 On-Disk Page PTEs

        The disk bit for on-disk page PTEs is on. The next two bits are a dirty bit and a reserved bit. The implementation of the physical page number for on-disk page PTEs is flexible, but one simple implementation would be to skip the next 10 bits and use the final 19 bits to store physical page numbers aligned on 8 KB pages. This would allow the addressing of 4 GB of drive space, which is more than enough, given that the virtual address space is only 4 GB and the hard drive is only 3.2 GB. (See Figure 4.)
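
        Pulling the three PTE formats together, the following C sketch shows how a page fault handler might classify an entry. The bit positions mirror the assumptions made in the earlier sketch, with the zero size field signalling an uncompressed page.

    #include <stdint.h>

    enum pte_kind { PTE_UNCOMPRESSED, PTE_COMPRESSED, PTE_ON_DISK };

    /* Classify a PTE using the conventions above: a set disk bit means the page
     * is on disk; otherwise a zero 9-bit size field marks an uncompressed page. */
    static enum pte_kind classify_pte(uint32_t pte)
    {
        if (pte & (1u << 31))              /* disk bit on */
            return PTE_ON_DISK;
        if (((pte >> 20) & 0x1FF) == 0)    /* size field zero */
            return PTE_UNCOMPRESSED;
        return PTE_COMPRESSED;
    }

    /* Uncompressed PTE: the final 11 bits are the physical page number
     * within the 16 MB uncompressed region (8 KB pages). */
    static uint32_t uncompressed_phys_addr(uint32_t pte) { return (pte & 0x7FF) << 13; }

    /* On-disk PTE: the final 19 bits are an 8 KB-aligned disk page number. */
    static uint32_t disk_page_number(uint32_t pte) { return pte & 0x7FFFF; }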

3.5 Fragmentation

        The uncompressed memory and the on-disk memory will never get seriously fragmented, since both are allocated in fixed 8 KB pages; when a page is paged out, another page can always be easily paged back into the same slot. The same is not true, however, for the compressed memory space, in which pages are variable in size. If a particularly small page were swapped out, this would not guarantee the availability of a space large enough for a page which does not compress as well.

        Luckily, since we are using a virtual address space, the virtual memory manager can safely defragment the compressed memory space by moving blocks of memory around and updating the associated page tables appropriately. This method does have some drawbacks: it takes extra work to implement, and it may slow the system down if too much housekeeping is required. (The details of implementation would require further research and are beyond the scope of this paper.)
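
        A minimal sketch of such a compaction pass is shown below. The bookkeeping structure (a per-page record of location, size, and owning PTE) is hypothetical, since the paper deliberately leaves the implementation open.

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical bookkeeping record for one compressed page: where it lives,
     * how large it is, and which PTE points at it (so the PTE can be updated). */
    struct cpage {
        uint32_t  offset;   /* byte offset in the compressed region, 16-byte aligned */
        uint32_t  size;     /* compressed size in bytes */
        uint32_t *pte;      /* back-pointer to the owning page table entry */
    };

    /* Slide every compressed page toward the start of the region, in address
     * order, and rewrite each PTE's 20-bit location field.  'pages' must be
     * sorted by offset.  This sketches the idea only; scheduling the pass and
     * bounding its cost are the open questions noted above. */
    static void compact(uint8_t *region, struct cpage *pages, int n)
    {
        uint32_t next = 0;                              /* next free, aligned slot */
        for (int i = 0; i < n; i++) {
            if (pages[i].offset != next)
                memmove(region + next, region + pages[i].offset, pages[i].size);
            pages[i].offset = next;
            *pages[i].pte = (*pages[i].pte & ~0xFFFFFu) /* keep flag and size bits */
                          | (next / 16);                /* new location, in blocks */
            next += ((pages[i].size + 15) / 16) * 16;   /* stay 16-byte aligned */
        }
    }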

        The alternative is to invent an addressing scheme which allows pages to be segmented when they are stored in compressed memory space. This latter method, however, would not only introduce additional data structures (which would not fit nicely into our 32-bit PTEs) but would also add another level of complexity. Furthermore, the operational overhead incurred in keeping track of the extra segmentation information may nullify the performance benefit of not defragmenting. Incurring a performance penalty to defragment compressed memory when needed seems to be the cleaner solution. Nonetheless, since the precise extent of the performance penalty is currently unknown, an alternative may need to be sought if the penalty turns out to be too great.

4 Potential Optimizations

4.1 Processor Speed Reduction

        Reducing the processor speed to save energy will likely not be very useful during normal operation. When the CPU is being used, the linear power-speed relation means that the same amount of energy must be used to perform a task regardless of the clock speed. For instance, a page will always take the same amount of energy to compress or decompress regardless of clock speed. Speed reduction would thus only be useful when the processor is not being run at full capacity. One might imagine that the processor would be idle while waiting for disk accesses, but if we assume we have asynchronous disk reads and a sufficiently pipelined processor, the processor would remain in use when a disk is being read or powered up.

        The only times the processor may be idle are when it is waiting for input from the user or from a connection and has no other pending tasks it can perform. In these situations, it would likely help to slow the processor to 33 MHz, consuming only about 1/6 of the 0.4 W that the processor uses at full speed.

4.2 Hard Drive Sleep

        Some page faults will inevitably lead to disk accesses, and we should thus try to minimize the energy those accesses consume. There is an inherent tradeoff between wasting power by leaving the disk drive spinning and spending power to repeatedly power the drive up after letting it sleep. When there are very few disk accesses, it makes more sense to power the drive on only when it is needed. When there are enough disk accesses, however, it becomes more efficient to just leave the drive spinning and avoid the warm-up cost. If the drive is read from or written to n times per second, we arrive at the following calculations:

Energy per read or write = 2.3 W · 0.01 sec = 0.023 J

Energy per startup then read or write = 4.7 W · 0.02 sec + 2.3 W · 0.01 sec = 0.117 J

        Thus, to find the n at which we should begin to simply leave the drive spinning, we use the following inequality, adding for each policy the energy used per second during accesses to the energy used during any remaining idle time:

0.117n + 0.1(1-0.03n) > 0.023n + 0.95(1-0.01n)

0.1005n > 0.85

n > 8.46

        In other words, on average, if there are 9 or more disk accesses per second, it is more efficient to simply leave the drive running at all times. On the other hand, if there are 8 or fewer disk accesses per second, it is more efficient to power up the hard drive only when necessary.
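
        The break-even point can also be checked numerically with a short C calculation of the two energy models, using the drive figures from section 2:

    #include <stdio.h>

    /* Average power for n disk accesses per second under each policy. */
    static double power_cycle(double n)  /* sleep between accesses, power up each time */
    {
        return 0.117 * n + 0.1 * (1.0 - 0.03 * n);   /* 0.117 J per startup + access */
    }

    static double power_spin(double n)   /* leave the drive spinning between accesses */
    {
        return 0.023 * n + 0.95 * (1.0 - 0.01 * n);  /* 0.023 J per access */
    }

    int main(void)
    {
        for (double n = 1.0; n <= 12.0; n += 1.0)
            printf("n = %2.0f  cycle: %.3f W  spin: %.3f W\n",
                   n, power_cycle(n), power_spin(n));
        /* The crossover falls between n = 8 and n = 9, matching n > 8.46. */
        return 0;
    }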

5 System Evaluation

5.1 Per Access Energy Efficiency Comparison

        Of course, a compressed virtual memory system is only useful if it is actually more energy-efficient than purely disk-based virtual memory. First, we compare the energy consumed to satisfy a single page fault in uncompressed main memory in each case.

        As mentioned in section 4.2, a disk read takes either 23 mJ or 117 mJ, depending upon whether the hard drive needs to be powered up. As mentioned in section 2, 4 · 8192 = 32768 instructions are required to decompress a page, so at the full clock speed of 200 MHz, 1.6384×10^-4 sec are needed to decompress a page. The energy consumed by decompressing a page is the energy used during that time, which includes energy consumed by both the CPU and the hard drive: 1.6384×10^-4 · (0.4 + 0.1) = 8.192×10^-5 J, or 0.08192 mJ, if the hard drive is sleeping, and 1.6384×10^-4 · (0.4 + 0.95) = 2.212×10^-4 J, or 0.2212 mJ, if the hard drive is spinning. Thus, reading a page from disk takes at least 100 times more energy than reading a page from compressed memory.
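
        The same per-page comparison, written out as a short C calculation (the constants simply repeat those used above):

    #include <stdio.h>

    int main(void)
    {
        double instr_per_page = 4.0 * 8192.0;           /* 4 instructions per byte */
        double t_decompress   = instr_per_page / 200e6; /* seconds at 200 MHz */

        double e_decomp_sleep = t_decompress * (0.4 + 0.1);  /* CPU + sleeping drive */
        double e_decomp_spin  = t_decompress * (0.4 + 0.95); /* CPU + spinning drive */
        double e_disk_warm    = 2.3 * 0.01;                  /* drive already spinning */
        double e_disk_cold    = 4.7 * 0.02 + 2.3 * 0.01;     /* startup, then access */

        printf("decompress, drive asleep:   %.5f mJ\n", e_decomp_sleep * 1000.0);
        printf("decompress, drive spinning: %.5f mJ\n", e_decomp_spin * 1000.0);
        printf("disk read, warm: %.0f mJ   cold: %.0f mJ\n",
               e_disk_warm * 1000.0, e_disk_cold * 1000.0);
        return 0;
    }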

        Compressed memory access in our system is thus two orders of magnitude more energy-efficient than disk access when the two are compared on equal footing for a single typical page read. Of course, we must use both mechanisms together and look at overall energy usage to get a true indication of whether the system is actually better.

5.2 Parameter Specification

        Before analyzing efficiency, there are two key parameters that need to be determined: miss rate and instructions per memory access. A general miss rate table is given (see Appendix A), but miss rates are very application-specific, and the effects of various miss rates must therefore be investigated. The percentage of instructions which access memory is also somewhat application dependent, but it generally falls between 25% and 40% for modern RISC-based processors [2].

        Using figures of 25% memory accesses per instruction and 4% miss rate on 16 MB, we can determine the frequency of page faults to compressed storage at 200 MHz:

        0.25 · 200×10^6 · 0.04 = 2×10^6 faults per second. Dividing that into the 200×10^6 instructions executed per second gives a compressed page access every 100 instructions. Given that compressing or decompressing a page alone takes 32768 instructions, this rate of faults would mean that the CPU would spend nearly all of its time (over 99%) compressing and decompressing pages! The miss rate must be reduced by at least 3 orders of magnitude before this overhead even begins to be reasonable. At that point, about a third of all instructions would be for compression and decompression. This is still quite a performance hit, but it may be worth the battery life savings.
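
        This arithmetic generalizes to any miss rate; the short C calculation below repeats it for the 4% miss rate and for a rate three orders of magnitude lower:

    #include <stdio.h>

    int main(void)
    {
        double clock = 200e6, mem_frac = 0.25;   /* 200 MHz, 25% of instructions access memory */
        double instr_per_page = 32768.0;         /* to compress or decompress one page */
        double miss_rates[] = { 0.04, 0.00004 }; /* 4%, and 3 orders of magnitude lower */

        for (int i = 0; i < 2; i++) {
            double faults = clock * mem_frac * miss_rates[i];  /* faults per second */
            double demand = faults * instr_per_page;           /* (de)compression instructions/s */
            printf("miss %.4f%%: %.0f faults/s, compression demand = %.1f%% of a 200 MHz CPU\n",
                   miss_rates[i] * 100.0, faults, 100.0 * demand / clock);
        }
        return 0;
    }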

        If we use the figures of 25% memory accesses per instruction and 0.0156% miss rate on the 80 MB of non-disk memory, we can determine the frequency of page faults to disk at 200 MHz:

        0.25 · 200×10^6 · 1.56×10^-4 = 7800 faults per second. Unfortunately, even at a peak rate of continuous 10 ms reads and writes, only 100 page reads or writes can be performed each second. That means that not only would the disk be continuously busy (and thus not saving any energy), it would also fall ever further behind, with a perpetually growing queue of read/write requests. Again, the miss rate must be reduced by at least 3 orders of magnitude (to about 8 reads or writes per second) before it is reasonable.

5.3 Miss Rates

        Are such low miss rates plausible? They could be if the user primarily uses a single large application at a time. After all, 16 MB is usually enough working space for the system and a single application, especially in a low screen resolution environment like a handheld computer. Similarly, 80 MB today is generally enough to hold a few commonly used applications. Prabhas Chongstitvatana lists typical page fault rates as being from 0.00001% to 0.001%, which also fits this model [3].

        Thus, the system should be quite usable in the real world with low miss rates, but it would not be usable with the miss rates in the provided table.

5.4 Overall Energy Efficiency Comparison

        This comparison is very difficult because it is incredibly dependent upon the miss rate. If we simply reduce the given table's values by 4 orders of magnitude and assume a 0.00000156% miss rate on 80 MB, a 0.0001% miss rate on 32 MB, and a 0.0004% miss rate on 16 MB, we get the following results:

        For a traditional, compressionless 32 MB system, the number of page faults to disk would be 0.25 · 200×10^6 · 1×10^-6 = 50 faults per second. That is above the break-even rate of about 8 accesses per second from section 4.2, so the disk will optimally just be left spinning all the time, leading to a power consumption of 0.023 · 50 + 0.95(1 - 0.01 · 50) = 1.625 W.

        Now, for a 16 MB uncompressed / 16 MB compressed system, the number of page faults to compressed memory would be 0.25 · 200×10^6 · 4×10^-6 = 200 faults per second. Using the previous value for the time to compress or decompress a page, this costs 200 · 1.6384×10^-4 · (0.4 + 0.1) = 0.016384 W, assuming the drive is sleeping the whole time. In addition, the number of page faults to disk would be 0.25 · 200×10^6 · 1.56×10^-8 = 0.78 faults per second. That is below the break-even rate of 8, so the disk will only be spun up when necessary (which validates our assumption that the drive sleeps most of the time), and the power consumed by the disk will be 0.117 · 0.78 + 0.1(1 - 0.03 · 0.78) = 0.18892 W, for a total of about 0.205 W.
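
        These two power figures can be reproduced with a short C calculation that reuses the disk energy model from section 4.2:

    #include <stdio.h>

    static double disk_power_spinning(double n)  { return 0.023 * n + 0.95 * (1.0 - 0.01 * n); }
    static double disk_power_on_demand(double n) { return 0.117 * n + 0.1 * (1.0 - 0.03 * n); }

    int main(void)
    {
        double clock = 200e6, mem_frac = 0.25;

        /* Traditional 32 MB system: 0.0001% miss rate to disk, drive left spinning. */
        double disk_faults = clock * mem_frac * 1e-6;
        printf("32 MB, no compression:    %.3f W\n", disk_power_spinning(disk_faults));

        /* 16 MB uncompressed + 16 MB compressed: 0.0004% miss rate to compressed
         * RAM, 0.00000156% miss rate to disk, drive powered up only on demand. */
        double comp_faults  = clock * mem_frac * 4e-6;
        double comp_power   = comp_faults * (32768.0 / clock) * (0.4 + 0.1);
        double disk_faults2 = clock * mem_frac * 1.56e-8;
        printf("16 MB + 16 MB compressed: %.3f W\n",
               comp_power + disk_power_on_demand(disk_faults2));
        return 0;
    }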

        Under these parameters, then, the compressed virtual memory system uses about 1/8 the power of the traditional system, justifying its development.

5.5 RAM Partitioning

        This paper has assumed throughout a RAM partitioning of 16 MB uncompressed to 16 MB compressed. To determine the optimal partitioning, further research could repeat the calculations in this section with different partition sizes. In addition, the paging data structures (such as the PTE field widths) would need adjustments to accommodate the new sizes, but most aspects of the calculations would remain the same.

6 The Future

        In the near future, technology is likely to continue progressing very quickly. As memory becomes cheaper, the handheld computers this system is designed for will likely have 80 MB or more of physical RAM. However, user demand for memory will likely increase as well, and a compressed memory system (with updated data structures) will still be useful. With more memory, the number of page faults to compressed memory could decrease due to better availability of uncompressed memory. The number of page faults to disk could decrease as well. When processor speeds increase, the most immediate benefit would be to the compression/decompression routines, followed perhaps by the defragmentation routines. Finally, flash memory is also progressing very quickly, and if large capacity, low power storage becomes publicly available, the system would no longer be necessary at all.

7 Conclusion

        Handheld computers are becoming ever more popular, and battery power remains a major issue. This compressed virtual memory system could help to alleviate energy concerns for many users. The system has the advantage of being relatively easy to implement, since its structure is very similar to that of traditional multi-level virtual memory systems. On the downside, the system is highly dependent on low page fault rates. As long as software for the platform is written to minimize miss rates, however, the compressed memory system can reduce energy consumption by an order of magnitude or more, greatly lengthening the computer's battery life.

Acknowledgments

        Bill Thies brought the compressed memory fragmentation issue to my attention.

References

[1] Halstead, Robert H., Jr., and Ward, Stephen A., Computation Structures, MIT Press, Cambridge, MA, 1990, page 491

[2] Sun Microsystems, "The UltraSPARC Architecture", The UltraSPARC Processor -- Technology White Paper

[3] Chongstitvatana, Prabhas, "Virtual Memory", Computer Architecture (Lecture Notes), Chulalongkorn University, Bangkok, Thailand

Appendix A

        This is the given RAM used/miss rate table.

RAM used (MB)    Miss rate
16               4%
24               2%
32               1%
40               0.5%
48               0.25%
56               0.125%
64               0.0625%
72               0.0313%
80               0.0156%
88               0.0078%
96               0.0039%
104              0.0019%
112              0.0009%
120              0.0004%