There are three potential bottlenecks in the system: the disk (too much paging activity to/from disk), the CVM system (a CPU bottleneck), and application processing (a CPU bottleneck). Let's look at each in turn:
Let fd be the fraction of dirty pages. The amount of time spent idle by the CPU, per memory reference, is given by 0.01*10*(1+fd)ms and the number of memory references per second cannot exceed 100/ {(0.01) * (1 + fd) = 10000/(1 + fd).
Suppose we have M MBytes of RAM (M=32 MBytes for us) and that X MBytes are uncompressed and (M - X ) are compressed. Then, the miss rate to compressed memory = m(X), where m() is the miss-rate function given by the table in the specification. The maximum number of memory references per second, r, that can be handled can then be found using the equation:
r * m(X) = 3051 ==> r = 3051 / m(X)
When X = 16 MBytes, m(X) = 0.04 and r = 76,275 memory references per second.
When X = 16 MBytes, the total amount of compressed + uncompressed memory = 16 + 4*16 = 80 MBytes, and m(80) = 0.0156%. This means that the maximum number of disk operations per second required = 76275 * 0.000156 = 12 when no pages are dirty, and 12*2 = 24 if all pages are dirty. In terms of the fraction of pages that are dirty, the number of disk operations cannot exceed 12(1+fd).
III. Application processing bottleneck
Suppose that the bottleneck in the system is the application's CPU processing. When there's no compression and we use 32 MBytes of RAM, the energy consumed by a single disk operation is 2.3W * 10ms = 23 mJ, which implies that writing a dirty page and reading in the required one consumes 46 mJ. This happens for 1% of all RAM references. So, the average energy per RAM reference is 0.46 mJ for dirty pages. In terms of the fraction of pages that are dirty, the average energy per RAM reference is 0.23(2-fd). Call this Eram_nocomp.
The energy required to compress and decompress an 8-KByte page at 200 MHz = 2*32768/(200*106) * 400 * 10-3 = 0.13 mJ. For the same (as in the uncompressed case) RAM reference pattern and processor caches, the amount of energy consumed by compression and decompression and disk access per RAM reference can be calculated as: {m(16) - m(80)} * 0.13 + m(80)* 23 * (1+fd) = (0.009+0.003fd) mJ. Call this Eram_comp.
Consider any VM system for this computer and let d be the average time (in seconds) between two successive accesses to disk. When idle, the disk consumes 0.1W of power, but requires 4.7W * 20ms = 94mJ to spin up. On the other hand, if d is small, then it may be worth keeping the disk spinning, but this takes up 0.95W of power. Thus, it is worth spinning the disk down provided 0.1d + 94 < 0.95d, or d > 110.5 ms.
In 110.5ms, the processor running at 200
MHz can execute about 22 million instructions, and the effective RAM miss
rate for CVM must be as small as 4.5*10-8, which is orders of
magnitude smaller than m(80), the miss best rate for CVM.
Thus, we arrive at the conclusion that spinning down the disk is a losing
proposition.
This depends on the total time of execution as well as on R. If R is very small, then both compression isn't a big win because there are hardly any RAM references and therefore hardly any accesses to disk. If R is large, then compression could be beneficial because it reduces the number of accesses required to disk. We perform the analysis assuming that there is exactly one user program being run on the system.
Regardless of whether compression is used or not, the disk continues to spin, so the energy it consumes is 0.95T, where T is the total execution time of the program. The processor takes up 400 mW at 200 MHz, and if we clock it down to 33 MHz while waiting for disk (this would be a good design decision), it dissipates proportionally less power.
For both the original VM and CVM, the energy consumption is given by:
E =
Pspin*T
+ Pcpu_busy * Tcpu_busy + Pcpu_idle
*
Tcpu_idle
+ R * Eram
(3)
where T = time to execute instructions
+ time to wait for paging to occur
= Tcpu_busy +
Tcpu_idle,
and
Eram =
average energy consumed per memory reference (Section III, equations 1
and 2)
We know all the power numbers: Pspin = 0.95W, Pcpu_busy = 0.4W, and Pcpu_idle = 0.4*33/200W = 0.066W.
Case 1: No compression
With no compression, the Tcpu_busy = 5ms (1 million instructions at 200 MHz). The CPU idle time is obtained from Section I as 0.1(1+fd)*R ms. Therefore,
Using Equation (1) and Equation (3), we find that:
Case 2: CVM
For CVM case, calculating the execution time Tcomp is a little more complicated since it depends on the likelihood that a page not in uncompressed RAM is in compressed RAM and on the likelihood that it is on disk. This likelihood is given by the miss rate m(): the probability that a page is in compressed RAM = m(16) - m(80) and the probability that it's on disk = m(80). Thus,
Tcomp = {(1*106)/ (200 * 106) + (time to compress & decompress)*[m(16) - m(80)]*R + 0.01(1+fd)*m(80) * R} seconds.
Section II tells us the time to compress & decompress a page, it is 1/3051 ms = 0.327ms. During compression/decompression, the CPU runs at 200 MHz, m(16) = 0.04, and m(80) = 0.000156, so
Tcomp = (5 + 0.013R + 0.00156R + 0.00156fdR) ms = (5 + 0.015R + 0.00156fdR) ms (7)
Using Equations (2), (3), and (6) allows us to express Ecomp after some algebra:
Ecomp = (6.75+ 0.019R + 0.005fdR) mJ (8)
If R is small (but not tiny enough to spin the disk down), then there is no significant benefit to compression, but this cannot usually happen in reality. For a typical value of R, suppose that the program makes 1.25 memory accesses per instruction (one to fetch and every fourth is a load/store). Suppose that one in ten (10%) references make it to RAM past the cache. Then R = 1.25 * 106 * 0.1 = 1.25 * 105
(As an aside, Patterson and Hennessy, in their Computer Architecture book claim that the Spec92 suite sees a cache miss rate of about 1% for a 128-KByte processor cache. In this case, R = 1.25 * 106 * 0.006 = 7,500. The corresponding energy benefit of compression would be about 28.8X in this case too. But this value of L1 miss-rate doesn't really fit in with the numbers mentioned in the project specification.)
Also note that this analysis is for a single program running on the computer. The case when the computer runs many concurrent programs is analogous, but the results are somewhat different under the assumption that the CPU is never idle (since some other program can execute when one is blocked). The way to do this analysis is to start with a total amount of energy E and calculate the total system lifetime until this energy is exhausted for this workload. This may be an instructive exercise for the reader to do.
In conclusion, we find that the energy benefits of CVM are between 17.5X and 27.7X depending on the fraction of dirty pages, for a single program running on our handheld computer.
Last update: Fri Mar 24 19:54:48 EST
2000 (hari)