Robert Morris
6.828 2009
Lecture 8: Performance

Performance: why am I lecturing about it?
  often has a huge effect on design
  often forces major re-design as loads grow
  faster CPUs haven't "solved" performance
    problems have grown too, e.g. Internet-scale
    disk and net latency hasn't improved

Performance Introduction
  your system is too slow
  [diagram: clients, net, server CPU, server disk]
  perhaps too many users, and they are complaining
  what can you do?
  1. measure to find the bottleneck
     could be any of the components, incl. the client
     you hope to find a dominating bottleneck!
  2. relax the bottleneck
     increase efficiency, or add hardware

Decide what you mean by "performance"
  throughput: requests/second (for many users)
  latency: time for a single request
  sometimes inverses:
    if a request takes 0.1 second of CPU, and there's one CPU,
    then throughput = 10/sec
  often not inverses:
    w/ 2 CPUs, latency is still 0.1 sec, but throughput is 20/sec
    queuing and pipelining
  I will focus on throughput, appropriate for heavily loaded systems
  most systems get slow as the # of users goes up
    at first, each new user uses otherwise idle resources
    then they start to queue
    [graph: # users vs. replies/sec: linear up, then hits a plateau]
    [graph: # users vs. delay: stays near zero, then linear up (queuing delay)]

How to find the bottleneck?
  1. measure
     perhaps you find that e.g. the CPU is 100% used
     but maybe not: e.g. CPU 50% used and disk 50% used,
       just at different times
  2. model
     the net should take 10 ms, the CPU 50 ms, the disk 10 ms
  3. profile
     tells you where CPU time goes
  4. guess
     you will probably have to do this anyway
     test the hypothesis by fixing the suspected slowest part of the system
     can be difficult: if the disk is busy all the time,
       should you buy a faster disk / two disks? or more RAM?

You may need to make application logic more efficient
  fewer features, better algorithms
  I cannot help you with that -- application-specific
  but there are general-purpose techniques

The main performance techniques
  1. caching
  2. I/O concurrency
  3. scheduling
  4. parallel hardware (two disks, two CPUs, &c)
  these are most useful when there are many waiting requests
    but that will often be the case if your server is heavily loaded
  I'm going to focus on disk bottlenecks
    every year it gets harder to be CPU-bound
  what tricks give good disk performance?
  example disk -- Hitachi 7K400: 400 GB, 7200 RPM, 8.5 ms read seek,
    9.2 ms write seek, 567 to 1170 sectors/track, 10 heads,
    about 87,000 cylinders

primer on disks
  [disk slide]
  physical arrangement:
    rotating platter: 7,200 RPM = 120/sec = 8.3 ms / rotation
      continuous rotation
    circular tracks: 512-byte sectors, about 1000 sectors/track
    perhaps 100,000 tracks per surface, concentric rings
    multiple platters, 5 for the 7K400, so 10 surfaces
    cylinder: the set of vertically aligned tracks
    one disk arm, one head per surface, they all move together
      can only read from one head at a time
  three movements required:
    "seek" the arm to the desired track: varies w/ # of tracks crossed, 1 to 15 ms
    wait for the disk to "rotate" the desired sector under the head: 0 to 8.3 ms
    read the bits as they rotate under the head

disk performance?
  big multi-track sequential transfers: one track per rotation
    512 * 1000 bytes / 0.0083 sec = 62 megabytes/second
    that is fast! unlikely to be a bottleneck for most systems
  small transfers, e.g. a 5000-byte web page, from a *random* disk location:
    seek + rotate + transfer
    avg seek: 9 ms
    avg rotate: 1/2 of a full rotation, 4 ms
    transfer for a random block: size / 62 MB/sec, 0.1 ms
    9 + 4 + 0.1 = 13.1 ms
    rate = 5000 / 0.0131 = 0.4 MB/sec
    i.e. 1% of the big sequential rate. this is slow.
    sadly this is typical of real life
  Lesson: lay your data out sequentially on disk!
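
  a worked version of that arithmetic, as a C sketch -- the seek,
  rotation, and sectors/track figures are the rough 7K400-ish numbers
  above, not measurements:

    /* disk_time.c: rough disk-access model using the numbers above */
    #include <stdio.h>

    int main(void)
    {
        double seek = 0.009;                  /* avg seek: 9 ms */
        double rotation = 0.0083;             /* one rotation at 7200 RPM */
        double track_bytes = 512.0 * 1000.0;  /* ~1000 sectors/track */
        double seq_rate = track_bytes / rotation; /* one track per rotation */

        double request = 5000.0;              /* small web page */
        double t = seek + rotation / 2.0 + request / seq_rate;

        printf("sequential: %.1f MB/sec\n", seq_rate / 1e6);
        printf("random 5000-byte read: %.1f ms = %.2f MB/sec\n",
               t * 1000.0, request / t / 1e6);
        return 0;
    }

  this reproduces the ~62 MB/sec sequential vs. ~0.4 MB/sec random gap
  that motivates the lesson
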
caching disk blocks
  use some of RAM to remember recently read disk blocks
  this is often the main purpose of RAM...
  your o/s kernel does this for you
  table:
    BN    DATA
    ...   ...
  read(bn):
    if block in cache, use it
    else:
      evict some block
      read bn from disk
      put bn, block into cache
  (a C sketch of this read path appears at the end of these notes)
  hit cost: about 0 seconds to serve a small file from RAM
  miss cost: about 0.010 seconds to read a small file from disk
  eviction / replacement policies
    important: don't want to evict something that's about to be used!
    least-recently-used (LRU) usually works well
      if it's been used recently, it will be used again soon
    LRU is bad for huge sequential data that doesn't fit,
      read over and over (or even only once)
      if it's been used recently, it *won't* be used again for a while!
      don't want it to evict other useful stuff from the cache
      random? MRU?
  how to decide if caching is likely to work?
    productive to think about working-set size vs. cache size
    you have 1 GB of data on disk and 0.1 GB of RAM
    will that work out well?
    maybe yes:
      if a small subset is used a lot (popular files)
      if users focus on just some of it at a time (only actively logged-in users)
      if "hit" time << "miss" time, e.g. a disk cache
        if disk I/O is a significant fraction of overall time
    maybe no:
      if data is used only once
      if more than 0.1 GB is read before re-use (i.e. people read the whole GB)
      if people read random blocks: then only a 10% hit probability
      if hit time is not much less than miss time,
        e.g. caching results of computation

i/o concurrency
  what if most requests hit in the cache, but some have to go to disk?
    and you have lots of requests outstanding
    [time diagram: short short long short short]
    we don't want to hold up everything waiting for the disk
  idea: process multiple requests concurrently
    some can be waiting for the disk, others served quickly from the cache
    can use threads (sketch at the end of these notes)
    note: threads are handy here even if there's only one CPU
  special case: prefetch (works w/ only one thread)
  special case: write-behind

scheduling
  sometimes changing the order of execution can increase efficiency
  if you have lots of small disk requests waiting:
    sort them by track (sketch at the end of these notes)
    result: short seeks (1 ms instead of 8 ms)
    the bigger the backlog, the shorter the average seek
    the system gets more efficient as load goes up! higher throughput
    the "elevator algorithm"
    maybe you can sort rotationally too, but that's hard
  cons: unfair -- increases delay for some requests

if you cannot improve efficiency
  buy a faster CPU/net/disk if you can
  otherwise buy multiple servers (CPUs + disks) -- but how to use them?
  strict partition of work: easiest
    users a-m on server 1 (cpu+disk)
    users n-z on server 2 (cpu+disk)
    no interaction, so e.g. no need for locks, no races
    hard if e.g. some operations involve multiple users -- bank transfers
  perhaps factor out computing from storage: front end / back end
    a front end can get the data for all the users it needs from the back end
    works well if CPU-bound, or if the FEs can cache

Pragmatics
  programmer time costs a lot
  hardware is cheap
  programming is most worthwhile if it lets you use more h/w
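
appendix: C sketches of the techniques above

  to make the read(bn) pseudocode in the caching section concrete,
  a minimal LRU block cache in C. a sketch under assumptions: the
  names, cache size, and disk_read() are made up for illustration;
  a real kernel buffer cache also needs locking and a hash table:

    /* bcache.c: minimal LRU block-cache sketch; linear search is fine
       for a small cache, and the LRU list keeps the most recently
       used block at the head and the eviction victim at the tail */

    #define NBUF 64
    #define BSIZE 512

    struct buf {
        int bn;               /* block number, -1 if unused */
        char data[BSIZE];
        struct buf *next;     /* LRU order: head = most recently used */
    };

    static struct buf bufs[NBUF];
    static struct buf *head;

    /* stand-in for the real disk driver (an assumption, not a real API) */
    static void disk_read(int bn, char *data) { (void)bn; (void)data; }

    void cache_init(void)
    {
        for (int i = 0; i < NBUF; i++) {
            bufs[i].bn = -1;
            bufs[i].next = (i + 1 < NBUF) ? &bufs[i + 1] : 0;
        }
        head = &bufs[0];
    }

    /* read(bn): if block in cache, use it; else evict the LRU block,
       read bn from disk, and put it in the cache */
    struct buf *bread(int bn)
    {
        struct buf *b, *prev = 0, *tailprev = 0;

        for (b = head; b; prev = b, b = b->next) {
            if (b->bn == bn)
                break;                /* hit */
            if (b->next)
                tailprev = b;         /* remembers the node before the tail */
        }
        if (!b) {                     /* miss: evict the tail (LRU) */
            b = tailprev ? tailprev->next : head;
            prev = tailprev;
            b->bn = bn;
            disk_read(bn, b->data);
        }
        if (prev) {                   /* move b to the front (MRU) */
            prev->next = b->next;
            b->next = head;
            head = b;
        }
        return b;
    }

  each hit moves the block to the front, so the tail naturally becomes
  the least-recently-used victim -- the "used recently, will be used
  again soon" bet
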
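  for the i/o concurrency section: a thread-per-request sketch with
  POSIX threads. handle_request() and serve() are hypothetical names;
  the point is only that a thread blocked in bread() on a cache miss
  sleeps alone while the others keep serving hits:

    /* ioconc.c: one thread per request; a request that misses the
       cache sleeps in the disk driver, but only that thread sleeps */
    #include <pthread.h>
    #include <stdio.h>

    void *handle_request(void *arg)
    {
        int fd = (int)(long)arg;
        /* ... parse the request, call bread(); a miss blocks only
           this thread, other threads keep serving cache hits ... */
        printf("served request on fd %d\n", fd);
        return 0;
    }

    int serve(int fd)
    {
        pthread_t t;
        if (pthread_create(&t, 0, handle_request, (void *)(long)fd))
            return -1;
        pthread_detach(t);   /* no join: fire and forget */
        return 0;
    }

  this helps even on one CPU, exactly as the notes say: the win is
  overlapping disk waits, not parallel computation; a real server
  would use a thread pool rather than a thread per request
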
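  and for the scheduling section, the sort-by-track step of the
  elevator algorithm. SECTORS_PER_TRACK is an assumed constant taken
  from the primer's round numbers; a full elevator also alternates
  sweep direction instead of re-sorting from scratch:

    /* elevator.c: order a backlog of pending block numbers by track,
       so the arm makes many short seeks instead of random long ones */
    #include <stdlib.h>

    #define SECTORS_PER_TRACK 1000   /* round figure from the disk primer */

    static int by_track(const void *a, const void *b)
    {
        int ta = *(const int *)a / SECTORS_PER_TRACK;
        int tb = *(const int *)b / SECTORS_PER_TRACK;
        return ta - tb;
    }

    /* the bigger n is, the shorter the average seek between requests */
    void schedule_backlog(int *pending, int n)
    {
        qsort(pending, n, sizeof pending[0], by_track);
    }

  the efficiency-rises-with-load effect falls out of the sort: a
  bigger backlog means more requests per sweep and shorter seeks
  between them, at the cost of fairness for requests at the far end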