# -*- mode: org -*-
#+STARTUP: indent

Frans Kaashoek
6.033 2011 Lecture 4: Naming
  didn't cover virtual memory because of valentine interrupts.

* Slides from last lecture
  How does client identify server?
    Using a name.
    In the example, the name is a Domain Name System (DNS) name
  Name systems are interesting systems in themselves
    The glue that connects everything together

* Today: naming systems
  Typical goals
  Abstract view of naming systems
  Questions to ask about naming systems
  DNS (is itself structured in a client/server organization)
  Virtual memory
  File system in tomorrow's hands-on and section
  DP1 is designing a naming system

* Names are common (because they are so powerful)
  LD R0, 0x2020
  18.7.22.69
  web.mit.edu
  /mit/6.033/www
  http://web.mit.edu/6.033
  6.033-staff@mit.edu
  dialup
  foo.c
  .. (as in cd .. or ls ..)
  wc
  int x = 0
  Frans Kaashoek
  (617)253-7149
  021-82-2030

* Typical goals of a naming system
  1. user friendly
  2. sharing
     diagram: A and B both know N, refers to ...
  3. retrieval (sharing across time)
     A knows N, refers twice to ... over time
  4. indirection
     diagram: A knows fixed N, refers to variable N', refers to ...
  5. hiding
     hide implementation
     control access (access only if you know name, or hook to hang ACLs)

* Abstract picture of naming
** maps names to values (often values are names in another naming system)
  lookup(name) -> value
** some names are global names, some are local names
  lookup(name, context) -> value
** context often named by a name
  allows recursive names/pathnames (e.g., /mit names context for 6.033)
** programming language theory has developed this to a deeper level
  but this simple model is fine for many computer systems

* Questions one can ask with abstract picture?
  What is the syntax of a name?
  What are the values?
  What is the lookup algorithm?
  Where does the name of a context come from?

* Questions one can ask about the lookup algorithm?
  Is the mapping from name to value unique?
  Can a name have multiple values?
  Can a value have multiple names? (synonyms)
  Can a name have no value?

* DNS Example
  what does web.mit.edu name?
    you can look it up and turn it into an IP address (e.g., 18.7.22.69)
    Internet knows how to send data to IP addresses (talk about it later)
    An address is a name with location
  how does the lookup algorithm work?

* The big picture
  Simplified explanation; more detail in section after spring break
  DNS has a client/server organization
** my laptop O/S shipped with IP addrs of a few "root name servers" (named by .)
  these are hard-wired, in a file, about a dozen (. is global name)
    198.41.0.4 (virginia)
    128.63.2.53 (maryland)
    ...
  this list doesn't change much, otherwise old computers would break
** my laptop picks a root server, sends it a query,
  asking for IP address of the string "web.mit.edu"
** each root server has exactly the same table, the root "zone"
    edu NS 192.5.6.30
    edu NS 192.12.94.30
    com NS 192.48.79.30
    ...
** root server finds entries in root zone that match a suffix of "web.mit.edu"
  returns those records to my laptop
** now my laptop knows who to ask about edu
  re-sends request to one of the edu servers
  each edu server has a copy of the edu zone
    nyu.edu NS 128.122.253.83
    mit.edu NS 18.72.0.3
    mit.edu NS 18.70.0.160
  returns the two MIT records to my laptop
  now my laptop forwards request to an MIT server
  each MIT server has a copy of the mit.edu zone
    web.mit.edu A 18.7.22.69
    bitsy.mit.edu A 18.72.0.3
    csail.mit.edu NS 18.24.0.120
    ...
** DNS has a hierarchical name space
  diagram of a tree
** Where do the zone files come from?
  each zone centrally but independently administered
    root by IANA, an international non-profit org
    com by VeriSign, a company
    MIT IS maintains its zone files, updates them on its servers
** Why this design?
  everyone sees same name space (at least for full names)
    simplifies sharing
  scalable in performance
    via simplicity, root servers can answer many requests
    via caching, not really described here
    via delegation (keeps table size down)
    contrast to single central server
  scalable in management
    MIT controls its own names by running its own servers
  fault tolerant
** There are problems:
  who should control root zone? .com?
  load on root servers
  denial-of-service attacks
  security problems:
    how does client know data is correct?
    how does verisign know i'm ibm.com when I ask it to change NS?
** DNS has been very successful
  25-year-old design still works after 1000x scaling
  you can learn more about DNS from Chapter 4 of the notes + hands-on
** How do systems use DNS?
  user-friendly naming (type a URL)
  web page tells browser what to do on button press
    i.e. connects one page to the next
  Adobe PDF Reader contains DNS name for where (later) to check for upgrades
  allows services to change IP address of servers (change hosting service)
  DNS servers can give different answers to different clients
    turn www.yahoo.com into IP address near client

* DNS demo script
  ping www.mit.edu
  ping web.mit.edu
  ping www
  ping www.
  dig www.mit.edu
  dig . NS
  dig @a.root-servers.net edu NS edu
  dig @b.root-servers.net edu NS edu
  dig @f.edu-servers.net mit NS mit.edu
  dig @f.edu-servers.net mit www.mit.edu

* Example: virtual addressing
  often called "virtual memory" or "paging"
** Why this example?
  naming mechanism
  well designed
  solves many problems
  basis for many neat o/s features (next lecture)
** Memory
  this is review from 6.004
  also Chapter 5 in notes
  diagram: CPU, address/data wires, DRAM
  DRAM an array of bytes
  indexed with "physical address"
  let us say 32-bit addresses
  could have up to 4 GB of DRAM
  example: LD R4, 0x2020
** Why physical addressing not sufficient?
  many reasons, as we will see
  here's one: what if program too big to fit in DRAM?
    but still fits in 2^32 bytes, e.g.
    a two-megabyte program, only one megabyte of DRAM
** Demand Paging
  diagram: cpu, dram, disk
  want some data to be in memory, some on disk
  when program refers to data that's not in memory,
    move something else from memory to disk,
    load in the desired memory from disk
  called "demand paging"
  that's the original motivation, not so relevant any more
** Virtual Addressing Idea
  diagram: MMU, two arrays of memory cells
  set of arrows from left to right
  "virtual addresses" (names) vs "physical addresses" (values)
  s/w ONLY loads/stores using virtual addresses
  phys addrs ONLY show up in the mapping table
  conceptually, translations for all 2^32 addresses (lookup using a table)
    some refer to physical memory
    others to disk locations
  how to implement?
    if per-byte mappings, 16 GB! (2^32 entries, 4 bytes each)
    not practical to have a translation table with an entry per byte addr
** Page Tables
  how to make the mapping table small?
  divide virtual addresses into contiguous aligned "pages"
  upper (say) 20 bits are "virtual page number"
  MMU only maps page number -> upper 20 bits of a phys address
  then use lower 12 va bits as lower 12 pa bits
** Example:
  page table:
    0: 3 (i.e. phys page 3)
    1: 0
    2: 4
    3: 5
  register R3 contains 0x2020
  LD R4, (R3)
  CPU sends (R3)=0x2020 to MMU
  virtual page number is 2
  offset is 0x20
  thus pa = 4*4096 + 0x20 = 0x4020
  MMU sends 0x4020 on wires to DRAM system
** Page table small enough to fit in DRAM
  has 2^20 entries, probably 4 bytes each
  now page table is only 4 MB, not so big
** Where is the page table?
  in memory
    so can be modified with ordinary instructions
  CPU Page Table Base Register points to base
    page table named by a physical address
  so a memory reference conceptually requires *two* phys mem fetches
    one to get the page table entry
    one to get the actual data
  allows us to keep multiple page tables, quickly switch
    will come in handy later...
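
The page-table example above can be checked with a few lines of Python. This is only a sketch of what the MMU does conceptually, assuming 4096-byte pages and the four-entry table from the example; the names translate, page_table, and the two size variables are made up for illustration and are not from the notes.

```python
PAGE_SIZE = 4096      # 2^12 bytes per page, so the low 12 bits are the offset
OFFSET_BITS = 12
ENTRY_BYTES = 4       # assumed size of one table entry, as in the notes

# Page table from the example: index = virtual page number,
# value = physical page number.
page_table = [3, 0, 4, 5]


def translate(va):
    """Map a virtual address to a physical address via the page table."""
    vpn = va >> OFFSET_BITS            # upper bits: virtual page number
    offset = va & (PAGE_SIZE - 1)      # lower 12 bits pass through unchanged
    ppn = page_table[vpn]              # the lookup: name -> value
    return (ppn << OFFSET_BITS) | offset


# Table-size arithmetic from the notes.
per_byte_table_bytes = 2**32 * ENTRY_BYTES   # one entry per byte address
per_page_table_bytes = 2**20 * ENTRY_BYTES   # one entry per 4 KB page

print(hex(translate(0x2020)))                # 0x4020
print(per_byte_table_bytes // 2**30, "GB")   # 16 GB
print(per_page_table_bytes // 2**20, "MB")   # 4 MB
```

A real MMU does this lookup in hardware on every load and store; the sketch only shows the arithmetic.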
** Flags
  "Page Table Entry"
  20 bits, plus some flags
  Valid/Invalid, Writeable/ReadOnly
  MMU only does translation if valid and correct read/write
  otherwise forces a "page fault":
    save state
    transfer to known "handler" function in operating system
** Demand paging
  page table:
    0: 1, V
    1: -, I
    2: 0, V
  and only two pages of physical memory!
  if program uses va 0x0xxx or 0x2xxx, MMU translates
  fault if program uses va 0x1xxx
  handler(vpn):
    (ppn, vpn') = pick a resident page to evict
    write dram[ppn] -> disk[vpn']
    pagetable[vpn'] = -, I
    read dram[ppn] <- disk[vpn]
    pagetable[vpn] = ppn, V
** Demand paging discussion
  picking the page to evict requires cleverness
    e.g. least recently used
  we are treating DRAM as a cache for "real" memory on disk
  can do that b/c programs use "names" (va) for memory
    we have full control over what the names mean
    map to DRAM
    or arbitrary action in page fault handler
** What do operating systems use page tables for?
  Demand paging
  Lazy loading of big programs
  Zero fill
  Address spaces for modularity, defend against bugs (next lecture)
    one page table per running program
    one program cannot even name another program's memory!
  Shared memory for fast communication
    two different names for same physical page
    tight control over what's shared
  Copy-on-write fork
  Distributed shared memory

* Closing
  keep an eye out for naming systems
  be aware in your own work (e.g., DP1) when introducing naming might help
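
Looking back at the demand-paging handler(vpn) pseudocode above, here is one way it could be simulated in Python. It is a sketch under stated assumptions: the Pager class, its fields, and the access method are hypothetical names; eviction uses least recently used, which the notes mention only as an example policy; page contents are single integers standing in for 4 KB pages; and pages never written to disk read back as zero.

```python
class Pager:
    """Toy demand pager: DRAM is a small cache for pages that live on disk."""

    def __init__(self, num_frames):
        self.page_table = {}                  # vpn -> ppn, valid entries only
        self.free_frames = list(range(num_frames))
        self.resident = []                    # resident vpns, least recent first
        self.dram = {}                        # ppn -> page contents
        self.disk = {}                        # vpn -> page contents
        self.faults = 0

    def handler(self, vpn):
        """Page-fault handler, following the pseudocode in the notes."""
        self.faults += 1
        if self.free_frames:
            ppn = self.free_frames.pop(0)     # unused frame, nothing to evict
        else:
            victim = self.resident.pop(0)     # vpn': least recently used page
            ppn = self.page_table.pop(victim)      # pagetable[vpn'] = -, I
            self.disk[victim] = self.dram[ppn]     # write dram[ppn] -> disk[vpn']
        self.dram[ppn] = self.disk.get(vpn, 0)     # read dram[ppn] <- disk[vpn]
        self.page_table[vpn] = ppn                 # pagetable[vpn] = ppn, V
        self.resident.append(vpn)

    def access(self, vpn):
        """Return the physical page for vpn, faulting it in if invalid."""
        if vpn in self.page_table:
            self.resident.remove(vpn)         # hit: mark as most recently used
            self.resident.append(vpn)
        else:
            self.handler(vpn)
        return self.page_table[vpn]


# Two physical pages, as in the example: pages 2 and 0 become resident,
# then touching page 1 forces an eviction.
pager = Pager(num_frames=2)
for vpn in [2, 0, 0, 1]:
    pager.access(vpn)

print(pager.page_table)   # {0: 1, 1: 0}
print(pager.faults)       # 3
```

Page 2 is evicted because page 0 was touched more recently. The program only ever uses virtual page numbers, so the handler is free to change what they map to.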