Message: 1
From: "Mojo Eddie"
To: mojonation-users@lists.sourceforge.net
Date: Tue, 26 Jun 2001 21:01:49 -0000
Subject: [Mojonation-users] Feedback for the Market and a Market for Feedback
Reply-To: mojonation-users@lists.sourceforge.net

In order for the system to operate at maximum efficiency, blockservers will increase their coverage in order to maintain favorable handicap ratings from brokers. But to know when they've reached the sweet spot (the point where the cost of handling additional, less profitable blocks just balances the gain from an improved reputation), they've got to have some kind of feedback mechanism that lets them know how they're doing.

Servers will need to balance their coverage against their reputation among the brokers. They can tell what their effective coverage is by themselves, just by tracking the proportion of queries they receive that they aren't able to fulfill. But they have no way to know what kind of handicaps their level of coverage is incurring among the brokers. Furthermore, they have no way to know the handicaps of other servers, so they can't tell whether they are being sufficiently competitive. The only two measures they have are the volume of queries and requests they receive and the prices they are offered. While they could watch for increases or decreases in traffic and prices, they have no way of knowing whether those changes are due to changes in their reputations among brokers or to external factors, like changes in the total use of Mojo Nation as a whole.

I propose that metatrackers serve as server rating repositories. To a degree they already are, in that each server advertises its prices to the metatrackers and brokers learn them when they query the metatrackers to find servers (although that pricing mechanism is now obsolete thanks to dynamic pricing). I propose that the metatrackers also collect server handicap information from the brokers and make it available to the servers. This way the servers will get dynamic feedback on the results of their changes in behavior.

It could work like this. Whenever a metatracker receives a 'list servers' query from a broker, it in turn generates a 'rate servers' request back to the broker (this is in addition to the normal response to the 'list servers' request). The metatracker includes a Mojo offer along with the request. If the broker decides to accept the offer, it responds with a list of all the servers it knows about, ranked according to its current handicapping scheme. It would actually return multiple lists of servers, one for each class of server (blockserver, content tracker, relay server, etc.), each sorted independently using the handicapping function appropriate to that server class.

Periodically, a blockserver can send a 'dump server ratings' request to any metatracker it knows about, along with a Mojo offer. If the metatracker accepts the offer, it replies with all the ratings it has collected from brokers. The blockserver can then use this information to find out whether its reputation has gone up or down after a change in the quality of service it provides. The ratings dump could be detailed or summarized, and it could cover all servers the metatracker has data for or just the server making the request. If it covers all servers, the servers will have a better understanding of the entire marketplace. If it is detailed, it allows the blockservers to perform more detailed analysis and respond more efficiently to changing conditions.
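To make the exchange concrete, here is a minimal sketch in Python. The message names ('list servers', 'rate servers', 'dump server ratings') are the ones proposed above; the class layout, field names, flat-rate offers, and the lower-is-better handicap convention are all illustrative assumptions, not a spec.

    import time
    from collections import defaultdict

    class Broker:
        """Toy broker. handicaps: {server_class: {server_id: handicap}},
        where a lower handicap is assumed to mean a better server."""
        def __init__(self, handicaps):
            self.handicaps = handicaps

        def handle_rate_servers(self, offer):
            if offer.get('mojo_offer', 0) < 1:    # not worth the cpu/bandwidth
                return None                       # decline the offer
            # One ranked list per server class, best-ranked first.
            return {cls: sorted(table, key=table.get)
                    for cls, table in self.handicaps.items()}

    class Metatracker:
        DUMP_PRICE = 5    # in Mojo; would float with market conditions

        def __init__(self):
            # ratings[server_class][server_id] -> list of (timestamp, rank)
            self.ratings = defaultdict(lambda: defaultdict(list))

        def solicit_ratings(self, broker):
            # Sent alongside the normal response to a 'list servers' query.
            ranked = broker.handle_rate_servers(
                {'type': 'rate servers', 'mojo_offer': 2})
            if ranked is None:
                return
            now = time.time()
            for cls, ids in ranked.items():
                for rank, server_id in enumerate(ids):
                    self.ratings[cls][server_id].append((now, rank))

        def handle_dump_server_ratings(self, request):
            if request.get('mojo_offer', 0) < self.DUMP_PRICE:
                return None                       # offer refused
            sid = request.get('server_id')        # None means "all servers"
            return {cls: {s: ranks for s, ranks in table.items()
                          if sid in (None, s)}
                    for cls, table in self.ratings.items()}

A blockserver that sees its average rank drifting downward after shrinking its coverage would know the reputation cost outweighs the disk space it saved.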
Brokers can be expected to charge for providing their handicap data to the metatrackers: although doing so improves marketplace efficiency and is therefore in their long-term interest, they derive no short-term benefit from it, while consuming resources (cpu and bandwidth) to provide it. Servers can be expected to pay for the data because it is in their interest, both short-term and long-term, to obtain feedback and improve their reputations. Metatrackers can serve as natural middlemen, so that blockservers don't have to try to contact every individual broker (they couldn't anyway, because what they are most interested in learning about is the brokers that aren't even contacting them anymore because their reputations are so bad). As middlemen, metatrackers will collect from the servers and pay the brokers, and may even turn a profit (although that itself will be subject to market competition among the metatrackers).

This system will help blockservers find the magic coverage number that best maximizes their profit (which really just means the number that most efficiently allocates disk usage and query traffic among all the blockservers). The same system could be used to help every other class of server improve the responsiveness and attractiveness of the services it provides to the brokers.

- Mojo Eddie

--__--__--

Message: 2
From: "Mojo Eddie"
To: mojonation-users@lists.sourceforge.net
Date: Tue, 26 Jun 2001 21:03:34 -0000
Subject: [Mojonation-users] The Quest for Infinite Scalability
Reply-To: mojonation-users@lists.sourceforge.net

Mojo Nation as currently implemented has a few limits to scalability. It's pretty darn scalable as it is, but we can do better. I'd like to outline some ideas on how.

The first chokepoint to growth is that there are only a few metatrackers. As the number of users grows, the number of brokers making queries to the metatrackers will eventually exceed the bandwidth and cpu that a small set of metatrackers can devote to answering queries. Naturally, the Evil Geniuses are already working on this by decentralizing the metatrackers and allowing everyone to run one.

The next chokepoint is the capacity of each individual blockserver. As the number of files stored in Mojo Nation grows, each blockserver will have more blocks to store, eventually exceeding its available disk space. This problem is already mostly solved by partitioning the blockspace into bitmasks, so that each server only needs to store a certain percentage of the total number of blocks. But to be scalable we need something else: the bitmasks need to be dynamically and automatically adjustable, so that as more blocks are added to a server with a given bitmask, it lengthens its bitmask as it starts running out of space in its blockstore (a sketch of this follows below).

The next chokepoint is the memory and cpu of the metatrackers themselves. As the number of blockservers grows, the metatrackers will consume more memory storing the complete list of all blockservers, and more cpu answering queries from brokers as their databases grow. Simply having multiple metatrackers won't solve this problem by itself.
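First, that dynamic bitmask adjustment as a minimal sketch. The 'keep the fuller half' split rule and the byte-count capacity threshold are my assumptions; the 160-bit block IDs (SHA-1 hashes) are as in the real system.

    import hashlib

    class Blockstore:
        def __init__(self, capacity_bytes):
            self.capacity = capacity_bytes
            self.used = 0
            self.prefix = 0      # bitmask value, matched against high bits
            self.mask_len = 0    # zero bits == "I cover the whole blockspace"
            self.blocks = {}     # block_id (160-bit int) -> data

        def covers(self, block_id):
            return self.mask_len == 0 or \
                block_id >> (160 - self.mask_len) == self.prefix

        def store(self, data):
            block_id = int.from_bytes(hashlib.sha1(data).digest(), 'big')
            if not self.covers(block_id):
                return False     # outside our advertised bitmask
            while self.used + len(data) > self.capacity:
                if self.mask_len >= 160:
                    return False # can't subdivide any further
                self.lengthen_mask()
                if not self.covers(block_id):
                    return False # the new block fell in the dropped half
            self.blocks[block_id] = data
            self.used += len(data)
            return True

        def lengthen_mask(self):
            # Split our region in two, keep the half holding more data,
            # and drop the blocks in the other half.
            self.mask_len += 1
            halves = {0: 0, 1: 0}
            for bid, data in self.blocks.items():
                halves[(bid >> (160 - self.mask_len)) & 1] += len(data)
            keep = 0 if halves[0] >= halves[1] else 1
            self.prefix = (self.prefix << 1) | keep
            for bid in [b for b in self.blocks if not self.covers(b)]:
                self.used -= len(self.blocks.pop(bid))
            # ...then re-advertise the new, longer bitmask.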
We also need a way to partition the work of the metatrackers, in the same way that we partition the work of the blockservers. I propose that metatrackers be given a bitmask just as blockservers are, and that metatrackers advertise their bitmasks the same way blockservers do, which is to say by publishing them on the metatrackers. Whereas a blockserver's bitmask advertisement means "I have blocks which fall within this bitmask", a metatracker's bitmask advertisement means "I have a list of blockservers whose bitmasks fall within this bitmask". The bitmasks of metatrackers will be much shorter than the bitmasks of blockservers. Any particular client running both a blockserver and a metatracker will have different bitmasks for the two servers, although the one will probably fall within the other.

At first, all metatrackers will have a bitmask of zero bits, meaning that they all know about all blockservers with all bitmasks. As a metatracker's blockserver table gets too large, it can dynamically lengthen its bitmask and so reduce the size of its table.

But to make this work, the broker-to-metatracker 'list servers' protocol will need an extra component: the 'metatracker referral' response. When a broker is looking for a block and has no cached information whatsoever, it begins by querying any metatracker it knows about for a list of blockservers whose bitmasks include the block, just like it does now. But if the bitmask of the metatracker it queries doesn't include that block, the metatracker won't know about any suitable blockservers. So instead it returns a metatracker referral response: instead of searching its blockserver table, it searches its metatracker table and returns a list of all the metatrackers it knows about whose bitmasks include the block in question. The broker can then begin querying those metatrackers for lists of blockservers. Note that the broker doesn't begin by querying for a list of metatrackers. It just queries for blockservers, and only receives metatracker referrals if the metatracker it queries doesn't have any blockservers that match. A sketch of this lookup appears below.

Now, this implies that each metatracker knows about all the other metatrackers, so that no matter what block a broker is looking for, it can reply with the list of metatrackers whose bitmasks cover that block. Which means that this is our next scalability chokepoint. What happens when the number of metatrackers grows too large for a single metatracker to maintain them all in its table of metatrackers?

We can't partition the work of tracking metatrackers the way we can partition the work of tracking blockservers: each metatracker has to be able to give a referral to another metatracker for any block it might be asked for, which means it needs enough metatrackers in its table to cover the entire blockspace. But we can gain some scalability by implementing a best-effort retention scheme that prefers metatrackers that are closer over those that are farther (where 'closer' and 'farther' refer to how many leading bits the two metatrackers' bitmasks have in common), while retaining at least minimal coverage of the entire blockspace, no matter how far away. In other words, as a metatracker's metatracker table gets too large, it begins to drop entries from it. The fewer bits another metatracker's bitmask has in common with its own, the more likely that entry is to be dropped.
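Here is that lookup as a minimal sketch. Only the 'list servers' query and the 'metatracker referral' response come from the proposal above; the table layouts and response shapes are my assumptions.

    class Metatracker:
        """Toy metatracker covering one region of the blockspace."""

        ID_BITS = 160    # block IDs are SHA-1 hashes

        def __init__(self, prefix, mask_len):
            self.prefix, self.mask_len = prefix, mask_len
            self.blockservers = []   # [(prefix, mask_len, server)]
            self.metatrackers = []   # [(prefix, mask_len, metatracker)]

        @classmethod
        def covers(cls, block_id, prefix, mask_len):
            return mask_len == 0 or \
                block_id >> (cls.ID_BITS - mask_len) == prefix

        def handle_list_servers(self, block_id):
            if self.covers(block_id, self.prefix, self.mask_len):
                servers = [s for (p, m, s) in self.blockservers
                           if self.covers(block_id, p, m)]
                return {'type': 'list servers response', 'servers': servers}
            # Block is outside our bitmask: refer the broker onward.
            referrals = [mt for (p, m, mt) in self.metatrackers
                         if self.covers(block_id, p, m)]
            return {'type': 'metatracker referral', 'metatrackers': referrals}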
However many entries get dropped, a certain number of metatrackers covering each bitmask will always be retained, so that the metatracker will always be able to make at least a certain number of referrals for any block. When evaluating which entries to drop for a particular bitmask, the metatracker can use a handicapping function to preferentially keep those with better reliability.

Okay, all well and good. But what happens if the Nation gets so large that a metatracker can't even keep one metatracker for every bitmask? There's an easy answer for this one, but first I'd like to point out just how large the system has to be before this becomes a limit to scalability.

Let's say that the typical metatracker can maintain adequate performance while storing only a thousand metatracker entries and a thousand blockserver entries. That means the metatrackers can have 10-bit bitmasks and each will still be able to keep at least one entry per bitmask in its table. It also means the blockservers can have 20-bit bitmasks and still expect to be listed in at least one metatracker. That partitions the blockspace into one million bitmasks, each of which will contain as much data as a single average blockserver can hold (although it will be replicated onto as many blockservers as there are within a single bitmask). If the average blockserver devotes a gig of disk space to its blockstore (not unreasonable even now), then that's a thousand terabytes of storage in the system.

But a table with only a thousand entries is a grossly low estimate of the capabilities of a simple, well-written database, even one running on a home computer. It's not off the mark to assume that each metatracker could store a million entries per table without completely falling apart. That raises the numbers to a 20-bit bitmask per metatracker, a 40-bit bitmask per blockserver, and a billion terabytes in the system, minus an order of magnitude or two to account for unavoidable inefficiencies and required redundancies in bitmask allocation. That's pretty big, but we can go even bigger.

If the blockspace gets so large and the bitmasks so long that no metatracker can keep track of even one metatracker per bitmask, then they can simply start dropping more entries from their tables. This will mean there are portions of the blockspace about which a particular metatracker has no information, and for which it therefore can't give a referral when asked for a blockserver holding a block there. But this is okay, as long as metatrackers keep preferentially retaining the entries closest to their own bitmask as they drop. When a metatracker is asked for a blockserver in a blockspace for which it has neither blockserver entries nor metatracker entries, it replies with a referral to the metatrackers it knows about that are as close as possible to the blockspace being asked for. The broker will then go ask them for a list of blockservers. They won't know any blockservers in that blockspace either, but because they are closer, they are more likely to have a referral to a metatracker that does. They issue that referral, and the broker goes and makes another query. This time the metatracker does have a list of blockservers that cover the block in question, and the broker can set about querying them to find the block.
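From the broker's side, the whole search is a simple loop, sketched here against the toy metatracker above. Treating referrals as directly callable objects (rather than network addresses), the hop cap, and the dedup set are all simplifying assumptions.

    def find_blockservers(block_id, known_metatrackers, max_hops=32):
        """Chase 'metatracker referral' responses until some metatracker
        returns actual blockservers for the block."""
        to_ask = list(known_metatrackers)
        asked = set()
        for _ in range(max_hops):
            if not to_ask:
                break                    # dead end: try other metatrackers
            mt = to_ask.pop(0)
            asked.add(id(mt))
            response = mt.handle_list_servers(block_id)
            if response['type'] == 'list servers response':
                if response['servers']:
                    return response['servers']
                continue                 # covers the block but knows no one
            # Referral: queue the closer metatrackers we haven't tried yet.
            to_ask.extend(m for m in response['metatrackers']
                          if id(m) not in asked)
        return []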
This process can work no matter how large the Nation grows, no matter how many blockservers and metatrackers there are, no matter how long the bitmasks grow, no matter how fragmented the blockspace becomes, and no matter how far apart the blockservers and metatrackers get within the blockspace. The limit to scalability becomes the number of referral hops a given broker must traverse to get a list of blockservers that may hold the block she's looking for, and the number of hops required grows logarithmically as the size of the Nation grows. Now we're talking scalable.

In order to ensure that the number of hops only grows logarithmically, metatrackers need to adopt a scheme for retaining metatracker entries similar to the following. A metatracker should keep a constant number of entries for every possible number of leading bits that another metatracker's bitmask can share with its own. In other words, if the constant is one (the smallest possible), the metatracker should know of one metatracker in the blockspace that has no bits in common with it, one in the blockspace with one bit in common, one in the blockspace with two bits in common, and so forth, all the way down to knowing about one other metatracker within its own bitmask. If the metatracker has more room in its metatracker table, it can increase the constant. This keeps the growth of the metatracker tables linear in the depth of the metatrackers' bitmasks, which itself grows logarithmically with the size of the Nation. (A sketch of this scheme follows below.)

What does this do to the number of hops needed to find a list of blockservers containing a particular block? If a broker has no information in her cache and is looking for a block as far away as possible from the one metatracker she knows about, the first query will return a set of metatrackers randomly distributed throughout the half of the blockspace that contains the block she's looking for. Even if we assume that the first metatracker provides a referral to only a single metatracker, and that metatracker is as far from the eventual destination as possible, it will still be one bit closer. That metatracker will be able to return at least one other metatracker that is at least one bit closer still, and so on. Each subsequent hop will, in the worst case, return at least one metatracker that is at least one bit closer, which means the worst-case search has an upper bound that grows linearly with the depth of the blockspace, which itself grows logarithmically with the size of the Nation. But things improve dramatically if each metatracker can afford to keep more than one metatracker entry per bitmask length: the more metatrackers a metatracker provides referrals to, the greater the chance that each hop brings the broker more than one bit closer.

Suppose, though, that the first metatracker returns only a few referrals into the other half of the blockspace, and all the metatrackers it refers to are unavailable? What does the broker do now? She continues making queries to other metatrackers, any metatrackers at all. If each metatracker independently and randomly determines which metatracker entries to drop and which to keep, then by asking more metatrackers for referrals the broker will get different referrals. Eventually she will get a referral to an available metatracker in the other half of the blockspace.
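Before pushing the size further, here is that retention scheme as a sketch. Bucketing by shared-prefix length and the constant alpha per bucket are from the description above; eviction by worst handicap follows the earlier handicapping suggestion, and comparing only the leading min-length bits of two bitmasks is my simplification.

    class MetatrackerTable:
        """Keep alpha other-metatracker entries per shared-prefix length."""

        def __init__(self, own_prefix, own_len, alpha=1):
            self.own_prefix, self.own_len = own_prefix, own_len
            self.alpha = alpha
            # buckets[i] holds entries sharing exactly i leading bits with
            # us; bucket own_len collects everything inside our own region.
            self.buckets = [[] for _ in range(own_len + 1)]

        def shared_bits(self, prefix, mask_len):
            n = min(self.own_len, mask_len)
            a = self.own_prefix >> (self.own_len - n)
            b = prefix >> (mask_len - n)
            for i in range(n):
                if (a >> (n - 1 - i)) & 1 != (b >> (n - 1 - i)) & 1:
                    return i
            return self.own_len

        def add(self, prefix, mask_len, addr, handicap):
            bucket = self.buckets[self.shared_bits(prefix, mask_len)]
            bucket.append((handicap, prefix, mask_len, addr))
            if len(bucket) > self.alpha:
                bucket.sort()        # best (lowest) handicap first
                bucket.pop()         # evict the worst entry

Total table size is alpha * (own_len + 1) entries, linear in the bitmask depth. Since drops should also be somewhat randomized, so that different metatrackers retain different entries, a real table might evict randomly among the worst few rather than deterministically.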
Okay, so what if the Nation has become mind-bogglingly immense, and metatrackers can no longer afford to keep even a single entry per bitmask length? What then? Well, they start keeping less than one entry per bitmask length, or rather, one entry per several bitmask lengths. This reduces the efficiency of the search, because each hop now has a probability of less than one of producing a referral that brings the broker closer by at least one bit. But the efficiency is only reduced by a linear scaling factor, which itself grows only logarithmically as the size of the Nation grows. The dominant scaling factor in search length is still the log of the size of the Nation.

Slightly more formally, the expected search time to find any particular block in Mojo Nation using this scheme is on the order of f(alpha) * log(n), where n is the number of advertised blockserver bitmasks in the system, alpha is the number of metatracker entries per bitmask length each metatracker can store, and f(alpha) is a function that is too complicated for me to work out at the moment, but is the expected number of bits closer to the destination each query will bring the broker for a given alpha. (It is determined by the expected value of the maximum of alpha discrete exponential probability distributions, for alpha greater than or equal to one; it gets even more complicated when alpha is less than one.) Whatever f(alpha) is, I feel confident that it makes a much smaller contribution to the search time than log(n) as n gets arbitrarily large, in part because alpha itself is order 1/log(n).

So search times for blocks will grow as order log(n) with the growth in the number of blockserver bitmasks, which means order log(n) with the growth in the amount of data stored in the system, with no bottlenecks until we reach the limit on the number of blocks the system can store, which is currently fixed by the number of bits in the SHA-1 hash at 2^160. I wouldn't call it infinite scalability, but it's not a bad start.

- Mojo Eddie

--__--__--

Message: 3
From: "Mojo Eddie"
To: mojonation-users@lists.sourceforge.net
Date: Tue, 26 Jun 2001 21:05:11 -0000
Subject: [Mojonation-users] More Bitmasks and Better Blocks
Reply-To: mojonation-users@lists.sourceforge.net

More Bitmasks

Everything that I've discussed so far, including scalability and the economics of disk utilization, has depended on the notion of the bitmask advertisement: an announcement by a blockserver (via the metatrackers) that it is storing a certain segment of the total Mojo Nation blockspace. It's been assumed that each blockserver has a particular bitmask which may grow or shrink over time, but which nonetheless describes all the blocks it has (within the limits of its chosen density). But there's nothing that would prevent a blockserver from having more than one bitmask. In fact, there's nothing preventing that from happening right now, except that multiple bitmasks are not completely implemented in the code (although the Evil Geniuses have been working on it in several places). Even without the code modifications, a server operator could start advertising multiple bitmasks by running multiple instances of the server against the same blockstore (although this would probably screw up file and database locking).
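For concreteness, a multi-bitmask advertisement might look something like the following sketch; the message shape, field names, and prices are assumptions for illustration only.

    # One server advertising several bitmasks in a single announcement.
    def make_advertisement(server_id, bitmasks, prices):
        """bitmasks: list of (prefix, mask_len) pairs, e.g. one short mask
        for the rotating blockstore plus one long, narrow mask per
        self-published file."""
        return {'type': 'blockserver advertisement',
                'server': server_id,
                'bitmasks': [{'prefix': p, 'len': n} for (p, n) in bitmasks],
                'prices': prices}

    ad = make_advertisement(
        'myserver',
        [(0b101, 3),             # main rotating store: 3-bit bitmask
         (0xDEADBEEF11, 40)],    # one self-published file: 40-bit bitmask
        {'block retrieval': 8})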
What benefit would Mojo Nation gain by incorporating the possibility of multiple bitmasks everywhere? I can only think of one, but it's a pretty good one: it would allow people to publish entire files to their own blockservers. People have been doing this already, but they've had to lower their bitmask to zero bits to do it.

There are good reasons for wanting your own files on your own blockserver. For one thing, you get guaranteed availability and retention without having to pay someone else (of course you'll be paying for it with your own disk space, but you might consider yourself cheaper and more reliable). For another, it lets you publish in real time, with zero delay, and then simply let block wholesaling take care of getting the blocks widely distributed.

The drawback, of course, is that you now have lots of bitmasks to advertise in order for anyone to find your files on your server. That's not really a drawback for you, but it has some bad implications for the scalability of the system as a whole. Since search times are order log(n), where n is the number of advertised bitmasks, if every server advertises x bitmasks instead of just one, searches become slightly longer by a constant amount on the order of log(x) (the search time becomes log(x * blockservers) = log(blockservers) + log(x)).

There's an easy way to offset this negative consequence for the system as a whole, and once again it relies on selfish incentives with unintended positive consequences: metatrackers can simply charge blockservers per bitmask that they want to advertise. The driving factor behind the scalability I discussed earlier is the number of blockservers that metatrackers can hold in their tables. That makes space in the metatrackers' blockserver tables a scarce commodity, so they should charge for it. Blockservers that think it's important to store and serve their own files can pay the cost of doing so; as long as the Nation is small and the metatrackers have plenty of available overhead, the cost will be small or zero.

-----

Better Blocks

I think the way that Mojo Nation splits files into pieces, shares, and blocks could be improved in a couple of ways. Every block that makes up a file ends up with a different block ID (because its ID is just its SHA-1 hash), chosen from a uniform distribution across the entire blockspace. That means every block will end up on a different blockserver. This is supposed to increase reliability: only about half of the blocks are needed to recover the file, and if the blocks are widely dispersed, there's less risk that a single server being down will make the file unrecoverable. It's also supposed to help increase download speeds, because you can theoretically download multiple blocks at once from multiple servers and then assemble them.

I don't think it's working as intended. What I've seen happening instead is that there will be a piece of a file which has been distributed to X different servers. Now, only half of those servers need to be available to assemble the piece. But what if only (X/2)-1 are available? The piece can't be reassembled, and the file can't be downloaded in its entirety. That's not all that unlikely to happen, given that Mojo Nation is made up primarily of computers with intermittent availability (as it is designed to be).

There are two factors at work here. The first is that the file is split into pieces before being redundantly encoded.
This makes it possible to download a piece at a time, so that the user can begin doing something with the file while waiting for the rest of it to be found and assembled. Presumably this is good for things like sound files, and maybe even pictures, but it's pretty useless for many other types of data. Splitting into pieces, however, increases the chance that you'll end up with an incomplete download, because rather than needing any half of all the blocks in the file, you need half of the blocks in every piece. This increases the chance of failure drastically. (For example, if each piece independently has a 90% chance of being recoverable at a given moment, a file split into ten pieces has only about a 0.9^10 = 35% chance of being fully downloadable.) Getting part of a file early while you wait for the rest may be good, but getting only part of a file no matter how long you wait is a disaster.

The other factor is that blocks from a single file are distributed to different servers. While that means the loss of a single server won't prevent the file from being available, it also means the presence of a single server isn't enough to ensure the file is available. Consider the case of a file split into eight blocks, any four of which are needed to recover it, spread across eight servers. Two of those servers are highly available and have plenty of bandwidth, three are so-so, and three are just plain never there. Has the availability of the file been improved because it's been split up? Not really. Two of the blocks will show up right away, and, depending on the weather, the other two you need may show up at some point. If it's a bad day, it may be a long wait before you get your four blocks. Now, if either of the two fast servers had not just one but two of the eight blocks, you'd be done in a second, without having to wait for the slowpokes. But this is not likely to happen, because blocks are uniformly distributed across the blockspace, and any single server will only cover a particular segment of that blockspace.

Now that we have block wholesaling, there's no reason to rely on the uniform distribution of block hashes across the blockspace to ensure that a single file is on enough servers to provide redundancy. All blocks will propagate to as many servers as it takes to bring their supply in line with their demand. By preventing blocks from a single file from being stored on the same server, we are instead requiring that more servers be available in order to reliably obtain the file.

I propose two changes to the current method of breaking files into blocks. First, get rid of pieces. Split the entire file up into redundant blocks such that the entire file can be recovered as soon as enough blocks have been found and downloaded. The increase in availability of the entire file will more than offset the loss of partial results in the interim. Second, change the block ID of a block to be the SHA-1 hash of the original file, plus as many bits as it takes to count all the blocks the file has been split into. Each block will still have a unique identifier, but all the blocks of a given file can be made available from one server (because they'll all fall within the same bitmask). There will be no reason for a single server to have more than half of them, of course, and any server will be able to tell when it's got more blocks for a single file than it needs to have (they'll all have the same first 160 bits, and the number of remaining bits will reveal how many blocks make up the entire file). I think these two changes will remarkably improve both download reliability and download speeds.
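A sketch of the proposed block IDs. The scheme itself (file hash plus just enough bits to number the blocks) is as described above; the helper names are mine.

    import hashlib
    from math import ceil, log2

    def make_block_ids(file_bytes, num_blocks):
        """Return block IDs as (160 + k)-bit integers: the SHA-1 hash of
        the whole original file, followed by a k-bit block counter."""
        file_hash = int.from_bytes(hashlib.sha1(file_bytes).digest(), 'big')
        k = max(1, ceil(log2(num_blocks)))  # bits needed to count the blocks
        return [(file_hash << k) | i for i in range(num_blocks)]

    ids = make_block_ids(b'example file contents', 8)
    # All eight IDs share the same 160 leading bits, so every block of the
    # file falls within one bitmask and can live on one server; the 3
    # trailing bits distinguish the blocks, and their width tells a server
    # roughly how many blocks make up the whole file.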
With these changes, when multiple servers holding the needed blocks are available, brokers will use as many as possible. But when there are fewer available, or when some of the available ones are heavily loaded, brokers will be able to keep getting the remaining blocks from the servers that have already provided the first ones.

- Mojo Eddie

--__--__--

Message: 4
From: "Mojo Eddie"
To: mojonation-users@lists.sourceforge.net
Date: Tue, 26 Jun 2001 21:10:18 -0000
Subject: [Mojonation-users] The Publishing Problem, Part Two
Reply-To: mojonation-users@lists.sourceforge.net

I'm finally ready to finish talking about publishing, which is where I started this series of notes. A good bit of the disincentive for publishing can be removed by improving the user interface, as I discussed earlier. But there's still the Mojo cost of publishing, and the two-fold nature of publishing itself: some data will have publishers who want to publish it, while other data will have plenty of people who want to obtain it but nobody with an incentive to publish it.

The latter case no longer poses a problem, as long as publishing doesn't cost Mojo and the user interface is sufficiently convenient. Once the data gets added to a single server and made available through a single content tracker, if there is any demand for it, blockservers will begin to wholesale it in hopes of generating profitable contention for bandwidth. By eliminating the publishing cost in both Mojo and inconvenience, altruism and/or automation should be enough to provide the content users want to see.

But what about publishers who want to make content available that nobody demands, or at least that is in less demand than the publisher thinks it deserves (or might have in the future)? Servers have no incentive to store content that there is no download demand for, so publishers of such content will have to provide their own incentive: payment for storage.

The problem is that there is no way for a server to know beforehand whether a publisher's content is going to be in demand. The best it can do is take all offered new content on spec (as long as it has disk space available) and simply drop it if it doesn't turn a profit within a certain timeframe. The one who can best estimate the likely demand (and thus profitability) of content is its publisher. If he's sure it will be a hit, he can publish it to a server and know that the demand will keep it available. But if he thinks it's not going to be in enough demand to achieve the propagation he wants, he can offer Mojo to servers in exchange for guaranteed retention periods.

There's no good way to actually guarantee retention, but publishers can check up on servers and lower their reputations if they aren't living up to their commitments. They can also start with short periods and continually extend them as long as the servers actually provide the retention they paid for. In order for servers to guarantee that blocks are retained for certain periods, they'll have to set aside parts of their blockstore that are exempt from the normal process of dropping blocks based on profitability. In fact, one could even consider block storage that has been paid for by publishers as profitable in and of itself, regardless of whether it ever generates any Mojo from downloads.
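Here is a minimal sketch of a blockstore that honors paid retention alongside spec storage. The two-pool layout, the expiry handling, and the profitability bookkeeping are my assumptions.

    import time

    class Blockstore:
        """Two pools: spec blocks droppable by profitability, and paid
        blocks exempt from dropping until their retention period expires."""

        def __init__(self):
            self.spec = {}    # block_id -> (data, mojo_earned)
            self.paid = {}    # block_id -> (data, expires_at)

        def accept_on_spec(self, block_id, data):
            self.spec[block_id] = (data, 0)    # free, but no promises

        def accept_paid(self, block_id, data, days, offer, price_per_byte_day):
            cost = len(data) * days * price_per_byte_day
            if offer < cost:
                return False                   # offer refused
            self.paid[block_id] = (data, time.time() + days * 86400)
            return True

        def make_room(self, bytes_needed):
            # Expired paid blocks fall back into the spec pool first.
            now = time.time()
            for bid in [b for b, (_, exp) in self.paid.items() if exp < now]:
                data, _ = self.paid.pop(bid)
                self.spec[bid] = (data, 0)
            # Then drop the least profitable spec blocks; paid blocks with
            # unexpired retention periods are never touched.
            for bid, (data, _) in sorted(self.spec.items(),
                                         key=lambda kv: kv[1][1]):
                if bytes_needed <= 0:
                    break
                del self.spec[bid]
                bytes_needed -= len(data)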
As for pricing, servers can establish their going rate for guaranteed block retention, per day per byte, by calculating the current profitability of their normal rotating blockstore per day per byte, and then advertise that as their publishing price (not counting blocks published on spec, which they will always be willing to accept at zero cost, but with no promise of retention unless the blocks generate profit).

This should produce a system that works in everyone's best interest: publishers, consumers, servers, and those making content available just because they have it.

- Mojo Eddie

--__--__--

_______________________________________________
Mojonation-users mailing list
Mojonation-users@lists.sourceforge.net
http://lists.sourceforge.net/lists/listinfo/mojonation-users

End of Mojonation-users Digest