CRYPTOGRAPHY BASICS AND THE RSA ALGORITHM (edited from the PGP version 2.6.2 User's Manual) There are basically three types of cryptographic systems. 1) No key systems. There is a single method for scrambling messages with a corresponding method for descrambling them. The method has to be kept a tight secret. This is an all or nothing system. Everyone who uses the system has to be trusted. Once the method is leaked, all messages encrypted with this system are exposed. Obviously such a method is unsuitable for widespread use. The one time pad (OTP) can be considered as a modern example of such a method. OTP is provably the most secure cryptographic system. Its drawback is that is difficult to securely set up in advance, and it can only be used once. 2) Single key systems (aka "conventional" or "secret" key systems). The scrambling method requires a key. The descrambling method also requires this *same* key. The method can be made public, only the keys have to be kept secret. This is much easier logistically than no key systems. However exchanging keys in a secure fashion is still a problem. The Data Encryption Standard (DES) is the most widespread example of this kind of system. 3) Two key systems (aka public key systems). The scrambling method requires an encryption key (the public key). The descrambling method requires an entirely different decryption key (the secret key). This simplifies the logistics of using such a system considerably: the public keys can be broadcast all over the world, only the secret keys have to be kept secret. The Diffie-Hellman algorithm is an example of this type of system, used primarily to distribute secret keys for a secret key system. The RSA algorithm is currently the most widely used example of public key system. The RSA algorithm (named after its discoverers Rivest, Shamir, Adleman) is based upon the following result of the famous 18th century Swiss mathematician Leonhard Euler: EULER's THEOREM: Let n be the product of two primes p and q. Let e and d be two numbers whose product ed leaves a remainder of 1 when divided by (p-1)(q-1). Let m be any positive number smaller than both p and q. Then the power m^(ed) = (m^e)^d, when divided by n leaves a remainder of m . (The above is actually a special case of Euler's theorem, but sufficient to understand the RSA algorithm.) Example 1. Let p = 3, q = 5. Then n = 15. Let e = 11 and d = 3. Then ed = 33, which when divided by (p-1)(q-1) = 2*4 = 8 gives a quotient of 4 and a remainder of 1. Now take m = 2. Then 2^33 = 8589934592. When divided by 15 this gives a quotient of 572662306 and a remainder of 2, as predicted by Euler's theorem. To apply Euler's theorem to cryptography we need another fact from elementary algebra: when computing the remainder of an expression involving powers, products, sums and differences, one can compute the remainders at each intermediate stage. For example to compute the remainder of 14*32*12*17 when divided by 15, instead of multiplying out all four numbers to obtain 91392, then dividing by 15 to obtain the remainder of 12, one can proceed as follows: 14*32 = 448 which leaves a remainder of 13 when divided by 15 12*17 = 204 which leaves a remainder of 9 when divided by 15 Now multiply together the remainders obtained in the preceding steps: 13*9 = 102 and divide by 15. We obtain the same remainder of 12. Noting that m^(ed) = (m^e)^d, the remainder of m^(ed) by n in Euler's theorem can be computed first taking the remainder of the expression m^e , then taking this remainder and raising it to the d-th power, then taking the remainder again. In Example 1 we would compute as follows: 2^33 = (2^11)^3. Now 2^11 = 2048, which leaves a quotient of 136 and a remainder of 8 when divided by 15, and 8^3 = 512, which leaves a quotient of 34 and a remainder of 2. In 1978 Rivest, Shamir, and Adleman proposed the following public key system based on Euler's theorem: Suppose Bob wants Alice to send him a secret message. Bob begins by randomly choosing two large primes p and q. He then computes two numbers e and d such that their product ed leaves a remainder of 1 when divided by (p-1)*(q-1). (This is easy to do using an algorithm published by Euclid around 300 BC.) Bob then sends Alice the following information: n = pq and e. He keeps all the other information (p, q, and d) secret. When Alice sends Bob her message, she first represents her message as a sequence of numbers m1, m2, m3, ... each of which is smaller than both p and q. (Alice doesn't need to know p and q to insure this - all she needs is a rough idea how large p and q are, eg. how many digits long.) Then she computes the remainders of (m1)^e, (m2)^e, (m3)^e, ... upon division by n = pq. Alice then sends Bob the remainders r1, r2, r3, ... she obtained. Bob now computes the remainders of (r1)^d, (r2)^d, (r3)^d, .... By Euler's theorem he obtains Alice's original message m1, m2, m3, ... Now suppose Darth was eavesdropping on the entire exchange between Bob and Alice. Then Darth will have intercepted the following information: n = pq, e, as well as the sequence r1, r2, r3, .... This does him no good, since he is missing one critical piece of information: the decryption exponent d. It can be shown that the problem of determining d given n = pq and e, is equivalent to the problem of factoring n into its two individual prime factors p and q. If p and q exceed 200 digits in length, no computer on earth will be able to do the factorization, using present day factoring algorithms. The pair of numbers (n, e) is frequently referred to as the RSA public key, while the pair (n,d) is referred to as the RSA secret key. (The choice of which of e and d to make public and which to keep secret is theoretically arbitrary. Usually the smaller of the two is the one made public and larger one is kept secret, since it will be harder to guess.) Example 2: Suppose Bob chooses p = 29, q = 31. Then n = 899. Then Bob computes that e = 47 and d = 143 have product ed = 6721, which when divided by (p-1)(q-1) = 840 gives a quotient of 8 and a remainder of 1. Bob sends Alice n = 899 as well as e = 47. Now suppose Alice wants to send the message "HI BOB" (without quotes). She represents her message numerically using the following representation: A->2, B->3, C-> 4, ... , Z->27, [space]->28. (Note that all these numbers are less than both p and q. Also note that we avoid representing anything by the number 1, since raising 1 to any power always gives 1.) So the message "HI BOB" is represented by the sequence 9, 10, 28, 3, 16, 3. Alice then does the following computations: 9^47 = 706965049015104706497203195837614914543357369 leaves a remainder of 701 upon division by 899 10^47 = 100000000000000000000000000000000000000000000000 leaves a remainder of 224 upon division by 899 28^47 = 103855015069309411279457362689242221673008953427829542789964325978112 leaves a remainder of 753 upon division by 899 3^47 = 26588814358957503287787 leaves a remainder of 859 upon division by 899 16^47 = 392318858461667547739736838950479151006397215279002157056 leaves a remainder of 690 upon division by 899 3^47 = 26588814358957503287787 leaves a remainder of 859 upon division by 899. Alice now sends Bob the sequence of remainders 701, 224, 753, 859, 690, 859. He decodes the message by raising this sequence to the decryption exponent d = 143, then taking remainders upon division by 899. Noting that 143 = 11*13, we use the shortcut mentioned above of computing the remainder of m^143 = (m^11)^13 by first computing the remainder of m^11, raising the resulting remainder to the 13th power, then taking the remainder again: 701^11 = 20086219191438000565629621957701 gives remainder 506 upon division by 899; 506^13 = 142546552694430788571794496737878016 which gives remainder 9 upon division by 899 (all remaining remainders are upon division by 899 as well) 224^11 = 71240703863716132079796224 gives remainder 113; 113^13 = 489801110321660601428677553 gives remainder 10 753^11 = 44131098528347221402668102018897 gives remainder 231; 231^13 = 5332805430876196086067435141191 gives remainder 28 859^11 = 187899144227561729924334314971459 gives remainder 48; 48^13 = 7180192468708211294208 gives remainder 3 690^11 = 16878739018517842626900000000000 gives remainder 194; 194^13 = 551343792263638925546446659584 gives remainder 16 859^11 = 187899144227561729924334314971459 gives remainder 48; 48^13 = 7180192468708211294208 gives remainder 3 Bob thus obtains the sequence 9, 10, 28, 3, 16, 3 representing Alice's message "HI BOB". Darth knows that n = 899 and e = 47 and also has listened in the message sequence 701, 224, 753, 859, 690, 859. But without knowing that d = 143, he cannot decode the message. The only way he could discover d is by factoring n to its factors p and q (29 and 31). Then he could compute (p-1)(q-1) = 840, and since ed = k*840 + 1, Darth would discover after a few tries that k=8, and thus d = 143. Although factoring 899 is trivial, for large primes p and q this is a very difficult problem. The above example is of course a toy cryptosystem. The primes 29 and 31 are much too small, so 899 can be easily factored. Also the smallness of the primes causes each letter of the message to be encrypted individually, allowing an easy attack by letter frequency analysis. If large primes were used, then one would represent large blocks of letters by a single number instead, and there would be no repetitive patterns in the encrypted message. For example if p = 733 and q = 829, then we could represent pairs of adjacent letters in a message by numbers smaller than p and q according to the following scheme: AA->2, AB-> 3, ..., AZ->27, A[space]->28, BA->29, ..., [space]Z->729, [space][space]->730. (Messages of odd length would be padded by a space.) In real world implementations of RSA, blocks of 16 consecutive characters are represented by a number before RSA keys are applied. RSA AND MESSAGE AUTHENTICATION The RSA algorithm has the nice feature that it can be used for message verification as well as encryption. Example 3: Suppose Alice is Bob's banker and she receives the following message by email from Bob: "GIVE TOM TWO GRAND". Being a conscientious banker, Alice might be worried if the message really came from Bob, or if it possibly might have been written by Tom with the return address forged to look like it came from Bob. How can Bob convince Alice that it is really he who wants to give the money to Tom? The answer is simple: Bob can send her the message encrypted with his secret key! Using the same setup as in Example 2, "GIVE TOM TWO GRAND" converts to the numerical sequence 8, 10, 23, 6, 28, 21, 16, 14, 28, 8, 19, 2, 15, 5. Bob raises each number in the sequence to his secret decryption exponent 143, then takes the remainder upon division by 899. He obtains the sequence 512, 14, 480, 274, 144, 358, 500, 18, 144, 512, 102, 8, 678, 676, which he sends on to Alice. Alice then raises this sequence to Bob's public encryption exponent 47 and takes remainders upon division by 899. She obtains the sequence 8, 10, 23, 6, 28, 21, 16, 14, 28, 8, 19, 2, 15, 5 which corresponds to "GIVE TOM TWO GRAND". Now Alice can be sure that the message came from Bob and no one else. Only Bob's secret key (899, 143) could have produced a message which his public key (899, 47) deciphers into anything intelligible. (Note that this use of RSA is not really encryption. Darth can also recover Bob's message from the sequence 512, 14, 480, 274, 144, 358, 500, 18, 144, 512, 102, 8, 678, 676. How can you ensure authentication and at the same time prevent Darth from reading the message?) You may have noticed that the computations involved in implementing the RSA algorithm are rather large, even when the primes chosen are unrealistically small. If you choose primes sufficiently large to be secure, the resulting system is too slow to be practical for encrypting all but the smallest messages. Medium to large sized messages could take hours to days to encrypt/decrypt on personal computers typically available today. To get around this limitation, public key systems use RSA indirectly: the message is encrypted using a a much faster conventional single key system (eg. DES), then the conventional key for the message is encrypted with RSA. (In systems like PGP the conventional message key is randomly and automatically generated out of the user's sight.) This has some other advantages. A message can be encrypted to several different recipients: the same conventional key is used, with one RSA encrypted header block added for each recipient. For the same reason RSA in authentication mode is not applied to the whole message as in Example 3. Rather a complicated, difficult to forge checksum (or "fingerprint") is computed for the message. The RSA secret key is applied only to this checksum. The recipient of the message computes the checksum for herself, deciphers the attached RSA enciphered checksum and compares the two.