- From: Bennet Yee <bsy@cs.ucsd.edu>
- Date: Mon, 01 Jul 1996 16:36:14 -0700
- To: "Paul C. Kocher" <pck@best.com>
- Cc: ietf-tls@w3.org, pck@cryptography.com
Hi Paul, I wanted to clarify a technical point for the list. Paul's reply to my point: > bsy's point: > > The second is slightly subtler. For oft retransmitted data, > > re-encrypting them under different keys, especially weak, 40-bit keys, > > for many transmissions provides attackers with more partial > > information about the plaintext. If there is only one encrypted > > version of the data that is always re-sent, less partial information > > about the data is leaked. > > TLS/SSL and other cryptosystems using 40-bit keys always include > a nonce or salt with the key to prevent birthday-paradox style > attacks against messages encrypted with more than one key. > > While having multiple copies of the plaintext encrypted under > different keys could theoretically help for some kinds of > cryptanalytic attack, this sort of data could be collected > using a chosen plaintext attack as well (which the protocol > must be able to resist). The point that I was trying to make is a generic one that applies to all ciphers, regardless of the use of IVs. It's also why I made a reference in the next paragraph: > > ... (The German nursery rhyme weakness. > > [Enigma cryptanalysis history.]) The idea that I'm trying to get across is related to the idea of unicity distance from classical cryptography. Unicity distance for a cipher relative to a data source is defined as the number of (symbols/bytes) of ciphertext that an attacker must intercept in order to determine the key (and plaintext) in an information-theoretic sense. It's called "unicity distance" because it's the point passed which where there would be a single interpretation of the messages -- there'd be only one possible value for the key -- whereas before that multiple keys are possible. You can find a definition of unicity distance in most cryptography texts. Some examples: For known-plaintext scenarios, the amount of ciphertext for DES is basically one block or eight bytes. You know the input and output to DES, and that (should) information-theoretically completely determine the key, which gives you any other plaintext encrypted with that key. I mumbled "should", because we don't really know for sure that there exists only one key which gives the particular input-block-to-output-block mapping. For known-plaintext scenarios with a stream cipher, it's basically the number of output symbols which information-theoretically allows you to completely determine the internal state of the stream cipher (call this number N). With me so far? Okay. I said earlier that the idea of unicity distance is relative to a source. What this means is that the unicity distance depends on what you know about the source -- the generator for the plaintext. In the known-plaintext case, we had complete knowledge of the source, and we had a relatively easy time figuring out (roughly) what the unicity distance is. What happens if the source is completely random? If the source is truly random, the unicity distance is (typically) infinite. This is obvious, since given the ciphertext any key could have been used to encrypt plaintext that produces the ciphertext. I mumbled "typically", since you -could- have a cipher that isn't globally one-to-one -- that is, the family of encryption functions (selected by the key) is one-to-one, but the range set are not identical for all of these functions, so the family of encryption functions is viewed as having a range that's the union of the ranges of the individual encryption functions, and range coverage might tell you the key after an enormous number of messages. (This is a weird case that people don't typically worry about, but I want to be careful w/ my informal definitions here.) Now, if the source is English text or an HTML document, the unicity distance is (obviously) something between 8 bytes and an infinite number of bytes for DES (according to Schneier, for English it's 8.2 bytes; according to Meyer and Matyas, it's 14.6 bytes [probably different models of "English"]), and between N and infinity for a stream cipher. In practice, we don't worry -too- much about unicity distances, since it's an information theoretic measure, and we care more about the amount of resources required to break a cipher (time & space) -- this is the more modern cryptography approach, esp with scalable (public key) ciphers. Note, however, that the idea of unicity distance makes intuitive sense -- the more you use your cipher system with a fixed key, the more partial information you're going to leak. And eventually, a powerful adversary may be able to make use of that partial information to figure out your key. (Compare this with cryptanalytic attacks such as Differential Cryptanalysis, where even though you know the plaintext, all you care about is the xor sum of pairs of them -- which is critical knowledge about the source that's used.) Now, back to the point. I argued that if we had a fixed message, retransmitting it repeatedly using different keys (and IVs -- this idea is indepent of IVs and birthday attacks), we leak partial information about the data every time we retransmit it. This is in some ways the dual of the idea of unicity distance -- and since I haven't seen it defined elsewhere I'll call it "multiplicity distance": given a secret message (vs secret key in unicity distance) of some fixed length, the multiplicity distance of a cipher is the number of re-encryptions of that message under different keys/IVs that an adversary requires to information theoretically determine the text of the message. We don't bother with a model of the source of keying material, since keys are supposed to be truly random (though in -practice-....) This model can, of course, be enriched to have the multiplicity distance be a function of the entropy of the random data source as well (e.g., truly random, random English-/HTML-generators etc). So. What's the multiplicity distance (with an HTML source) for RC4? For DES? I don't know if anybody in the public sector knows. I don't. If we can avoid re-encrypting data, however, we should. -This- is my argument why pre-encryption helps security. There. Oh, yeah. The German nursery rhyme reference. That was a reference to the breaking of the Enigma system used in WWII -- the operators would often send each other nursery rhymes before the actual message to make sure that the two machines were synchronized w/ the appropriate (daily or whatever) key. This was repeated plaintext chosen randomly from the space of popular German nursery rhymes, and aided the Allies in the cryptanalysis of the actual messages. We think/hope that RC4 is stronger than Enigma, of course, but.... -bsy p.s. Maybe it's the lengthy tomes that I tend to write, but the mailing list software does not seem to be forwarding my messages to the list (conspiracy theories, anyone?:). In any case, until I hear that the problem's been fixed, I guess I'll periodically resubmit this tract until I get a copy myself. -------- Bennet S. Yee Phone: +1 619 534 4614 Email: bsy@cs.ucsd.edu Web: http://www-cse.ucsd.edu/users/bsy/ USPS: Dept of Comp Sci and Eng, 0114, UC San Diego, La Jolla, CA 92093-0114
Received on Monday, 1 July 1996 19:36:23 UTC