Re: IETF mtg discussion comments from Bennet Yee on 1996-07-01 (ietf-tls@w3.org from July to September 1996)

From: Bennet Yee <bsy@cs.ucsd.edu>
Date: Mon, 01 Jul 1996 16:36:14 -0700
To: "Paul C. Kocher" <pck@best.com>
Cc: ietf-tls@w3.org, pck@cryptography.com
Message-Id: <199607012336.QAA03987@work.ucsd.edu>
Hi Paul,

I wanted to clarify a technical point for the list.

Paul's reply to my point:
> bsy's point:
> > The second is slightly subtler.  For oft retransmitted data,
> > re-encrypting them under different keys, especially weak, 40-bit keys,
> > for many transmissions provides attackers with more partial
> > information about the plaintext.  If there is only one encrypted
> > version of the data that is always re-sent, less partial information
> > about the data is leaked.
> 
> TLS/SSL and other cryptosystems using 40-bit keys always include
> a nonce or salt with the key to prevent birthday-paradox style 
> attacks against messages encrypted with more than one key.
> 
> While having multiple copies of the plaintext encrypted under
> different keys could theoretically help for some kinds of 
> cryptanalytic attack, this sort of data could be collected
> using a chosen plaintext attack as well (which the protocol
> must be able to resist).

The point that I was trying to make is a generic one that applies to
all ciphers, regardless of the use of IVs.  It's also why I made a
reference in the next paragraph:

> > ... (The German nursery rhyme weakness.
> > [Enigma cryptanalysis history.])

The idea that I'm trying to get across is related to the idea of
unicity distance from classical cryptography.  Unicity distance for a
cipher relative to a data source is defined as the number of
(symbols/bytes) of ciphertext that an attacker must intercept in order
to determine the key (and plaintext) in an information-theoretic
sense.  It's called "unicity distance" because it's the point passed
which where there would be a single interpretation of the messages --
there'd be only one possible value for the key -- whereas before that
multiple keys are possible.  You can find a definition of unicity
distance in most cryptography texts.  Some examples:

For known-plaintext scenarios, the amount of ciphertext for DES is
basically one block or eight bytes.  You know the input and output to
DES, and that (should) information-theoretically completely determine
the key, which gives you any other plaintext encrypted with that key.
I mumbled "should", because we don't really know for sure that there
exists only one key which gives the particular
input-block-to-output-block mapping.

For known-plaintext scenarios with a stream cipher, it's basically the
number of output symbols which information-theoretically allows you to
completely determine the internal state of the stream cipher (call
this number N).

With me so far?

Okay.  I said earlier that the idea of unicity distance is relative to
a source.  What this means is that the unicity distance depends on
what you know about the source -- the generator for the plaintext.  In
the known-plaintext case, we had complete knowledge of the source, and
we had a relatively easy time figuring out (roughly) what the unicity
distance is.  What happens if the source is completely random?

If the source is truly random, the unicity distance is (typically)
infinite.  This is obvious, since given the ciphertext any key could
have been used to encrypt plaintext that produces the ciphertext.  I
mumbled "typically", since you -could- have a cipher that isn't
globally one-to-one -- that is, the family of encryption functions
(selected by the key) is one-to-one, but the range set are not
identical for all of these functions, so the family of encryption
functions is viewed as having a range that's the union of the ranges
of the individual encryption functions, and range coverage might tell
you the key after an enormous number of messages.  (This is a weird
case that people don't typically worry about, but I want to be careful
w/ my informal definitions here.)

Now, if the source is English text or an HTML document, the unicity
distance is (obviously) something between 8 bytes and an infinite
number of bytes for DES (according to Schneier, for English it's 8.2
bytes; according to Meyer and Matyas, it's 14.6 bytes [probably
different models of "English"]), and between N and infinity for a
stream cipher.

In practice, we don't worry -too- much about unicity distances, since
it's an information theoretic measure, and we care more about the
amount of resources required to break a cipher (time & space) -- this
is the more modern cryptography approach, esp with scalable (public
key) ciphers.

Note, however, that the idea of unicity distance makes intuitive sense
-- the more you use your cipher system with a fixed key, the more
partial information you're going to leak.  And eventually, a powerful
adversary may be able to make use of that partial information to
figure out your key.  (Compare this with cryptanalytic attacks such as
Differential Cryptanalysis, where even though you know the plaintext,
all you care about is the xor sum of pairs of them -- which is
critical knowledge about the source that's used.)

Now, back to the point.  I argued that if we had a fixed message,
retransmitting it repeatedly using different keys (and IVs -- this
idea is indepent of IVs and birthday attacks), we leak partial
information about the data every time we retransmit it.  This is in
some ways the dual of the idea of unicity distance -- and since I
haven't seen it defined elsewhere I'll call it "multiplicity
distance": given a secret message (vs secret key in unicity distance)
of some fixed length, the multiplicity distance of a cipher is the
number of re-encryptions of that message under different keys/IVs that
an adversary requires to information theoretically determine the text
of the message.  We don't bother with a model of the source of keying
material, since keys are supposed to be truly random (though in
-practice-....)  This model can, of course, be enriched to have the
multiplicity distance be a function of the entropy of the random data
source as well (e.g., truly random, random English-/HTML-generators
etc).

So.  What's the multiplicity distance (with an HTML source) for RC4?
For DES?  I don't know if anybody in the public sector knows.  I
don't.  If we can avoid re-encrypting data, however, we should.

-This- is my argument why pre-encryption helps security.

There.

Oh, yeah.  The German nursery rhyme reference.  That was a reference
to the breaking of the Enigma system used in WWII -- the operators
would often send each other nursery rhymes before the actual message
to make sure that the two machines were synchronized w/ the
appropriate (daily or whatever) key.  This was repeated plaintext
chosen randomly from the space of popular German nursery rhymes, and
aided the Allies in the cryptanalysis of the actual messages.  We
think/hope that RC4 is stronger than Enigma, of course, but....

-bsy

p.s.  Maybe it's the lengthy tomes that I tend to write, but the
mailing list software does not seem to be forwarding my messages to
the list (conspiracy theories, anyone?:).  In any case, until I hear
that the problem's been fixed, I guess I'll periodically resubmit this
tract until I get a copy myself.

--------
Bennet S. Yee		Phone: +1 619 534 4614	    Email: bsy@cs.ucsd.edu

Web:	http://www-cse.ucsd.edu/users/bsy/
USPS:	Dept of Comp Sci and Eng, 0114, UC San Diego, La Jolla, CA 92093-0114
Received on Monday, 1 July 1996 19:36:23 UTC