Re: Encryption code-set issues (Was Re: WOOPS: xmlenc Call 13:00 EST 20020107)

I accidentally sent the original mail to w3c-ietf-xmldsig@w3.org, but the 
reply now goes to xml-encryption@w3.org.

--On Mittwoch, 9. Januar 2002 18:41 -0500 Tom Gindin <tgindin@us.ibm.com> 
wrote:

>       A similar known-plaintext problem, although not as severe, occurs
> with most alternate alphabets.  Briefly, a typical non-Latin alphabet
> (such as Cyrillic, Greek, Arabic, or Hebrew) will have 7 bits or less of
> entropy for every 2 bytes encrypted, so each block in an 8-byte block
> algorithm will have 28 bits of entropy or less out of 64 bits of
> plaintext.  Of course, for AES that will be 56 out of 128.  In these
> cases, which have more practical importance than old italic, UCS-2 is no
> better than UTF-8 and UCS-4 is much worse.  In another case of
> considerable practical importance, UTF-8 does weaken CJK (Chinese,
> Japanese and Korean characters) compared to UCS-2 (typical characters
> have about 14 bits of entropy but take up 3 bytes instead of 2).
>       AFAIK, one major purpose of CBC is to make it almost irrelevant
> whether random data is immediately in front of a given piece of plaintext
> or far in front of it, from the standpoint of cryptanalysis.   I am not an
> authority on crypto, but I do have some comments below, marked by [TG]
>
>             Tom Gindin
>
>
> Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>@w3.org
> on 01/09/2002 10:01:52 AM
>
> To:    reagle@w3.org
> cc:    XML Signature WG <w3c-ietf-xmldsig@w3.org>
>
>> BTW: What do you think of Martin's additional concern about nonces not
>> mitigating redundancy later in the data (like Unicode italic
>> characters). I thought that with most modern ciphers, if the
>> beginning is unpredictable, that's sufficient for most of the message?
>> (If you have any thoughts, please respond to the list.)
>
> I'm not a UTF/I18n expert so I did not understand the "Old italic" stuff,
> but I agree with Martin that usage of the Nonce does not increase entropy.
> It would do so if the whole message were transformed in a single
> transform.
>
> Imagine we would use a construct like a cryptographic Hash function or an
> RSA encryption with a modulus size of 1000000 bit (just for illustration).
> If I hash or encrypt XML data with a low entropy using such a transform, a
> nonce would help because the nonce would influence the complete
> transformation step. That sort of "nonce" is used if we hash a unix
> password in /etc/shadow (the 'salt' value is the nonce) or if we encrypt a
> message using RSA-OAEP (the random parameters are the nonce). In these
> transforms, each input bit has an effect on the output bits, and then the
> nonce does its job.
>
> But - encrypting a message in CBC is _NOT_ a single transformation: It's
> splitting the message into small chunks (the blocks) and transforming each
> block separately (OK, not absolutely separate because we use the chaining
> to prevent reordering of blocks and that stuff). So a CBC is not a single
> transform but many small transforms.
>
> [TG] True.  I thought one major purpose of chaining was to prevent having
> the same plaintext block yield the same ciphertext each time.

Yes, right. Same plaintext blocks at different positions in the stream map 
to different ciphertexts, reordering and replay of ciphertext blocks is 
prevented, etc. For a general discussion about Block Ciphers and Modes Of 
Operation see Menezes/Oorschot/Vanstone [1], especially chapter 7 [2].

[1] http://www.cacr.math.uwaterloo.ca/hac/
[2] http://www.cacr.math.uwaterloo.ca/hac/about/chap7.pdf

> So if we use a Nonce whose length is (blocklength - 1), we have the
> situation that the block containing the first plaintext octet (and the
> preceding blocklength-1 nonce octets) gets a higher entropy, but ONLY this
> block gets a higher entropy. All following blocks do not take advantage of
> this nonce. Additionally, the first plaintext octet is vulnerable to
> malicious modification.
>
> [TG] I don't think this follows.  In particular, if you modify plaintext
> and you're using both CBC and PKCS-5 padding, you'll probably break the
> padding (probability slightly less than 254/255) and cause decryption to
> fail.  This is true for any block encryption using both CBC and PKCS-5
> padding.  Also, for any block encryption using CBC you'll modify ALL
> subsequent plaintext.

Sorry, I'm not sure whether I understand you correctly. The attacker 
modifies the ciphertext and hopes that this modification will result in a 
'good' modification of the decrypted plaintext. He doesn't modify the 
plaintext prior to encryption. Maybe I was a little bit unclear in my 
explanations. What do you mean by "for any block encryption using CBC 
you'll modify ALL subsequent plaintext"? CBC 'recovers' from errors in 
ciphertext blocks. In CBC mode, we don't have infinite error 
propagation, but only one which changes two plaintext blocks in the 
following manner:

Given plaintext blocks P_0, P_1, P_2, ..., P_n. When these blocks are 
encrypted, the resulting ciphertext blocks are C_0, C_1, ..., C_n. 
If P_n is already a complete block, the PKCS-5 padding adds one more 
block, which yields a C_{n+1}. What happens if an attacker changes a single 
bit in ciphertext block C_m?

After decryption, each plaintext bit of P_m changes with a probability of 
50% (simply put, P_m is trashed), and the modified bit position in C_m 
flips the corresponding bit in P_{m+1}. All subsequent plaintext blocks 
P_{m+2}, P_{m+3}, ... decrypt without any errors. So in CBC, not ALL 
plaintext after a modification decrypts with errors: only the block at the 
modified position is trashed completely (about 50% bit errors), and the 
following plaintext block has exactly one, precisely determined bit error.
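
To make the error propagation concrete, here is a minimal sketch in Python. It assumes a toy 8-byte Feistel cipher built from SHA-256 purely for illustration (it is not DES, 3DES or AES, and not something to use in practice); all function names are made up for this example. Only the CBC chaining itself is the point.

```python
import hashlib

BLOCK = 8  # toy 64-bit block size, as for the 8-byte block ciphers discussed


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the illustrative Feistel network (an assumption).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]


def encrypt_block(key: bytes, block: bytes) -> bytes:
    l, r = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        l, r = r, _xor(l, _round(key, i, r))
    return l + r


def decrypt_block(key: bytes, block: bytes) -> bytes:
    l, r = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        l, r = _xor(r, _round(key, i, l)), l
    return l + r


def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    out, prev = [], iv
    for p in blocks:
        prev = encrypt_block(key, _xor(p, prev))  # C_i = E(P_i xor C_{i-1})
        out.append(prev)
    return out


def cbc_decrypt(key: bytes, iv: bytes, blocks: list) -> list:
    out, prev = [], iv
    for c in blocks:
        out.append(_xor(decrypt_block(key, c), prev))  # P_i = D(C_i) xor C_{i-1}
        prev = c
    return out


key = b"demo key"
iv = bytes(BLOCK)
plain = [bytes([i]) * BLOCK for i in range(5)]  # P_0 .. P_4
cipher = cbc_encrypt(key, iv, plain)

m = 1
flip = b"\x01" + bytes(BLOCK - 1)               # one bit in the first octet
tampered = list(cipher)
tampered[m] = _xor(cipher[m], flip)             # attacker changes one bit of C_m

result = cbc_decrypt(key, iv, tampered)
diff = [_xor(a, b) for a, b in zip(plain, result)]  # XOR of original and result

for i, d in enumerate(diff):
    print("P_%d: %d bit(s) changed" % (i, sum(bin(x).count("1") for x in d)))
```

In this sketch P_m comes out garbled, P_{m+1} differs from the original in exactly the flipped bit position, and every other block decrypts unchanged.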

So to come back to my favorite topic, encryption of the IV results in these 
properties:

A manipulated bit in _ANY_ ciphertext block (regardless of whether it's in 
the 1st block with the encrypted IV, in a 'real' ciphertext block containing 
encrypted plaintext, or in the last block containing the padding) results in 
50% changed bits in the corresponding plaintext block - at least one block 
is completely trashed!!! If you tamper with the last ciphertext block (which 
includes the padding), unpadding will (probably) fail.
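
As a rough check of the "(probably) fail" part, the following Python sketch pads and strictly unpads with PKCS-5 and computes how likely a trashed last block is to be rejected. It assumes an 8-byte block, that the tampered last block decrypts to uniformly random octets, and that the receiver validates every padding octet; the function names are hypothetical.

```python
from fractions import Fraction

BLOCK = 8  # block size in octets (8 for the 64-bit ciphers discussed here)


def pkcs5_pad(data: bytes) -> bytes:
    # n is always 1..BLOCK, so a complete final block gets a whole extra block.
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def pkcs5_unpad(data: bytes) -> bytes:
    # Strict check: length, pad count and every pad octet must be consistent.
    n = data[-1] if data else 0
    if not 1 <= n <= BLOCK or len(data) % BLOCK or data[-n:] != bytes([n]) * n:
        raise ValueError("bad padding")
    return data[:-n]


# A random block unpads cleanly if it ends in 01, or 02 02, or 03 03 03, ...
p_valid = sum(Fraction(1, 256 ** k) for k in range(1, BLOCK + 1))
p_fail = 1 - p_valid
print(float(p_fail))
```

Under these assumptions the failure probability lands very close to the 254/255 figure quoted above; a receiver that only looks at the last octet would reject less often.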

Regards,
Christian

Received on Thursday, 10 January 2002 04:38:37 UTC