Re: Encryption code-set issues (Was Re: WOOPS: xmlenc Call 13:00 EST 20020107) from Tom Gindin on 2002-01-15 (xml-encryption@w3.org from January 2002)

From: Tom Gindin <tgindin@us.ibm.com>
Date: Tue, 15 Jan 2002 17:53:20 -0500
To: xml-encryption@w3.org
Message-ID: <OF6E2088C8.2E7D4D44-ON85256B42.007CF7C7@pok.ibm.com>
      Responses below, set off by [TG].  I'm not an I18N guru, but I know a
little about Unicode and the UTF's.  Non-Latin alphabets include some
fairly common ones, and CJK is Chinese, Japanese, and Korean.
      This message was originally sent on Jan. 11, but was rejected since I
was then a subscriber to XMLDSIG but not XMLENC.  I would also like to
point out that while CBC is the most common block chaining mode, some of
the conditions which have been suggested as desirable for XML encryption
(such as corruption of ciphertext being detected) work better with CBCC or
BC modes.  The primary reason why I have seen CBC cited as more desirable
than those is its superior recovery from transmission errors, which is not
an important feature for transmission over IP or storage on disk media.

            Tom Gindin


Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de> on
01/11/2002 04:25:58 AM

To:    Tom Gindin/Watson/IBM@IBMUS
cc:    xml-encryption@w3.org
Subject:    Re: Encryption code-set issues (Was Re: WOOPS: xmlenc Call
       13:00 EST 20020107)


Hi Tom

--On Donnerstag, 10. Januar 2002 18:41 -0500 Tom Gindin
<tgindin@us.ibm.com> wrote:

(snip)

> Furthermore, if you're looking at UTF-8 data with an IV up front,
> the likelihood of failing to detect a single-block manipulation is 2**-8
> for ASCII with CBC and 8 byte blocks, 2**-36 for most non-Latin alphabets
> (Latin-1 and Latin-2 are slightly less sensitive than ASCII, but depend on
> the language), and 2**-27 or so for CJK.  For CJK in UCS-2 it's 2**-8,
> just like ASCII.  For CBCC the same values multiply with each subsequent
> block, and get a final 1/255 from PKCS-5 padding.
>
>             Tom Gindin

Hm, what does that mean? Are these the probabilities that a decrypted
plaintext block with 50% bit errors does not result in a parsing error but
in valid UTF-8? I can't follow the Latin/CJK stuff because I'm not an I18N
guru, but for the PKCS5 padding, this depends on two factors:

[TG]  The probabilities cited are the probabilities that a randomly
manipulated 8-byte block consists of valid UTF-8 for the language
indicated.
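
One way to arrive at figures of this order is sketched below. The byte-level
model is an assumption made for illustration, not something stated in the
thread: an "ASCII" block is one in which every byte has its high bit clear, and
a "non-Latin alphabet" block is made of 2-byte UTF-8 characters whose lead byte
is one of two values (for example 0xD0/0xD1 for the basic Russian letters) and
whose second byte is any continuation byte. The function names are
hypothetical.

```python
from fractions import Fraction


def p_ascii_block(block_len: int = 8) -> Fraction:
    """Chance that a uniformly random block is entirely 7-bit ASCII."""
    # Each byte has its high bit clear with probability 128/256.
    return Fraction(128, 256) ** block_len


def p_two_byte_script_block(block_len: int = 8, lead_bytes: int = 2) -> Fraction:
    """Chance that a random block is valid 2-byte UTF-8 for one script.

    Assumed model: each character is a lead byte drawn from `lead_bytes`
    acceptable values followed by one continuation byte 0x80..0xBF
    (64 of the 256 possible values).
    """
    per_char = Fraction(lead_bytes, 256) * Fraction(64, 256)
    return per_char ** (block_len // 2)


print(p_ascii_block())
print(p_two_byte_script_block())
```

Under these assumptions the two results are exactly 2**-8 and 2**-36 for
8-byte blocks, matching the ASCII and non-Latin figures quoted above.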

1: What's the size of the last block? If the last plaintext block is full,
so that a whole extra block of padding has to be added and the padding
block contains no plaintext, you're right with your 2**-8 _IF_

2: Your padding properties are chosen well. If we look at how padding is
described in our spec, it says:

The padded plain text would then be 0x616263????????05 where the "??" bytes
can be any value

Well, we have different padding mechanisms with this property:

Plaintext           FF FF FF FF FF FF FF FF FF
padded Plaintext    FF FF FF FF FF FF FF FF FF ?? ?? ?? ?? ?? ?? 07

Mechanisms to fill: ISO 10126d2 fills with random bytes, PKCS-7 with the value
from the length field and X.923 uses zeroes. If you use ISO10126d2, I agree
that the probability is that low because, due to the random fill, there's no
integrity check. But if PKCS-7 or X.923 padding is used and the decryptor
checks that the pad bytes conform to the padding scheme in use, the
probability is no longer 2**-8:

ISO10126d2 Padding  FF FF FF FF FF FF FF FF FF 8C 39 16 BD 69 DB 07
PKCS-7 Padding      FF FF FF FF FF FF FF FF FF 07 07 07 07 07 07 07
X.923 Padding       FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 07
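
The three fill rules in the table above can be sketched as follows. This is a
minimal illustration assuming 8-byte blocks and a pad length that always lies
between 1 and the block length. The function names are hypothetical and are
not taken from any specification.

```python
import os


def pad_len(data_len: int, block_len: int = 8) -> int:
    # Always 1..block_len, so a full final block gets a whole padding block.
    return block_len - (data_len % block_len)


def pad_iso10126(data: bytes, block_len: int = 8) -> bytes:
    # Random fill bytes, then the pad length in the last byte.
    n = pad_len(len(data), block_len)
    return data + os.urandom(n - 1) + bytes([n])


def pad_pkcs7(data: bytes, block_len: int = 8) -> bytes:
    # Every pad byte carries the pad length.
    n = pad_len(len(data), block_len)
    return data + bytes([n]) * n


def pad_x923(data: bytes, block_len: int = 8) -> bytes:
    # Zero fill bytes, then the pad length in the last byte.
    n = pad_len(len(data), block_len)
    return data + bytes(n - 1) + bytes([n])


plaintext = b"\xff" * 9
for pad in (pad_iso10126, pad_pkcs7, pad_x923):
    print(pad.__name__, pad(plaintext).hex(" ").upper())
```

Only the last byte of the ISO 10126 output is predictable. The other two
schemes fix every pad byte, which is what gives a decryptor something to check.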

[TG] I think that's PKCS-5 padding, not PKCS-7.  The dominant factor in
checking for legal PKCS-5 padding is that if the last byte is 01, it's
legal (assuming that you don't know the data length in advance) - it's also
legal if the last two bytes are 0202, etc.  The chances of this sum to
something only a little greater than 2**-8 with the exact value depending
on the block length.  X.923 looks like it has the same integrity
characteristics.
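
As a rough check of the "only a little greater than 2**-8" figure, the sketch
below sums the chances that a uniformly random final block passes a strict
padding check. It assumes the decryptor accepts any pad length from 1 up to the
block length and verifies every pad byte. The function names are hypothetical.

```python
from fractions import Fraction


def is_valid_pkcs5(block: bytes) -> bool:
    # The last n bytes must all equal n.
    n = block[-1]
    return 1 <= n <= len(block) and block[-n:] == bytes([n]) * n


def is_valid_x923(block: bytes) -> bool:
    # The last byte is n and the n - 1 bytes before it must be zero.
    n = block[-1]
    return 1 <= n <= len(block) and block[-n:-1] == bytes(n - 1)


def p_random_block_valid(block_len: int = 8) -> Fraction:
    # For each pad length n, the last n bytes must take one prescribed
    # value each, which happens with probability 256**-n. The cases are
    # mutually exclusive because the last byte determines n.
    return sum(Fraction(1, 256 ** n) for n in range(1, block_len + 1))


p = p_random_block_valid(8)
print(float(p), float(Fraction(1, 256)))
```

The same sum applies to both checks, since X.923 also prescribes every pad
byte. As the block length grows the sum approaches 1/255 from below.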

Christian
Received on Tuesday, 15 January 2002 18:09:17 UTC