- From: Tom Gindin <tgindin@us.ibm.com>
- Date: Tue, 15 Jan 2002 17:53:20 -0500
- To: xml-encryption@w3.org
Responses below, set off by [TG]. I'm not an I18N guru, but I know a little about Unicode and the UTFs. Non-Latin alphabets include some fairly common ones, and CJK is Chinese, Japanese, and Korean. This message was originally sent on Jan. 11, but was rejected since I was then a subscriber to XMLDSIG but not XMLENC.

I would also like to point out that while CBC is the most common block chaining mode, some of the conditions which have been suggested as desirable for XML encryption (such as corruption of ciphertext being detected) work better with CBCC or BC modes. The primary reason why I have seen CBC cited as more desirable than those is its superior recovery from transmission errors, which is not an important feature for transmission over IP or storage on disk media.

Tom Gindin

Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de> on 01/11/2002 04:25:58 AM

To: Tom Gindin/Watson/IBM@IBMUS
cc: xml-encryption@w3.org
Subject: Re: Encryption code-set issues (Was Re: WOOPS: xmlenc Call 13:00 EST 20020107)

Hi Tom

--On Thursday, 10 January 2002 18:41 -0500 Tom Gindin <tgindin@us.ibm.com> wrote:

(snip)
> Furthermore, if you're looking at UTF-8 data with an IV up front,
> the likelihood of failing to detect a single-block manipulation is 2**-8
> for ASCII with CBC and 8 byte blocks, 2**-36 for most non-Latin alphabets
> (Latin-1 and Latin-2 are slightly less sensitive than ASCII, but depend on
> the language), and 2**-27 or so for CJK. For CJK in UCS-2 it's 2**-8,
> just like ASCII. For CBCC the same values multiply with each subsequent
> block, and get a final 1/255 from PKCS-5 padding.
>
> Tom Gindin

Hm, what does that mean? Are these the probabilities that a decrypted plaintext block with 50% bit errors does not result in a parsing error but in valid UTF-8?
I can't follow the Latin/CJK stuff because I'm not an I18N guru, but for the PKCS5 padding, this depends on two factors:

[TG] The probabilities cited are the probabilities that a randomly manipulated 8-byte block consists of valid UTF-8 for the language indicated.

1: What's the size of the last block: If the last block is a full block (so that you have to pad a full block), which means that the padding block contains no plaintext, you're right with your 2**-8 _IF_

2: Your padding properties are chosen well. If we look at how padding is described in our spec, it says:

  The padded plain text would then be 0x616263????????05 where the "??" bytes can be any value

Well, we have different padding mechanisms with this property:

Plaintext           FF FF FF FF FF FF FF FF FF
padded Plaintext    FF FF FF FF FF FF FF FF FF ?? ?? ?? ?? ?? ?? 07

Mechanisms to fill the "??" bytes: ISO 10126d2 fills with random bytes, PKCS-7 with the value from the length field, and X.923 uses zeroes. If you use ISO10126d2, I agree that the probability is that low because, due to the random fill, there's no integrity check. But if the PKCS-7 or X.923 padding is used and the decryptor checks that the pad bytes match the padding scheme in use, the probability is no longer 2**-8:

ISO10126d2 Padding  FF FF FF FF FF FF FF FF FF 8C 39 16 BD 69 DB 07
PKCS-7 Padding      FF FF FF FF FF FF FF FF FF 07 07 07 07 07 07 07
X.923 Padding       FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 07

[TG] I think that's PKCS-5 padding, not PKCS-7. The dominant factor in checking for legal PKCS-5 padding is that if the last byte is 01, it's legal (assuming that you don't know the data length in advance) - it's also legal if the last two bytes are 0202, etc. The chances of this sum to something only a little greater than 2**-8, with the exact value depending on the block length. X.923 looks like it has the same integrity characteristics.

Christian
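
The padding estimate in the last [TG] comment can be checked by counting. The following is a minimal Python sketch, not part of the original exchange. It assumes the final decrypted block is uniformly random after a manipulation, and that the decryptor accepts any block whose trailing bytes form a legal pad. The function names (`is_valid_pkcs5`, `is_valid_x923`, `pkcs5_accept_probability`, `count_accepted`) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def is_valid_pkcs5(block: bytes) -> bool:
    """Legal PKCS-5 pad: last byte is k (1..block length) and the last k bytes all equal k."""
    k = block[-1]
    return 1 <= k <= len(block) and block[-k:] == bytes([k]) * k


def is_valid_x923(block: bytes) -> bool:
    """Legal X.923 pad: last byte is k (1..block length) and the k-1 bytes before it are zero."""
    k = block[-1]
    return 1 <= k <= len(block) and block[-k:-1] == bytes(k - 1)


def pkcs5_accept_probability(block_len: int = 8) -> Fraction:
    """Chance that a uniformly random block passes the PKCS-5 check.

    A pad of length k fixes exactly k bytes, so it matches with
    probability 256**-k; the cases for different k are disjoint.
    """
    return sum(Fraction(1, 256 ** k) for k in range(1, block_len + 1))


def count_accepted(check, block_len: int) -> int:
    """Brute-force count of accepted blocks; only practical for tiny block lengths."""
    return sum(check(bytes(b)) for b in product(range(256), repeat=block_len))


if __name__ == "__main__":
    p = pkcs5_accept_probability(8)
    print("8-byte block, PKCS-5:", p, "~", float(p))
    # Cross-check the formula by enumeration on a 2-byte block.
    print("2-byte block, PKCS-5 accepted:", count_accepted(is_valid_pkcs5, 2))
    print("2-byte block, X.923 accepted: ", count_accepted(is_valid_x923, 2))
```

Under these assumptions the sum for an 8-byte block lies just above 2**-8 and just below 1/255, which agrees with both the "a little greater than 2**-8" remark and the 1/255 figure quoted earlier in the thread. The X.923 check fixes the same number of bytes for each pad length, so it accepts the same number of random blocks.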
Received on Tuesday, 15 January 2002 18:09:17 UTC