Fwd: Re: schema validity in encryption

Dear Encryption specialists,

[Please keep copying me on replies; I'm not on the list.]
I have some comments on encryption requirements.
Joseph Reagle asked me to discuss them here.


This is based on the following text which I got
from Joseph Reagle:

<<<<
          2. XML Instances {[70]WS}
                1. Encrypted instances must be well-formed but need not be
                   valid (i.e. applications that encrypt the element
                   structure are purposefully doing so.)
                2. Instance authors that want to validate encrypted
                   instances must:
                     1. Write the original schema so as to validate
                        resulting instances and the change in structure and
                        inclusion of element types from the XML Encryption
                        namespace.
                     2. Provide a post-encryption schema for validating
                        encrypted instances.
                     3. Only encrypt CDATA sections and place
                        DecryptionInfo and KeyInfo in an external document.
 >>>>

And this is from a discussion between me and Joseph:

At 11:26 12/16/2000 +0900, Martin J. Duerst wrote:

[about the 2.2 list]

>>Please add 'do one of the following' to make clear this is an 'or'
>>list, not an 'and' list.
>
>Ok.
>
>>'CDATA sections' seems clearly wrong.
>>I guess you mean text content only.
>
>Oops, yes, better to put the P in PCDATA, I used the terminology from the 
>requirement proposal [1].

Yes, please put in the P, and remove 'sections', too.

[about 2.2.3, and why I clearly and strongly disagree with
this requirement]:

>>This would change
>>     <element>Clear text here.</element>
>>to
>>     <element>ScRaMbLeD TeXt HeRe</element>
>>yes? While this may work technically (it will validate), I have
>>serious problems with such an approach. The markup is now actually
>>completely wrong. What was an <element> is still called an <element>,
>>but it's not an <element> anymore, it's an <encodedElement>. The
>>original Markup has been misused. This can be seen as a problem
>>of markup philosophy (or whatever you call it) but can also lead
>>to very serious practical problems. If the document is received
>>as is, and by accident or whatever the separate information in
>>an external document is lost (very easy to happen), the encoded
>>information will be taken as the real information, with very
>>bad consequences.
>
>I'm not keen on this either, but it's been proposed [1] and consequently 
>represented in the requirements (at this stage, before we hone them down). 
>Feel free to take this up on the list.
>
>[1] http://www.w3.org/2000/11/02-xml-encryption-ws/wiley.html

I haven't found anything about schemas in [1], but perhaps I didn't
look in the right place.

Anyway, I'm proposing that this requirement be removed outright.


>>By the way, is there any requirement in your list that the encryption
>>should be done on characters (i.e. UCS), not bytes? This is important
>>to avoid transcoding problems.
>
>No, and my assumption was that it happens at the octet level. Also, feel 
>free to send this to the list too.

This won't really work. Here is why:

Assume we have a document encoded in Shift_JIS (frequent in Japan).
Now part of it is encrypted based on the byte values of Shift_JIS.
The document is then passed on, and at some point converted, let's
say to UTF-8. Nothing in the current infrastructure will remember
that this document was originally in Shift_JIS. It's tough enough
(but very important for automatic processing) to make sure that
we actually know what the current encoding is :-(. Next, the
encrypted part is decrypted, and the original Shift_JIS byte values
are restored. The whole document should still be in UTF-8, but
actually it isn't: the decrypted part is Shift_JIS bytes, which are
in general not valid UTF-8. Any XML parser is required to report a
fatal error.
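
The failure described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the single-byte
XOR is a toy stand-in for a real byte-level cipher, and the sample text
and the <element> name are made up.

```python
# Toy stand-in for a real cipher: XOR with one fixed byte. This is an
# assumption for illustration; it only needs to be reversible on bytes.
KEY = 0x5A


def xor_bytes(data: bytes) -> bytes:
    """Apply the toy cipher; applying it twice restores the input."""
    return bytes(b ^ KEY for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


secret = "日本語"  # hypothetical confidential text

# 1. The document is in Shift_JIS, and encryption works on those bytes.
ciphertext = xor_bytes(secret.encode("shift_jis"))

# 2. The document is transcoded to UTF-8 somewhere along the way. Nothing
#    records that the encrypted part was derived from Shift_JIS bytes.

# 3. Decryption restores the original Shift_JIS bytes.
restored = xor_bytes(ciphertext)

# 4. Those bytes are spliced into what is now a UTF-8 document.
document = "<element>".encode("utf-8") + restored + "</element>".encode("utf-8")

print(is_valid_utf8(document))  # False: the document is no longer UTF-8
```

The restored bytes are correct Shift_JIS, but the surrounding document
is UTF-8, so the result as a whole decodes in neither encoding without
extra knowledge.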

Therefore, I propose adding a requirement that encryption happens
on the character level (e.g. by defining that it happens on the byte
level in a well-defined character encoding such as UTF-8).

Please note that the result of the encryption also has to be
treated as characters, rather than bytes, but that's usually
done by using something like base64, which should work out fine.
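
A minimal sketch of this character-level approach, again in Python and
again under assumptions: the XOR step is a toy stand-in for a real
cipher, and the function names encrypt_text and decrypt_text are
hypothetical, not taken from any draft.

```python
import base64

# Toy stand-in for a real cipher (assumption for illustration only).
KEY = 0x5A


def xor_bytes(data: bytes) -> bytes:
    """Apply the toy cipher; applying it twice restores the input."""
    return bytes(b ^ KEY for b in data)


def encrypt_text(text: str) -> str:
    """Encrypt characters: always go through UTF-8, return base64 text."""
    return base64.b64encode(xor_bytes(text.encode("utf-8"))).decode("ascii")


def decrypt_text(token: str) -> str:
    """Decrypt base64 text back to characters, independent of transport."""
    return xor_bytes(base64.b64decode(token)).decode("utf-8")


secret = "日本語"  # hypothetical confidential text
token = encrypt_text(secret)

# The base64 result is plain ASCII, so transcoding the document between
# Shift_JIS and UTF-8 leaves it unchanged.
assert token.encode("shift_jis") == token.encode("utf-8")

# Decryption yields characters, which are then serialized in whatever
# encoding the document has at that point (UTF-8 here).
plain = decrypt_text(token)
document = f"<element>{plain}</element>".encode("utf-8")
print(document.decode("utf-8"))
```

Because both the input and the output of the encryption step are
defined in terms of characters, the document's encoding at encryption
time no longer has to be remembered.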


Regards,    Martin.

Received on Wednesday, 20 December 2000 14:56:37 UTC