- From: Martin J. Duerst <duerst@w3.org>
- Date: Wed, 20 Dec 2000 04:39:54 -0500 (EST)
- To: xml-encryption@w3.org
- Cc: w3c-i18n-ig@w3.org
Dear Encryption specialists,

[please keep crossposting me. I'm not on the list]

I have some comments on encryption requirements. Joseph Reagle
asked me to discuss them here.

This is based on the following text which I got from Joseph Reagle:

<<<<
2. XML Instances {[70]WS}
   1. Encrypted instances must be well-formed but need not be valid
      (i.e. applications that encrypt the element structure are
      purposefully doing so.)
   2. Instance authors that want to validate encrypted instances must:
      1. Write the original schema so as to validate resulting
         instances and the change in structure and inclusion of
         element types from the XML Encryption namespace.
      2. Provide a post-encryption schema for validating encrypted
         instances.
      3. Only encrypt CDATA sections and place DecryptionInfo and
         KeyInfo in an external document.
>>>>

And this is from a discussion between me and Joseph:

At 11:26 12/16/2000 +0900, Martin J. Duerst wrote:

[about the 2.2 list]

>>Please add 'do one of the following' to make clear this is an 'or'
>>list, not an 'and' list.
>
>Ok.
>
>>'CDATA sections' seems clearly wrong.
>>I guess you mean text content only.
>
>Oops, yes, better to put the P in PCDATA, I used the terminology from the
>requirement proposal [1].

Yes, please put in the P, and remove 'sections', too.

[about 2.2.2, about why I clearly and strongly disagree with this
requirement]:

>>This would change
>>   <element>Clear text here.</element>
>>to
>>   <element>ScRaMbLeD TeXt HeRe</element>
>>yes? While this may work technically (it will validate), I have
>>serious problems with such an approach. The markup is now actually
>>completely wrong. What was an <element> is still called an <element>,
>>but it's not an <element> anymore, it's an <encodedElement>. The
>>original Markup has been misused. This can be seen as a problem
>>of markup philosophy (or whatever you call it) but can also lead
>>to very serious practical problems.
>>If the document is received
>>as is, and by accident or whatever the separate information in
>>an external document is lost (very easy to happen), the encoded
>>information will be taken as the real information, with very
>>bad consequences.
>
>I'm not keen on this either, but it's been proposed [1] and consequently
>represented in the requirements (at this stage, before we hone them down).
>Feel free to take this up on the list.
>
>[1] http://www.w3.org/2000/11/02-xml-encryption-ws/wiley.html

I haven't found anything about schemas in [1], but probably I didn't
look at the right place. Anyway, I'm proposing that this requirement
be removed outright.

>>By the way, is there any requirement in your list that the encryption
>>should be done on characters (i.e. UCS), not bytes? This is important
>>to avoid transcoding problems.
>
>No, and my assumption was that it happens at the octet level. Also, feel
>free to send this to the list too.

This won't really work. Here is why:

Assume we have a document encoded in Shift_JIS (frequent in Japan).
Now part of it is encrypted based on the byte values of Shift_JIS.
The document is then passed on, and at one point converted, let's
say to UTF-8. Nothing in the current infrastructure will remember
that this document was originally in Shift_JIS. It's tough enough
(but very important for automatic processing) to make sure that
we actually know what the current encoding is :-(.

Next, the encryption is resolved, and some byte values are restored.
The whole document should still be in UTF-8, but actually it isn't.
Any XML parser will be required to throw a fatal error.

Therefore, I propose to make sure, with an appropriate requirement,
that encryption happens on the character level (e.g. by defining
that it happens on the byte level in a well defined character
encoding such as UTF-8).
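The failure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything taken from the requirements draft: the XOR `toy_cipher` is a stand-in for a real cipher, and `KEY`, `is_valid_in`, and the sample string are hypothetical names chosen for the example.

```python
import base64

KEY = b"demo-key"  # hypothetical key, for illustration only


def toy_cipher(data: bytes, key: bytes = KEY) -> bytes:
    """XOR stand-in for a real cipher; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def is_valid_in(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


secret = "日本語"  # sample text content to protect

# Octet-level encryption: the cipher sees the Shift_JIS bytes of the document.
ciphertext_octets = base64.b64encode(toy_cipher(secret.encode("shift_jis")))

# The document is later transcoded to UTF-8. The base64 text survives that
# step (it is plain ASCII), but nothing records that the plaintext was
# Shift_JIS, so decryption restores Shift_JIS bytes into a UTF-8 document.
restored_octets = toy_cipher(base64.b64decode(ciphertext_octets))

# Character-level encryption: the cipher always sees UTF-8 bytes, whatever
# encoding the document happens to be in at the time.
ciphertext_chars = base64.b64encode(toy_cipher(secret.encode("utf-8")))

# After decryption the bytes are decoded back to characters, which can then
# be serialized in the document's current encoding.
restored_text = toy_cipher(base64.b64decode(ciphertext_chars)).decode("utf-8")

print(is_valid_in(restored_octets, "utf-8"))
print(restored_text == secret)
```

In the octet-level case the restored bytes are still valid Shift_JIS but not valid UTF-8, which is the situation where an XML parser has to stop. In the character-level case the decrypted result is a sequence of characters, so it does not matter how often the surrounding document was transcoded in between.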

Please note that the result of the encryption also has to be
treated as characters, rather than bytes, but that's usually
done by using something like base64, which should work out fine.

Regards,   Martin.
Received on Wednesday, 20 December 2000 14:56:37 UTC