Re: Last call comments on XML Encryption specs from Joseph Reagle on 2001-12-17 (xml-encryption@w3.org from December 2001)

From: Joseph Reagle <reagle@w3.org>
Date: Mon, 17 Dec 2001 15:33:04 -0500
To: Martin Duerst <duerst@w3.org>, dee3@torque.pothole.com, "Takeshi Imamura" <IMAMU@jp.ibm.com>
Cc: w3c-i18n-ig@w3.org, xml-encryption@w3.org
Message-Id: <20011217203304.23ED98EB@policy.w3.org>
On Saturday 01 December 2001 21:02, Martin Duerst wrote:
> The syntax/processing is basically right (in the sense
> that XML is serialized using UTF-8). However, there is no
> corresponding requirement for it, and there is none of
> the details, 'health warnings' and security warnings that
> we worked out for the XML Signature spec and that I would
> have expected to be reused.

Hi Martin, while I learned a lot from XML Signature I fear I haven't 
completely internalized all the i18n-goodness yet, so I appreciate 
continued patience!

> For the Requirements doc at:
> http://www.w3.org/TR/2001/WD-xml-encryption-req-20011018
>
> - There should be a requirement that says that encryption
>    should work (in the sense that you get the original
>    stuff back after decryption) under Infoset-preserving
>    transformations of the XML that contains the encrypted
>    pieces. [This makes sure that when encrypting XML, it
>    has to be in a defined encoding (as it currently is).]

I'm not quite sure what you mean. I prefer not to mention Infoset since 
we're still working with XML1.0 and XPath, and I'm not sure if we will 
issue another version of the requirement, but I'd like to make sure this 
concern (once understood) is addressed by the spec. In the spec we say on 
encryption, "obtain the octets by serializing the data in UTF-8 as 
specified in [XML]", and on decryption, if the data was an XML element or 
element content, it's UTF-8; if not, it's just octets. BTW: Did you have a 
chance to look at:
  http://www.w3.org/Encryption/2001/Drafts/xmlenc-decrypt.html
There's a lot of XML/character processing involved there.

> - There needs to be a requirement to use NFC when converting
>    from a legacy encoding to UTF-8 when encrypting. This should
>    be very much the same as in XML Signature, Section 6.5
>    (http://www.w3.org/TR/xmldsig-core/#sec-c14nAlg), last two
>    paragraphs. There should also be something like the last
>    paragraph before section 7.1
>    (http://www.w3.org/TR/xmldsig-core/#sec-XML-Canonicalization).

I've added to the last sentence in section 3.1:

http://www.w3.org/Encryption/2001/Drafts/xmlenc-core/Overview.html#sec-EncryptedType
EncryptedType is the abstract type from which EncryptedData and 
EncryptedKey are derived. While these two latter element types are very 
similar with respect to their content models, a syntactical distinction is 
useful to processing. Implementation MUST generate laxly schema valid 
[XML-schema] EncryptedData or EncryptedKey as specified by the subsequent 
schema declarations /+and SHOULD create XML content (EncryptedType elements 
and their descendents/content) in Normalization Form C [NFC, 
NFC-Corrigendum].+/
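
The NFC step Martin asks for, applied when converting from a legacy encoding 
to UTF-8 before encryption, can be sketched as follows. This is a minimal 
illustration, not text from either spec; the function name to_nfc_utf8 and 
its parameters are hypothetical.

```python
import unicodedata


def to_nfc_utf8(raw, legacy_encoding):
    """Decode legacy-encoded octets, normalize to NFC, serialize as UTF-8.

    Hypothetical helper: the name and signature are illustrative only.
    """
    text = raw.decode(legacy_encoding)
    return unicodedata.normalize("NFC", text).encode("utf-8")


# "e" followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9 under NFC.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# Both inputs end up as the same UTF-8 octets once normalized.
from_latin1 = to_nfc_utf8("\u00e9".encode("latin-1"), "latin-1")
from_decomposed = to_nfc_utf8(decomposed.encode("utf-8"), "utf-8")
```

Without the normalization step, the two inputs above would serialize to 
different octet sequences even though they represent the same text.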

> - In section 4.2 Decryption, in step 4.3, the wording 'replace' ...
>    'by the UTF-8 encoded characters' may easily be misunderstood.
>    After decryption, there will be a byte stream with characters
>    encoded in UTF-8, but the replacement operation has to make
>    sure that the appropriate character encoding conversion
>    (transcoding) is applied. As an example, if the decrypted
>    element or element content is inserted into a DOM, there
>    has to be a conversion from UTF-8 to UTF-16. This should be
>    made clear.

I agree. Takeshi (and others), do we want to say, "If the document into 
which the replacement is occurring is not UTF-8, the decryptor MUST 
transcode the UTF-8 encoded characters into the target encoding." ?
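
A minimal sketch of that transcoding step, assuming the decryptor already 
holds the decrypted octets and knows the encoding of the target document. 
The function name transcode_plaintext is hypothetical.

```python
def transcode_plaintext(decrypted, target_encoding):
    """Convert decrypted UTF-8 octets into the target document's encoding.

    Hypothetical helper. Decoding is strict, so malformed UTF-8 raises
    UnicodeDecodeError; characters the target encoding cannot represent
    raise UnicodeEncodeError rather than being silently dropped.
    """
    text = decrypted.decode("utf-8")
    return text.encode(target_encoding)


plaintext = "<a>\u00e9</a>".encode("utf-8")

# "utf-16-le" is used so that no byte order mark is prepended to the fragment.
utf16_fragment = transcode_plaintext(plaintext, "utf-16-le")
```

When the target is an in-memory tree such as a DOM, the same decode step 
applies and the encode step is whatever the implementation's string type 
requires.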

> - There needs to be some text about security risks associated
>    with UTF-8. Assume that somebody knows that the encrypted
>    text is Old Italic (http://www.unicode.org/charts/PDF/U10300.pdf,
>    no spaces or punctuation). In this case, UTF-8 uses four bytes per
>    character, and three of them are always the same, and the top
>    two (or three if there are no numbers) bits of the last byte
>    are also always the same. 

This is addressed by the Nonce. Don, I've moved most of the Nonce and 
IV discussion to a new section 6.3, and give some examples (including 
Unicode) from 3.3. What should we call that section?
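
The redundancy Martin describes can be checked directly. The sketch below 
assumes the Old Italic block is U+10300 through U+1032F and only inspects 
the UTF-8 octets; it says nothing about any particular cipher. The variable 
names are illustrative.

```python
OLD_ITALIC_FIRST = 0x10300
OLD_ITALIC_LAST = 0x1032F

encoded = [chr(cp).encode("utf-8")
           for cp in range(OLD_ITALIC_FIRST, OLD_ITALIC_LAST + 1)]

# Every character in the block takes four octets in UTF-8.
lengths = {len(octets) for octets in encoded}

# The first three octets are identical for the whole block.
prefixes = {octets[:3] for octets in encoded}

# The top two bits of the fourth octet are the fixed continuation marker 0b10.
top_two_bits = {octets[3] >> 6 for octets in encoded}
```

So an attacker who knows the script knows 26 of every 32 plaintext bits in 
advance, which is the kind of known-plaintext structure the Nonce and IV 
text is meant to cover.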

> - URI -> anyURI/IRI: According to the Character Model,
>    http://www.w3.org/TR/charmod/#sec-URIs, you have to make sure
>    that wherever you use URIs, non-ASCII characters are allowed,
>    and that conversion to ASCII only is done as late as possible.
>    You already have this right in the Schema, by using anyURI,
>    but you should make it clear in the text.

I'm not sure where we would do that. In xmldsig, we had a whole URI section 
of the spec:
  http://www.w3.org/TR/2001/PR-xmldsig-core-20010820/#sec-URI
so it made sense to specify this as completely as possible. However, xenc 
makes more casual use of URIs, with a few "this is like xmldsig Reference 
processing" remarks. So I can't find a place where those four paragraphs on 
RFC2396+RFC2732+encoding_of_disallowed-characters, etc. would fit. As you 
say, we're using anyURI, so do we still need this text in every spec that 
uses a URI?
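
For reference, the late conversion to ASCII that the Character Model asks 
for amounts to percent-escaping the UTF-8 octets of the disallowed 
characters only at the point where a URI is actually needed. A simplified 
sketch follows; the function name iri_to_uri is hypothetical, and the 
sketch does not handle internationalized host names.

```python
from urllib.parse import quote

# Reserved characters and "%" are passed through so that URI structure and
# existing escapes are preserved; unreserved ASCII is never escaped by quote().
_SAFE = "/:?#[]@!$&'()*+,;=%"


def iri_to_uri(iri):
    """Percent-escape the UTF-8 octets of non-ASCII and other disallowed
    characters. Hypothetical helper, applied as late as possible."""
    return quote(iri, safe=_SAFE)


escaped = iri_to_uri("http://example.org/r\u00e9sum\u00e9#frag")
unchanged = iri_to_uri("http://example.org/a%20b?x=1")
```

Until that point the value is stored and compared with its non-ASCII 
characters intact, which is what anyURI permits.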


> - In 2.2.1, 'media type URI' is mentioned, but there is neither
>    an explanation nor a reference. In addition, it would be good
>    to check/explain that this can include parameters (such as
>    charset).

Ok, I'll say, now, "Other alternatives include 'content' of an element, or 
an external octet sequence that is identified by a media type URI [IANA], 
such as the example in Encrypting Arbitrary Data and XML Documents 
(section 2.1.4)." Presently, we have no provision for a charset. (The IANA 
directory does not provide URIs for these distinctions.) In xmldsig we have 
both a Type (to describe a higher-level aspect, such as its being a 
particular XML structure) and MIMEType/Encoding, which I've tried to avoid 
here. However, if people feel we should have both here as well, we could 
revert to the xmldsig approach or have some other way of describing the 
encoding -- are there URIs for the types of encoding?
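
To make the charset question concrete, this is what a media type carrying 
a charset parameter looks like when split apart. It is only an illustration 
using Python's standard email package; the function name split_media_type 
is hypothetical and nothing here is proposed syntax for xenc.

```python
from email.message import Message


def split_media_type(value):
    """Return (media type, charset or None) for a Content-Type style value.

    Hypothetical helper built on the standard library's header parsing.
    """
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


with_charset = split_media_type("text/xml; charset=UTF-8")
without_charset = split_media_type("image/png")
```

A bare media type URI of the kind discussed above can only carry the first 
element of that pair.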

-- 

Joseph Reagle Jr.                 http://www.w3.org/People/Reagle/
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   http://www.w3.org/Signature/
W3C XML Encryption Chair          http://www.w3.org/Encryption/2001/
Received on Monday, 17 December 2001 15:33:16 UTC