Last call comments on XML Encryption specs from by way of Joseph Reagle on 2001-12-03 (xml-encryption@w3.org from December 2001)

From: by way of Joseph Reagle <duerst@w3.org>
Date: Mon, 3 Dec 2001 10:03:20 -0500
To: xml-encryption@w3.org
Message-Id: <20011203150321.AC2EE546@policy.w3.org>
Dear XML Encryption Editors and Working Group,

I'm sending you some (late, sorry) last call comments on
your documents. These are internationalization comments,
but I'm currently sending them in as an individual. I
expect the I18N WG will look at them at their upcoming
teleconf on Tuesday, which might result in some tweaks,
but probably not big changes.

I'm adding some non-i18n points at the end; these are
personal only and I don't expect the I18N WG to discuss
them.

The syntax/processing is basically right (in the sense
that XML is serialized using UTF-8). However, there is no
corresponding requirement for it, and there are none of
the details, 'health warnings' and security warnings that
we worked out for the XML Signature spec and that I would
have expected to be reused.

In detail:

For the Requirements doc at:
http://www.w3.org/TR/2001/WD-xml-encryption-req-20011018

- There should be a requirement that says that encryption
   should work (in the sense that you get the original
   stuff back after decryption) under Infoset-preserving
   transformations of the XML that contains the encrypted
   pieces. [This makes sure that when encrypting XML, it
   has to be in a defined encoding (as it currently is).]
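
A minimal sketch of what this requirement asks for, assuming a
simplified, namespace-free EncryptedData/CipherValue layout and
omitting the encryption step itself (the base64 text below merely
stands in for ciphertext; the names plaintext, cipher_value and
recover are illustrative only). It shows that once the plaintext is
fixed as UTF-8 octets, re-encoding the containing document does not
change what is recovered:

```python
import base64
import xml.etree.ElementTree as ET

# Stand-in for ciphertext: the UTF-8 octets of the plaintext, base64-encoded.
# A real implementation would encrypt these octets first; that step is omitted.
plaintext = "<Name>D\u00fcrst</Name>"
cipher_value = base64.b64encode(plaintext.encode("utf-8")).decode("ascii")

doc = f"<EncryptedData><CipherValue>{cipher_value}</CipherValue></EncryptedData>"

# Two serializations of the same Infoset: one in UTF-8, one in UTF-16.
as_utf8 = ('<?xml version="1.0" encoding="UTF-8"?>' + doc).encode("utf-8")
as_utf16 = ('<?xml version="1.0" encoding="UTF-16"?>' + doc).encode("utf-16")


def recover(serialized: bytes) -> str:
    """Parse the container, then decode the protected octets as UTF-8."""
    root = ET.fromstring(serialized)
    octets = base64.b64decode(root.findtext("CipherValue"))
    return octets.decode("utf-8")
```

Calling recover(as_utf8) and recover(as_utf16) should give the same
string, although the two byte streams differ.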

For the Syntax/Processing doc at:
http://www.w3.org/TR/2001/WD-xml-encryption-req-20011018

- There needs to be a requirement to use NFC when converting
   from a legacy encoding to UTF-8 when encrypting. This should
   be very much the same as in XML Signature, Section 6.5
   (http://www.w3.org/TR/xmldsig-core/#sec-c14nAlg), last two
   paragraphs. There should also be something like the last
   paragraph before section 7.1
   (http://www.w3.org/TR/xmldsig-core/#sec-XML-Canonicalization).

- In section 4.2 Decryption, in step 4.3, the wording 'replace' ...
   'by the UTF-8 encoded characters' may easily be misunderstood.
   After decryption, there will be a byte stream with characters
   encoded in UTF-8, but the replacement operation has to make
   sure that the appropriate character encoding conversion
   (transcoding) is applied. As an example, if the decrypted
   element or element content is inserted into a DOM, there
   has to be a conversion from UTF-8 to UTF-16. This should be
   made clear.

- There needs to be some text about security risks associated
   with UTF-8. Assume that somebody knows that the encrypted
   text is Old Italic (http://www.unicode.org/charts/PDF/U10300.pdf,
   no spaces or punctuation). In this case, UTF-8 uses four bytes per
   character, and three of them are always the same, and the top
   two (or three if there are no numbers) bits of the last byte
   are also always the same. There is probably some chance that
   this will make it easier to break the encryption. This should
   clearly be mentioned in the Security section, perhaps with
   advice that compression can help (but then it has to be
   possible to apply compression before encryption; it might be
   good to have an example of this). I'm not enough of an expert
   to assess the exact extent of this risk, so please use your
   expertise in this field. There are other, somewhat less extreme
   examples than Old Italic, but the point is the same.

- URI -> anyURI/IRI: According to the Character Model,
   http://www.w3.org/TR/charmod/#sec-URIs, you have to make sure
   that wherever you use URIs, non-ASCII characters are allowed,
   and that conversion to ASCII only is done as late as possible.
   You already have this right in the Schema, by using anyURI,
   but you should make it clear in the text.

- In 2.2.1, 'media type URI' is mentioned, but there is neither
   an explanation nor a reference. In addition, it would be good
   to check/explain that this can include parameters (such as
   charset).
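
Three of the points above (NFC when converting to UTF-8, transcoding
after decryption, and the fixed bits in UTF-8 encoded Old Italic) can
be sketched as follows. This is only an illustration, not proposed
spec text; the function names are made up for the example, and the
Old Italic letter range U+10300..U+1031E is taken from the chart
cited above.

```python
import unicodedata


def legacy_to_utf8_nfc(data: bytes, legacy_encoding: str) -> bytes:
    """Decode from a legacy encoding, normalize to NFC, serialize as UTF-8."""
    text = data.decode(legacy_encoding)
    return unicodedata.normalize("NFC", text).encode("utf-8")


def utf8_to_utf16(decrypted: bytes) -> bytes:
    """Transcode decrypted UTF-8 octets for a UTF-16 based consumer such as a DOM."""
    return decrypted.decode("utf-8").encode("utf-16-le")


# Old Italic letters only (no numbers), encoded as UTF-8.
old_italic = "".join(chr(cp) for cp in range(0x10300, 0x1031F))
encoded = old_italic.encode("utf-8")

# Every character takes four bytes; split the stream per character.
chunks = [encoded[i:i + 4] for i in range(0, len(encoded), 4)]

# The first three bytes of each character, and the top three bits of the fourth.
prefixes = {chunk[:3] for chunk in chunks}
top_bits = {chunk[3] >> 5 for chunk in chunks}
```

Both prefixes and top_bits end up with a single member, so an
attacker who knows the script knows 27 of every 32 plaintext bits in
advance.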

========== non-i18n points from here down ===================

Major point:
- 2.1.5 forbids the encryption of only part of EncryptedData
   or EncryptedKey. I don't see any particular reason for
   forbidding this, except to make some XML Schema issues
   easier. But I think it would be extremely valuable for the
   WG and the spec to do this exercise and to show how the
   Schema has to be changed to allow this. This is important
   because allowing encryption in places where it's not
   provided in some existing schema is something that applications
   using the spec will have to do a lot, and it's a good thing
   to work out (some of the) details.
   Even if this is not changed for EncryptedData or EncryptedKey,
   there should be an extended discussion of how to change a
   schema to work with encryption.

Small points:
- 'Bank of the Internet' should be changed to 'Example Bank'
- In 2.1.4, change 'octet set' to 'octet sequence'.
- I think using 'www.isi.edu' is rather outdated for IANA URIs.
- Citing the obsolete RFC 1738 will confuse many people.
- Reference XML-MT is obsoleted by RFC 3023.
- In the 'Schema definition' at the start of 3., there are
   entities p and s defined, but they never get used.
   There is also a spurious &xenc; in 2.2.2.
- In the first paragraph of 4.3, "that octets' semantics"
   isn't very clear. There seems to be a reference, but it's
   not clear to what. Octets as such don't really have any
   semantics anyway.
- 5.2.1 and others: please change the space after "<EncryptionMethod"
   to a line break to increase the chance that the identifier is
   complete in printouts.


Regards,    Martin.
Received on Monday, 3 December 2001 10:03:22 UTC