Action 43: to produce example for breakage due to current E01 language from Konrad Lanz on 2007-06-04 (public-xmlsec-maintwg@w3.org from June 2007)

From: Konrad Lanz <Konrad.Lanz@iaik.tugraz.at>
Date: Mon, 04 Jun 2007 20:52:05 +0200
To: public-xmlsec-maintwg@w3.org
Message-ID: <46645F55.5080305@iaik.tugraz.at>
Dear all,

TLR summarized the outcome of the discussion we had just after the last 
conference call about strings in XML here:

http://lists.w3.org/Archives/Public/public-xmlsec-maintwg/2007May/0048.html

I already mentioned there after taking a quick look at the c14n spec 
that canonicalization would heal differences with respect to "\<" and 
"&"  in the string representation of a DName, and we agreed on that ...

So with respect to "\<" and "&" in a DName string representation, this 
action is discharged ... and this is a non-issue also from my perspective.
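
To illustrate why canonicalization heals these differences, here is a minimal sketch. It assumes Python 3.8 or later, whose xml.etree.ElementTree.canonicalize implements C14N 2.0 rather than the Canonical XML 1.0 that XMLDSig references; both serialize "<" and "&" in text nodes as "&lt;" and "&amp;". The DName value is made up for the example.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same (hypothetical) DName text  CN=a\<b&c
variants = [
    "<X509SubjectName>CN=a\\&lt;b&amp;c</X509SubjectName>",       # named entities
    "<X509SubjectName>CN=a\\&#60;b&#38;c</X509SubjectName>",      # character references
    "<X509SubjectName><![CDATA[CN=a\\<b&c]]></X509SubjectName>",  # CDATA section
]

# Canonicalize each spelling; all of them collapse to one serialization.
canonical = [ET.canonicalize(xml_data=v) for v in variants]
all_equal = len(set(canonical)) == 1

print(canonical[0])
print("identical after c14n:", all_equal)
```

Whichever spelling a producer chooses, the octets that get digested are the same.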

With respect to the current text there are still some other things 
remaining when looking at the latest red line document:

####

The text says:
  "At least one element, from the following ... "

So the bullet points will still have to enumerate the choice of 
elements within the content of |X509Data| which is not the case in the 
current red line document ...

The text for the first two bullet points will have to read something 
like this:

       * The |X509IssuerSerial| element, which contains an X.509
           issuer distinguished name/serial number pair.  The distinguished
           name SHOULD be compliant with the DNAME
           encoding rules at the end of this section and the serial
           number is represented as a decimal integer,
        * The |X509SubjectName| element, which contains an X.509
           subject distinguished name that SHOULD be compliant with the
           DNAME encoding rules at the end of this section,


####

The so called "DNAME encoding rules at the end of section 4.4.4" are 
still not entirely clear to me.

First I'd like to mention that it can be argued whether the corner cases
affected by those rules appear at all in real-life scenarios; they may
hence be irrelevant.
Nonetheless I'd like to discuss this further, as it should not be too hard
to reach a clear set of rules.

So let's first have a look at the current text and discuss it a little:
>
> Also, strings in DNames (|X509IssuerSerial|,|X509SubjectName|, and 
> |KeyName| if appropriate) should be encoded in accordance with RFC2253 
> [LDAP-DN] except for the encoding of string values within a DName: 
> %%E01 2002-01-28%%as follows:
>
>     * Consider the string as consisting of Unicode characters.
>     * Escape occurrences of the following special characters by
>       prefixing it with the "\" character:
>           o a "#" character occurring at the beginning of the string
>
What happens to a leading space " " in an AttributeValue (AVA Value)?
According to RFC 2253 this would have to be escaped by "\ ", but here 
that is not mentioned.

I would assume that leading spaces were simply forgotten in the first
sub-point of the second bullet point.
This position is also supported by the examples given in 
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002JanMar/0246.html .

>           o one of the characters ",", "+", """, "\", "<", ">" or ";"
>     * Escape all occurrences of ASCII control characters (Unicode
>       range \x00 - \x1f) by replacing them with "\" followed by a two
>       digit hex number showing its Unicode number.
>     * Escape any trailing white space by replacing "\ " with "\20".
>
Could anyone from the original working group shed some light on why the
last space should (note the lowercase "should" in the first sentence of
the rules) be escaped using "\20" instead of "\ "?

I doubt that this is RFC 2253 compliant. RFC 2253 explicitly states
that the last space in an attribute value has to be escaped using
"\ " and not "\20", since hex escaping is only valid for characters other
than those mentioned in section 2.4.

I also wonder what the rationale would be for treating a leading space
differently from a trailing one?
>
>     * Since a XML document logically consists of characters, not
>       octets, the resulting Unicode string is finally encoded
>       according to the character encoding used for producing the
>       physical representation of the XML document.
>
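
To make the questions about leading and trailing spaces concrete, here is a small sketch that contrasts a literal reading of the rules quoted above with RFC 2253 section 2.4. The function names are hypothetical. Replacing every space of the trailing run with "\20" is my assumption, since the quoted wording is ambiguous, and the RFC 2253 function only shows the backslash-prefix form.

```python
SPECIAL = set(',+"\\<>;')  # characters escaped by prefixing a backslash

def escape_rfc2253(value: str) -> str:
    """RFC 2253 section 2.4: specials, leading space or '#', trailing space."""
    last = len(value) - 1
    out = []
    for i, ch in enumerate(value):
        if ch in SPECIAL or (i == 0 and ch in " #") or (i == last and ch == " "):
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)

def escape_e01(value: str) -> str:
    """Literal reading of the quoted E01 rules (leading space is not covered)."""
    stripped = value.rstrip(" ")
    trailing = len(value) - len(stripped)
    out = []
    for i, ch in enumerate(stripped):
        if ch in SPECIAL or (i == 0 and ch == "#"):
            out.append("\\" + ch)
        elif ord(ch) <= 0x1F:
            out.append("\\%02x" % ord(ch))  # control character -> two hex digits
        else:
            out.append(ch)
    # Assumption: each trailing space becomes "\20".
    return "".join(out) + "\\20" * trailing

print(repr(escape_rfc2253(" Graz ")))
print(repr(escape_e01(" Graz ")))
```

For the value " Graz " the two readings disagree on both ends: RFC 2253 gives "\ Graz\ ", while the quoted rules leave the leading space untouched and produce " Graz\20".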
Last but not least I'd like to mention that 
http://www.w3.org/Signature/2001/04/05-xmldsig-interop.html#DNAME refers 
to the link above and even talks about "The following example set 
contains test vectors for the OPTIONAL DNAME encoding" which clearly 
indicates to me that the so called "DNAME encoding rules at the end of 
section 4.4.4" are optional and non normative.

Summarizing, I get the impression that the processing in the so called
"DNAME encoding rules at the end of section 4.4.4" contradicts
RFC 2253 and is non-normative.

Nevertheless I can see some value in escaping control characters and 
spaces to protect them from being modified inside XML. However I would 
argue that the protection of spaces is sufficiently covered in RFC 2253 
already and does not need any additional treatment within XMLDSig. 
(Interestingly, RFC 2253 only asks for escaping the first leading and the
last trailing space, and hence allows mixing escaped spaces with
non-escaped spaces, which may be considered ugly but is clearly out of
our scope to decide.)

The situation is different for control characters, which need protection
that RFC 2253 does not provide.
Line breaks in string representations for example are changed in XML 
(cf. http://www.w3.org/TR/2006/REC-xml-20060816/#sec-line-ends or the 
note in http://www.w3.org/TR/2006/REC-xml-20060816/#NT-S).
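
A minimal sketch of that effect, using Python's xml.etree.ElementTree as a stand-in for any conforming XML parser. The subject name is made up for the example.

```python
import xml.etree.ElementTree as ET

# A literal CR LF inside the element content is normalized to a single LF.
raw = "<X509SubjectName>CN=line one\r\nline two</X509SubjectName>"
parsed = ET.fromstring(raw).text

# The same line break written as "\0d\0a" is ordinary text and survives.
escaped = "<X509SubjectName>CN=line one\\0d\\0aline two</X509SubjectName>"
survived = ET.fromstring(escaped).text

print(repr(parsed))
print(repr(survived))
```

The unescaped value comes back with the carriage return gone, so it no longer matches the DName in the certificate; the escaped form is returned unchanged.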

Concluding, I think the required rules could be expressed in a clearer
fashion and hence would suggest something like the following text
specifying how to accommodate DNames inside an XML Document:

A quick proposal to serve as a basis for further discussion:
> DNames (X509IssuerSerial, X509SubjectName, and KeyName) MUST be 
> represented in accordance with RFC2253 [LDAP-DN] with the difference 
> that they will obviously have to have the same encoding (UTF-8, 
> UTF-16, UTF-32, ISO-8859-1, etc ...) as the XML document and are not 
> limited to UTF-8.
> The AttributeValues within a DName are escaped according to the rules 
> laid out in RFC2253 with the additional requirement that escaping of 
> control characters MUST be performed as follows:
>
>     * Escape all occurrences of control characters (Unicode range x00
>       - x1f) by replacing them with "\" followed by a two digit hex
>       number showing its Unicode number.
>
> Note: According to RFC2253 it is valid to also escape other 
> characters, which is not changed by this additional requirement.
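
The proposal above could be sketched roughly as follows. This is only an illustration under my own assumptions: the function name is hypothetical, lowercase hex digits are an arbitrary choice (RFC 2253 accepts either case), and only the mandatory escapes are applied.

```python
SPECIAL = set(',+"\\<>;')  # RFC 2253 section 2.4 special characters

def escape_proposed(value: str) -> str:
    """RFC 2253 escaping plus hex escaping of control characters x00 - x1f."""
    last = len(value) - 1
    out = []
    for i, ch in enumerate(value):
        if ord(ch) <= 0x1F:
            # Additional requirement: control character -> "\" + two hex digits.
            out.append("\\%02x" % ord(ch))
        elif ch in SPECIAL or (i == 0 and ch in " #") or (i == last and ch == " "):
            # Plain RFC 2253: prefix with a backslash.
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)

print(repr(escape_proposed(" a\tb, ")))
```

Leading and trailing spaces are treated alike here, both as "\ ", and only the tab is hex-escaped.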

From my point of view the other requirement is not RFC 2253 compliant
and is confusing; it should hence be reviewed and then potentially removed:
>
>     * Escape any trailing white space by replacing "\ " with "\20".
>

regards

Konrad

-- 
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at

Certificate chain (including the EuroPKI root certificate):
https://europki.iaik.at/ca/europki-at/cert_download.htm
Received on Monday, 4 June 2007 18:52:20 UTC