Re: Action 43: to produce example for breakage due to current E01 language from Sean Mullan on 2007-06-04 (public-xmlsec-maintwg@w3.org from June 2007)

From: Sean Mullan <Sean.Mullan@Sun.COM>
Date: Mon, 04 Jun 2007 15:36:29 -0400
To: Konrad Lanz <Konrad.Lanz@iaik.tugraz.at>
Cc: public-xmlsec-maintwg@w3.org
Message-id: <466469BD.7080601@sun.com>
Konrad Lanz wrote:
> Dear all,
> 
> TLR summarized the outcome of the discussion we had just after the last 
> conference call about strings in XML here:
> 
> http://lists.w3.org/Archives/Public/public-xmlsec-maintwg/2007May/0048.html
> 
> I already mentioned there after taking a quick look at the c14n spec 
> that canonicalization would heal differences with respect to "\<" and 
> "&"  in the string representation of a DName, and we agreed on that ...
> 
> So with respect to "\<" and "&" in a DName string representation, this 
> action is discharged ... and this is a non-issue also from my perspective.

So are we recommending that KeyInfo should always be signed if it 
contains DNames? I'm not sure I understand the outcome of this discussion.
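
For reference, the "healing" effect mentioned above can be reproduced with a small sketch. It is an illustration only, not taken from the thread. It assumes Python 3.8+ and uses the standard library's canonicalize(), which implements C14N 2.0 rather than the Canonical XML 1.0 that XMLDSig references, although the text-node escaping of "&" and "<" is the same in both. The DName value is made up. The normalization only helps for content that is actually canonicalized, i.e. covered by a Reference.

```python
from xml.etree.ElementTree import canonicalize

# Three physical spellings of the same (made-up) DName character data.
as_char_refs = canonicalize(
    "<X509SubjectName>CN=a&#60;b &#38; c</X509SubjectName>")
as_cdata = canonicalize(
    "<X509SubjectName><![CDATA[CN=a<b & c]]></X509SubjectName>")
as_entities = canonicalize(
    "<X509SubjectName>CN=a&lt;b &amp; c</X509SubjectName>")

# After canonicalization all three serialize identically.
print(as_char_refs)
print(as_char_refs == as_cdata == as_entities)
```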

> With respect to the current text there are still some other things 
> remaining when looking at the latest red line document:
> 

> First I'd like to mention that it can be argued if such corner cases 
> affected by those rules appear at all in real life scenarios and hence 
> may be irrelevant.
> Nonetheless I'd like to discuss it further as it should not be too hard 
> to reach a clear set of rules.
> 
> So let's first have a look at the current text and discuss it a little:
>>
>> Also, strings in DNames (|X509IssuerSerial|,|X509SubjectName|, and 
>> |KeyName| if appropriate) should be encoded in accordance with RFC2253 
>> [LDAP-DN] except for the encoding of string values within a DName: 
>> %%E01 2002-01-28%%as follows:
>>
>>     * Consider the string as consisting of Unicode characters.
>>     * Escape occurrences of the following special characters by
>>       prefixing it with the "\" character:
>>           o a "#" character occurring at the beginning of the string
>>
> What happens to a leading space " " in an AttributeValue (AVA Value)?
> According to RFC 2253 this would have to be escaped by "\ ", but here 
> that is not mentioned.
> 
> I would assume that leading spaces have been forgotten to be mentioned 
> in the first sub point of the second bullet point.
> This position is also supported by the examples given in 
> http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002JanMar/0246.html .

I agree, it seems to have been a mistake.

>>    *
>>           o one of the characters ",", "+", """, "\", "<", ">" or ";"
>>     * Escape all occurrences of ASCII control characters (Unicode
>>       range \x00 - \x1f) by replacing them with "\" followed by a two
>>       digit hex number showing its Unicode number.
>>     * Escape any trailing white space by replacing "\ " with "\20".
>>
> Could anyone from the original working group shed some light on why the 
> last space should (note the lowercase "should" in the first sentence 
> of the rules) be escaped using "\20" instead of "\ "?
> 
> I doubt that this is RFC2253 compliant. RFC 2253 explicitly mentions 
> that the last space in an Attribute value has to be escaped using 
> "\ " and not "\20", as hex escapes are only valid for characters other 
> than those mentioned in section 2.4.
> 
> I also wonder what the rationale would be for treating a leading space 
> differently from a trailing one?

Anyone know how to find Gregor as he may be the only one who knows?
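
To make the two differences concrete (leading space, and "\ " versus "\20" for a trailing space), here is a minimal Python sketch. It is an illustration, not an implementation from either spec. escape_rfc2253 follows the minimal escaping of RFC 2253 section 2.4, and escape_e01 is one reading of the quoted E01 text. Both function names are hypothetical, and the sketch assumes "trailing white space" means only the final space character.

```python
SPECIALS = ',+"\\<>;'  # characters both rule sets prefix with a backslash


def escape_rfc2253(value: str) -> str:
    """Minimal escaping per RFC 2253 section 2.4."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch in SPECIALS:
            out.append("\\" + ch)
        elif i == 0 and ch in " #":      # leading space or '#'
            out.append("\\" + ch)
        elif i == last and ch == " ":    # trailing space
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


def escape_e01(value: str) -> str:
    """One reading of the quoted E01 rules (assumption, see lead-in)."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch in SPECIALS or (i == 0 and ch == "#"):
            out.append("\\" + ch)        # note: leading space is not covered
        elif ord(ch) <= 0x1F:            # control character -> two hex digits
            out.append("\\%02x" % ord(ch))
        elif i == last and ch == " ":    # trailing space -> "\20"
            out.append("\\20")
        else:
            out.append(ch)
    return "".join(out)


print(repr(escape_rfc2253(" Sean ")))
print(repr(escape_e01(" Sean ")))
```

For the made-up value " Sean ", the first function escapes both spaces with "\ ", while the second leaves the leading space alone and turns the trailing one into "\20".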

>>     * Since a XML document logically consists of characters, not
>>       octets, the resulting Unicode string is finally encoded
>>       according to the character encoding used for producing the
>>       physical representation of the XML document.
>>
> Last but not least I'd like to mention that 
> http://www.w3.org/Signature/2001/04/05-xmldsig-interop.html#DNAME refers 
> to the link above and even talks about "The following example set 
> contains test vectors for the OPTIONAL DNAME encoding" which clearly 
> indicates to me that the so called "DNAME encoding rules at the end of 
> section 4.4.4" are optional and non normative.

Good observation.

> Summarizing I get the impression that the processing in the so called 
> "DNAME encoding rules at the end of section 4.4.4" is contradicting 
> RFC2253 and non normative.
> 
> Nevertheless I can see some value in escaping control characters and 
> spaces to protect them from being modified inside XML. However I would 
> argue that the protection of spaces is sufficiently covered in RFC 2253 
> already and does not need any additional treatment within XMLDSig. 
> (Interestingly RFC 2253 only asks for escaping the first leading and the 
> last trailing space, and hence allows to mix escaped spaces with 
> non-escaped spaces, which may be considered ugly but is clearly out of 
> our scope to be decided)
> 
> The situation is different with control characters needing protection 
> that is not provided by RFC 2253.
> Line breaks in string representations for example are changed in XML 
> (cf. http://www.w3.org/TR/2006/REC-xml-20060816/#sec-line-ends or the 
> note in http://www.w3.org/TR/2006/REC-xml-20060816/#NT-S).
> 
> Concluding I think the required rules could be expressed in a clearer 
> fashion and hence would suggest something like the following text 
> specifying how to accommodate DNames inside an XML Document:
> 
> A quick proposal to serve as a basis for further discussion:
>> DNames (X509IssuerSerial,X509SubjectName, and KeyName) MUST be 
>> represented in accordance with RFC2253 [LDAP-DN] with the difference 
>> that they will obviously have to have the same encoding (UTF-8, 
>> UTF-16, UTF-32, ISO-8859-1, etc ...) as the XML document and are not 
>> limited to UTF-8.
>> The AttributeValues within a DName are escaped according to the rules 
>> laid out in RFC2253 with the additional requirement that escaping of 
>> control characters MUST be performed as follows:
>>
>>     * Escape all occurrences of control characters (Unicode range x00
>>       - x1f) by replacing them with "\" followed by a two digit hex
>>       number showing its Unicode number.

Why a MUST? Why is it so important to escape control characters? It is 
probably obvious to others but I just want to understand the rationale 
more. It would be nice to be able to support the default RFC 2253 
algorithm without *requiring* any additional processing.
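
As a concrete illustration of the line-end concern raised above, here is a minimal sketch using Python's standard XML parser. It is not from either spec. The helper names roundtrip and escape_controls are hypothetical, the DName value is made up, and escape_controls covers only the control-character bullet of the proposal, not the rest of RFC 2253.

```python
import xml.etree.ElementTree as ET


def roundtrip(value: str) -> str:
    """Put `value` in element content, parse it, and return what comes back."""
    doc = "<X509SubjectName>" + value + "</X509SubjectName>"
    return ET.fromstring(doc).text


def escape_controls(value: str) -> str:
    """Replace U+0000..U+001F with a backslash and two hex digits."""
    return "".join(
        "\\%02x" % ord(c) if ord(c) <= 0x1F else c for c in value
    )


raw = "CN=line1\r\nline2"            # made-up value containing CR LF
escaped = escape_controls(raw)

print(repr(roundtrip(raw)))          # the parser normalizes CR LF to LF
print(repr(roundtrip(escaped)))      # the escaped form comes back unchanged
```

Separately, most other C0 control characters are not legal XML 1.0 characters at all, so an unescaped value containing, say, U+0001 cannot be carried in the document as literal text.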

>>
>> Note: According to RFC2253 it is valid to also escape other 
>> characters, which is not changed by this additional requirement.
> 
> The other requirement from my point of view is not RFC 2253 compliant 
> and confusing and hence should be reviewed and then potentially be removed.
>>
>>     * Escape any trailing white space by replacing "\ " with "\20".
>>

--Sean
Received on Monday, 4 June 2007 19:37:29 UTC