Re: DName encoding (was:KeyName white space) from Donald E. Eastlake 3rd on 2001-06-12 (w3c-ietf-xmldsig@w3.org from April to June 2001)

From: Donald E. Eastlake 3rd <dee3@torque.pothole.com>
Date: Tue, 12 Jun 2001 15:27:55 -0400
To: w3c-ietf-xmldsig@w3.org
Message-Id: <200106121927.PAA0000052661@torque.pothole.com>
Since no one has responded to my message below, I assume that we have
adopted Gregor's proposal and that, except as may be provided in RFC
2253, whitespace is significant.

Donald

From:  "Donald E. Eastlake 3rd" <dee3@torque.pothole.com>
Message-Id:  <200105310255.WAA0000031182@torque.pothole.com>
To:  w3c-ietf-xmldsig@w3.org
Date:  Wed, 30 May 2001 22:55:30 -0400

>It seems to me that we need to come to some sort of conclusion on
>this... Maybe people think we did and I missed it, in which case I
>hope they will set me straight.
>
>The following proposal by Gregor for encoding STRINGS in DNames does
>not seem to have been refuted, but do we need to say more about
>leading and trailing whitespace on the DName as a whole?
>
>Thanks,
>Donald
> 
>From:  "Gregor Karlinger" <gregor.karlinger@iaik.at>
>To:  "Tom Gindin" <tgindin@us.ibm.com>, "Merlin Hughs" <merlin@baltimore.ie>,
>            <duerst@w3.org>
>Cc:  <w3c-ietf-xmldsig@w3.org>
>Date:  Wed, 16 May 2001 15:13:14 +0200
>Message-ID:  <LBEPJAONIMDADHFHAEAOMELJCFAA.gregor.karlinger@iaik.at>
>In-reply-to:  <OF8536EC68.279166CB-ON85256A4D.005E2175@somers.hqregion.ibm.com>
>Subject:  DName encoding (was:KeyName white space)
>
>All,
>
>I have cited the most important statements (from my point of view) 
>regarding DName encoding below. I would like to make the following
>proposal for a guidline that should appear in XML-Signature on how
>to encode a strings in a DName in XML-Signature relevant structures 
>(IssuerName, SubjectName):
>
>o Consider the string as consisting of unicode characters.
>
>o Escape occurences of the following special characters by prefixing
>  it with the "\" character:
>
>    - a "#" character occurring at the beginning of the string
>    - one of the characters ",", "+", """, "\", "<", ">" or ";"
>
>o Escape all occurences of ASCII control characters (Unicode range
>  \x00 - \x20) by replacing them with "\x" folloed by a two digit
>  hex number showing its Unicode number.
>
>Since a XML document logically consists of characters, not bytes
>(Thanks Martin for reminding me ;-) the resulting unicode string is
>finally encoded according to the character encoding used for producing
>the physical representation of the XML document (But this is not in
>scope of the guidline that should be taken into the XML-Signature 
>specification ...)
>
>Liebe Gruesse/Regards, 
>---------------------------------------------------------------
>DI Gregor Karlinger
>mailto:gregor.karlinger@iaik.at
>http://www.iaik.at
>Phone +43 316 873 5541
>Institute for Applied Information Processing and Communications
>Austria
>---------------------------------------------------------------
> 
>
>### Tom Gindin ###
>
>[...]
>>      The situation is actually even more confusing than that.  The rules
>> seem to be, for HT, LF, FF and VT, something like the following:
>[...]
>
>### Merlin Hughs ###
>
>[...]
>> The escaping is useful because:
>> 
>>            <DName>CN=foo
>>            </DName>
>> 
>> According to RFC 2253, this states that the common name is
>> foo<LF><TAB>. However, if you trim() the text value then you
>> would get common name foo. Requiring escaping of ASCII controls
>> would result in CN=foo\0A\09 (if that is what is meant) which
>> is unambiguous and can be safely indented, whitespace formatted
>> and trim()ed. It would also eliminate a few other potential
>> ambiguities:
>[...]
>> is significant. If we require that all significant ASCII
>> controls be escaped, then trimming and formatting involving
>> newlines and tabs will be safe, and meaningful whitespace in
>> dnames will be explicit.
>
>### Martin Duerst ###
>
>An XML document consists of characters, not bytes. And in
>content, each character can be escaped by a numeric character
>reference. Therefore, you can have a document only encoded
>in us-ascii, but will lots of &#dddd; (or &#xhhhh;); this
>will transport any character in Unicode, and a browser
>will display it correctly (assuming it has the right font).
>
>> > There's another issue that seems relevant. RFC 2253 states
>> > that strings must be converted to UTF-8 and then the escaping
>> > rules must be applied. Do we honour this, or should we UTF-8
>> > decode the RFC2253 string before embedding it in the text node.
>> >
>> > Essentially, should the final example in RFC 2253 be encoded
>> > in XML as:
>> >
>> > UTF-8 encode and require ASCII escaping of high-bit-set chars:
>> >   SN=Lu\C4\8Di\C4\87
>
>This would be okay.
>
>[...]
>> > De-UTF-8 and embed the Unicode original:
>> >   SN=Lu?i? (where ? is the original character)
>> >
>> > The last seems like the best option to me.
>
>I agree. But it has to be spelled out clearly, and tested.
>
>############
>
Received on Tuesday, 12 June 2001 15:28:48 UTC