Re: DName encoding (was:KeyName white space) from Tom Gindin on 2001-05-16 (w3c-ietf-xmldsig@w3.org from April to June 2001)

From: Tom Gindin <tgindin@us.ibm.com>
Date: Wed, 16 May 2001 11:59:51 -0400
To: merlin <merlin@baltimore.ie>
Cc: "Gregor Karlinger" <gregor.karlinger@iaik.at>, duerst@w3.org, w3c-ietf-xmldsig@w3.org
Message-ID: <OF7B96B3F4.FB108B76-ON85256A4E.00579C4E@somers.hqregion.ibm.com>
     These two proposals aren't as far apart as they look, since most of
the extra complexity in Gregor's comes from his copying the list of
printable characters to be escaped from RFC 2253.  An RFC-2253 encoded name
is supposed to have that escaping done and be in UTF-8.

          Tom Gindin


merlin <merlin@baltimore.ie>@baltimore.ie on 05/16/2001 09:41:35 AM

Sent by:  merlin@baltimore.ie


To:   "Gregor Karlinger" <gregor.karlinger@iaik.at>
cc:   "Tom Gindin" <tgindin@us.ibm.com>, duerst@w3.org,
      w3c-ietf-xmldsig@w3.org
Subject:  Re: DName encoding (was:KeyName white space)



Hi Gregor,

I would actually suggest something much simpler:

Encoding:
  Take the RFC 2253-encoded name, replace all characters < 32
  with \XX (RFC 2253 hex encoding) and then UTF-8 decode this.
  Whitespace format if desired.

Decoding:
  Trim whitespace, UTF-8 encode the text and you have the RFC
  2253-encoded name.

The value is RFC 2253 compliant (considering UTF-8) and is
easy to process without knowing anything about RFC 2253 or
DNames.

For example, to fire the dname at an LDAP directory from an
XML node you could just do:

out.write (node.getNodeValue ().trim ().getBytes ("UTF-8"));
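
As an illustration of the encoding and decoding steps above, here is a
minimal sketch in Java. The class and method names (DNameText, toXmlText,
fromXmlText) are hypothetical and not part of any proposal or library; the
input is assumed to already be a valid RFC 2253 name in UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative only.
final class DNameText {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private DNameText() {}

    // Encoding: replace every octet below 32 with \XX, then UTF-8 decode.
    // The result is the character string to place in the XML text node.
    static String toXmlText(byte[] rfc2253Utf8) {
        ByteArrayOutputStream escaped = new ByteArrayOutputStream();
        for (byte b : rfc2253Utf8) {
            // Java bytes are signed: UTF-8 lead and continuation bytes
            // are negative, so only ASCII controls match this test.
            if (b >= 0 && b < 32) {
                escaped.write('\\');
                escaped.write(HEX[b >> 4]);
                escaped.write(HEX[b & 0x0F]);
            } else {
                escaped.write(b);
            }
        }
        return new String(escaped.toByteArray(), StandardCharsets.UTF_8);
    }

    // Decoding: trim surrounding whitespace and UTF-8 encode the text.
    static byte[] fromXmlText(String text) {
        return text.trim().getBytes(StandardCharsets.UTF_8);
    }
}
```

With this sketch, the octets of CN=foo<LF><TAB> become the text
CN=foo\0A\09, which can then be indented or trimmed without changing the
name that fromXmlText returns.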

Merlin

r/gregor.karlinger@iaik.at/2001.05.16/15:13:14
>All,
>
>I have cited the most important statements (from my point of view)
>regarding DName encoding below. I would like to make the following
>proposal for a guideline that should appear in XML-Signature on how
>to encode strings in a DName in XML-Signature relevant structures
>(IssuerName, SubjectName):
>
>o Consider the string as consisting of unicode characters.
>
>o Escape occurrences of the following special characters by prefixing
>  them with the "\" character:
>
>    - a "#" character occurring at the beginning of the string
>    - one of the characters ",", "+", """, "\", "<", ">" or ";"
>
>o Escape all occurrences of ASCII control characters (Unicode range
>  \x00 - \x20) by replacing each with "\x" followed by a two-digit
>  hex number showing its Unicode number.
>
>Since an XML document logically consists of characters, not bytes
>(Thanks Martin for reminding me ;-) the resulting unicode string is
>finally encoded according to the character encoding used for producing
>the physical representation of the XML document (But this is not in
>scope of the guideline that should be taken into the XML-Signature
>specification ...)
>
>Liebe Gruesse/Regards,
>---------------------------------------------------------------
>DI Gregor Karlinger
>mailto:gregor.karlinger@iaik.at
>http://www.iaik.at
>Phone +43 316 873 5541
>Institute for Applied Information Processing and Communications
>Austria
>---------------------------------------------------------------
>
>
>### Tom Gindin ###
>
>[...]
>>      The situation is actually even more confusing than that.  The rules
>> seem to be, for HT, LF, FF and VT, something like the following:
>[...]
>
>### Merlin Hughes ###
>
>[...]
>> The escaping is useful because:
>>
>>            <DName>CN=foo
>>            </DName>
>>
>> According to RFC 2253, this states that the common name is
>> foo<LF><TAB>. However, if you trim() the text value then you
>> would get common name foo. Requiring escaping of ASCII controls
>> would result in CN=foo\0A\09 (if that is what is meant) which
>> is unambiguous and can be safely indented, whitespace formatted
>> and trim()ed. It would also eliminate a few other potential
>> ambiguities:
>[...]
>> is significant. If we require that all significant ASCII
>> controls be escaped, then trimming and formatting involving
>> newlines and tabs will be safe, and meaningful whitespace in
>> dnames will be explicit.
>
>### Martin Duerst ###
>
>An XML document consists of characters, not bytes. And in
>content, each character can be escaped by a numeric character
>reference. Therefore, you can have a document only encoded
>in us-ascii, but with lots of &#dddd; (or &#xhhhh;); this
>will transport any character in Unicode, and a browser
>will display it correctly (assuming it has the right font).
>
>> > There's another issue that seems relevant. RFC 2253 states
>> > that strings must be converted to UTF-8 and then the escaping
>> > rules must be applied. Do we honour this, or should we UTF-8
>> > decode the RFC2253 string before embedding it in the text node.
>> >
>> > Essentially, should the final example in RFC 2253 be encoded
>> > in XML as:
>> >
>> > UTF-8 encode and require ASCII escaping of high-bit-set chars:
>> >   SN=Lu\C4\8Di\C4\87
>
>This would be okay.
>
>[...]
>> > De-UTF-8 and embed the Unicode original:
>> >   SN=Lu?i? (where ? is the original character)
>> >
>> > The last seems like the best option to me.
>
>I agree. But it has to be spelled out clearly, and tested.
>
>############
>


Received on Wednesday, 16 May 2001 12:00:52 UTC