DName encoding (was:KeyName white space) from Gregor Karlinger on 2001-05-16 (w3c-ietf-xmldsig@w3.org from April to June 2001)

From: Gregor Karlinger <gregor.karlinger@iaik.at>
Date: Wed, 16 May 2001 15:13:14 +0200
To: "Tom Gindin" <tgindin@us.ibm.com>, "Merlin Hughs" <merlin@baltimore.ie>, <duerst@w3.org>
Cc: <w3c-ietf-xmldsig@w3.org>
Message-ID: <LBEPJAONIMDADHFHAEAOMELJCFAA.gregor.karlinger@iaik.at>

All,

I have cited the most important statements (from my point of view)
regarding DName encoding below. I would like to make the following
proposal for a guideline that should appear in XML-Signature on how
to encode a string in a DName in XML-Signature-relevant structures
(IssuerName, SubjectName):

o Consider the string as consisting of Unicode characters.

o Escape occurrences of the following special characters by prefixing
  each with the "\" character:

    - a "#" character occurring at the beginning of the string
    - one of the characters ",", "+", """, "\", "<", ">" or ";"

o Escape all occurrences of ASCII control characters (Unicode range
  \x00 - \x20) by replacing each with "\x" followed by a two-digit
  hex number showing its Unicode number.
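
The three rules above can be sketched in Python as follows. This is
only an illustration of the proposal as worded, not text for the
specification. The function name and both keyword parameters are
hypothetical. The defaults follow the wording literally (upper bound
\x20, which also covers the space character, and the "\x" prefix);
both are left adjustable because the quoted examples below write
escapes as "\0A" and "\09".

```python
# Characters that are always prefixed with a backslash.
SPECIALS = set(',+"\\<>;')

def escape_dname_string(value, ctrl_max=0x20, hex_prefix="\\x"):
    """Escape one string of a DName (hypothetical helper).

    ctrl_max and hex_prefix are assumptions made adjustable here:
    the defaults mirror the proposal text as written.
    """
    out = []
    for i, ch in enumerate(value):
        if ord(ch) <= ctrl_max:
            # Control character: prefix plus two hex digits.
            out.append(f"{hex_prefix}{ord(ch):02X}")
        elif ch in SPECIALS or (ch == "#" and i == 0):
            # Special character, or "#" at the start of the string.
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)

print(escape_dname_string("#1, Main St", ctrl_max=0x1F))
```

Calling it with ctrl_max=0x1F leaves ordinary spaces alone, and
hex_prefix="\\" yields the "\0A" style of escape.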

Since an XML document logically consists of characters, not bytes
(thanks, Martin, for reminding me ;-) the resulting Unicode string is
finally encoded according to the character encoding used for producing
the physical representation of the XML document. (But this is not in
scope of the guideline that should be taken into the XML-Signature
specification ...)

Liebe Gruesse/Regards, 
---------------------------------------------------------------
DI Gregor Karlinger
mailto:gregor.karlinger@iaik.at
http://www.iaik.at
Phone +43 316 873 5541
Institute for Applied Information Processing and Communications
Austria
---------------------------------------------------------------
 

### Tom Gindin ###

[...]
>      The situation is actually even more confusing than that.  The rules
> seem to be, for HT, LF, FF and VT, something like the following:
[...]

### Merlin Hughs ###

[...]
> The escaping is useful because:
> 
>            <DName>CN=foo
>            </DName>
> 
> According to RFC 2253, this states that the common name is
> foo<LF><TAB>. However, if you trim() the text value then you
> would get common name foo. Requiring escaping of ASCII controls
> would result in CN=foo\0A\09 (if that is what is meant) which
> is unambiguous and can be safely indented, whitespace formatted
> and trim()ed. It would also eliminate a few other potential
> ambiguities:
[...]
> is significant. If we require that all significant ASCII
> controls be escaped, then trimming and formatting involving
> newlines and tabs will be safe, and meaningful whitespace in
> dnames will be explicit.
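
The trimming hazard described in this excerpt can be reproduced with a
few lines of Python. This is a small sketch only: the variable names
are hypothetical, and str.strip() stands in for whatever trim() an
implementation might apply to the text node.

```python
# Text node content of the <DName> example: "CN=foo", then LF and TAB.
raw = "CN=foo\n\t"
trimmed = raw.strip()            # the trailing LF and TAB are lost
print(repr(raw), "->", repr(trimmed))

# The same value with the controls written as explicit escapes.
escaped = "CN=foo\\0A\\09"
indented = "\n    " + escaped + "\n"   # pretty-printed inside the element
print(repr(indented.strip()))    # the escapes survive trimming
```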

### Martin Duerst ###

An XML document consists of characters, not bytes. And in
content, each character can be escaped by a numeric character
reference. Therefore, you can have a document only encoded
in us-ascii, but with lots of &#dddd; (or &#xhhhh;); this
will transport any character in Unicode, and a browser
will display it correctly (assuming it has the right font).

> > There's another issue that seems relevant. RFC 2253 states
> > that strings must be converted to UTF-8 and then the escaping
> > rules must be applied. Do we honour this, or should we UTF-8
> > decode the RFC2253 string before embedding it in the text node.
> >
> > Essentially, should the final example in RFC 2253 be encoded
> > in XML as:
> >
> > UTF-8 encode and require ASCII escaping of high-bit-set chars:
> >   SN=Lu\C4\8Di\C4\87

This would be okay.

[...]
> > De-UTF-8 and embed the Unicode original:
> >   SN=Lu?i? (where ? is the original character)
> >
> > The last seems like the best option to me.

I agree. But it has to be spelled out clearly, and tested.
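
To make the two alternatives concrete, here is a short Python sketch
using the surname from the final example in RFC 2253. The helper name
utf8_escape is hypothetical, and it only handles the high-bit-set
bytes; it does not apply the special-character escaping. The last
line shows the numeric character references mentioned above, for a
document serialized in us-ascii.

```python
def utf8_escape(value):
    """UTF-8 encode, then write each non-ASCII byte as backslash + hex."""
    return "".join(
        chr(b) if b < 0x80 else f"\\{b:02X}"
        for b in value.encode("utf-8")
    )

surname = "Lu\u010di\u0107"   # U+010D and U+0107 are the non-ASCII letters

# Option 1: UTF-8 encode and escape the high-bit-set bytes.
option1 = "SN=" + utf8_escape(surname)

# Option 2: embed the Unicode characters directly in the text node.
option2 = "SN=" + surname

# Option 2 serialized as us-ascii with numeric character references.
as_ascii = option2.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(option1)
print(as_ascii)
```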

############

Received on Wednesday, 16 May 2001 09:17:33 UTC