Re: AW: AW: AW: KeyName white space from Martin Duerst on 2001-05-15 (w3c-ietf-xmldsig@w3.org from April to June 2001)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 16 May 2001 00:35:13 +0900
To: "Gregor Karlinger" <gregor.karlinger@iaik.at>, <merlin@baltimore.ie>, "Tom Gindin" <tgindin@us.ibm.com>
Cc: <w3c-ietf-xmldsig@w3.org>
Message-Id: <4.2.0.58.J.20010516002903.03787ca0@sh.w3.mag.keio.ac.jp>

At 15:14 01/05/15 +0200, Gregor Karlinger wrote:

[slightly reordered]

>Doesn't this mean that you MUST use UTF-8 encoding for the XML
>document, i.e. that it is impossible to use ASCII or ISO_8859*
>since you insert a text node (the DN) whose characters are Unicode?
>
>If so, should we really introduce such a restriction?

An XML document consists of characters, not bytes. In
content, each character can be escaped by a numeric character
reference. Therefore, you can have a document encoded only
in us-ascii, but with lots of &#dddd; (or &#xhhhh;); this
will transport any Unicode character, and a browser
will display it correctly (assuming it has the right font).
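A minimal Python sketch of this point, using only the standard library (`xml.etree.ElementTree` and `xml.sax.saxutils.escape`). The element name `KeyName` and the sample value, taken from the final RFC 2253 example, are just for illustration. The document below contains only US-ASCII bytes, yet the parser hands back the original Unicode characters:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

name = "SN=Lu\u010di\u0107"  # "SN=Lučić", the final example in RFC 2253

# Escape markup characters, then turn every non-ASCII character into &#dddd;.
ascii_only = escape(name).encode("ascii", errors="xmlcharrefreplace")
print(ascii_only.decode("ascii"))  # SN=Lu&#269;i&#263;

doc = b'<?xml version="1.0" encoding="us-ascii"?><KeyName>' + ascii_only + b"</KeyName>"

# The parser expands the references, so the application sees the original characters.
parsed = ET.fromstring(doc).text
print(parsed == name)  # True
```

`xmlcharrefreplace` emits decimal references; hexadecimal ones (`&#x10D;`) would be parsed identically.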

> > There's another issue that seems relevant. RFC 2253 states
> > that strings must be converted to UTF-8 and then the escaping
> > rules must be applied. Do we honour this, or should we UTF-8
> > decode the RFC2253 string before embedding it in the text node?
> >
> > Essentially, should the final example in RFC 2253 be encoded
> > in XML as:
> >
> > UTF-8 encode and require ASCII escaping of high-bit-set chars:
> >   SN=Lu\C4\8Di\C4\87

This would be okay.
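For reference, the first option can be produced as in the sketch below. It assumes the RFC 2253 special characters (`,`, `+`, `\`, and so on) have already been escaped and handles only the non-ASCII part: each character is UTF-8 encoded and every octet is written as a `\hh` hex pair. The function name `escape_non_ascii` is hypothetical.

```python
def escape_non_ascii(value: str) -> str:
    """Write each non-ASCII character as its UTF-8 octets in RFC 2253 \\hh form.

    Assumes the RFC 2253 special characters have already been escaped;
    only the non-ASCII part is handled here.
    """
    parts = []
    for ch in value:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            # One \hh pair per UTF-8 octet, upper-case hex as in the RFC example.
            parts.extend(f"\\{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(parts)


print(escape_non_ascii("SN=Lu\u010di\u0107"))  # SN=Lu\C4\8Di\C4\87
```

The result is pure ASCII, so it can go into any XML document regardless of its declared encoding.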

> > UTF-8 encode and embed the result directly:
> >   SN=Lu??i?? (where ? is a high-bit UTF-8 byte directly embedded)
> >   (Here the meaning is confusing because the UTF-8 encoded
> >    text will correspond to some other Unicode characters, e.g. ト)

This would be a catastrophe. You cannot put arbitrary byte
sequences into an XML document. UTF-8, for example, uses
bytes in the range 0x80-0x9F, which are illegal in iso-8859-X.
Such a file would have to be rejected by the XML parser.
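A small illustration of both problems with the second option, assuming Python's standard library (whose ElementTree parser is built on expat). Read as ISO-8859-1, the UTF-8 octets turn into different characters, which is the confusion noted in the quoted text. In a document declared as us-ascii, expat refuses the high-bit octets as not well-formed:

```python
import xml.etree.ElementTree as ET

original = "SN=Lu\u010di\u0107"     # 8 characters
raw = original.encode("utf-8")      # b'SN=Lu\xc4\x8di\xc4\x87'

# Read as ISO-8859-1, every UTF-8 octet becomes a separate, different character.
misread = raw.decode("iso-8859-1")
print(len(original), len(misread))  # 8 10

# Declared as us-ascii, the high-bit octets make the document not well-formed.
doc = b'<?xml version="1.0" encoding="us-ascii"?><KeyName>' + raw + b"</KeyName>"
try:
    ET.fromstring(doc)
    rejected = False
except ET.ParseError as err:
    print("rejected:", err)
    rejected = True
```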

> > De-UTF-8 and embed the Unicode original:
> >   SN=Lu?i? (where ? is the original character)
> >
> > The last seems like the best option to me.

I agree. But it has to be spelled out clearly, and tested.
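One way the third option might be spelled out and tested is sketched below in Python. This is an assumption for illustration, not a rule from RFC 2253 or the XML Signature draft. Only `\hh` escapes of octets 0x80 and above are decoded back to characters. Every other escape (`\,`, `\2C`, `\\`, ...) is copied unchanged, so commas and other specials keep their meaning in the DN string. The function name `decode_utf8_escapes` is hypothetical.

```python
import string
import xml.etree.ElementTree as ET


def decode_utf8_escapes(dn: str) -> str:
    """Replace RFC 2253 \\hh escapes of non-ASCII UTF-8 octets by the characters they encode.

    All other escapes are copied unchanged, so the DN structure is preserved.
    """
    out = []
    pending = bytearray()  # consecutive escaped octets >= 0x80, not yet decoded

    def flush() -> None:
        if pending:
            out.append(pending.decode("utf-8"))  # raises on malformed UTF-8
            pending.clear()

    i = 0
    while i < len(dn):
        pair = dn[i + 1:i + 3]
        if (dn[i] == "\\" and len(pair) == 2
                and all(c in string.hexdigits for c in pair)
                and int(pair, 16) >= 0x80):
            pending.append(int(pair, 16))
            i += 3
        elif dn[i] == "\\" and i + 1 < len(dn):
            flush()
            out.append(dn[i:i + 2])  # keep any other escape verbatim
            i += 2
        else:
            flush()
            out.append(dn[i])
            i += 1
    flush()
    return "".join(out)


key_name = ET.Element("KeyName")
key_name.text = decode_utf8_escapes(r"SN=Lu\C4\8Di\C4\87")
print(key_name.text == "SN=Lu\u010di\u0107")       # True
print(ET.tostring(key_name, encoding="us-ascii"))  # character references for the two non-ASCII letters
```

Serializing with `encoding="us-ascii"` shows that embedding the Unicode original does not force UTF-8 on the document: the serializer falls back to numeric character references.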

Regards,   Martin.

Received on Tuesday, 15 May 2001 11:37:18 UTC