Re: AW: AW: AW: KeyName white space

At 15:14 01/05/15 +0200, Gregor Karlinger wrote:

[slightly reordered]

>Doesn't this mean that you MUST use UTF-8 encoding for the XML
>document, i. e. that it is impossible to use ASCII or ISO_8859*
>since you insert a text node (the DN) whose characters are Unicode?
>
>If so, should we really introduce such a restriction?

An XML document consists of characters, not bytes. And in
content, each character can be escaped by a numeric character
reference. Therefore, you can have a document only encoded
in us-ascii, but with lots of &#dddd; (or &#xhhhh;); this
will transport any character in Unicode, and a browser
will display it correctly (assuming it has the right font).
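
As a rough illustration of the point about numeric character
references, here is a minimal Python sketch. It assumes the surname
from the RFC 2253 example discussed in this thread (U+010D and U+0107
in place of the escaped bytes); the KeyName wrapper and the variable
names are only illustrative, not taken from any specification text.

```python
import xml.etree.ElementTree as ET

# Assumed sample value: the surname from the RFC 2253 example, as Unicode.
name = "Lu\u010di\u0107"

# Characters outside us-ascii become decimal numeric character references.
ascii_bytes = name.encode("ascii", errors="xmlcharrefreplace")

# Illustrative wrapper element; the document itself contains only ASCII bytes.
document = (
    b'<?xml version="1.0" encoding="us-ascii"?>'
    b"<KeyName>SN=" + ascii_bytes + b"</KeyName>"
)

# A parser resolves the references back to the original characters.
root = ET.fromstring(document)
```

After parsing, root.text holds the original Unicode characters again,
even though no byte above 0x7F ever appeared in the document.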

> > There's another issue that seems relevant. RFC 2253 states
> > that strings must be converted to UTF-8 and then the escaping
> > rules must be applied. Do we honour this, or should we UTF-8
> > decode the RFC2253 string before embedding it in the text node.
> >
> > Essentially, should the final example in RFC 2253 be encoded
> > in XML as:
> >
> > UTF-8 encode and require ASCII escaping of high-bit-set chars:
> >   SN=Lu\C4\8Di\C4\87

This would be okay.
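
For reference, a small Python sketch of how that escaped form can be
produced from the Unicode string. The function name is hypothetical,
and the sketch only handles bytes with the high bit set; it does not
implement the other RFC 2253 escaping rules (for characters such as
"," or "+").

```python
def escape_high_bytes(value: str) -> str:
    """Backslash-escape every UTF-8 byte >= 0x80 as two hex digits.

    Simplified illustration only: the special-character escaping that
    RFC 2253 also requires is deliberately left out.
    """
    out = []
    for byte in value.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))          # plain ASCII passes through
        else:
            out.append("\\%02X" % byte)    # e.g. 0xC4 -> \C4
    return "".join(out)


escaped = escape_high_bytes("SN=Lu\u010di\u0107")
```

The result contains only ASCII characters, so it can go into a text
node of a document in any encoding.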


> > UTF-8 encode and embed the result directly:
> >   SN=Lu??i?? (where ? is a high-bit UTF-8 byte directly embedded)
> >   (Here the meaning is confusing because the UTF-8 encoded
> >    text will correspond to some other Unicode characters, e.g. ト)

This would be a catastrophe. You cannot put arbitrary byte
sequences into an XML document. UTF-8 for example uses
bytes in the range 0x80-0x9F, which are illegal in iso-8859-X.
Such a file would have to be rejected by the XML parser.
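
A minimal Python sketch of what goes wrong when UTF-8 bytes are read
under another encoding label. The sample value is the surname from the
RFC 2253 example. Note that Python's iso-8859-1 codec is lenient and
maps every byte to some code point instead of rejecting it, so the
sketch shows the corruption rather than a parser error.

```python
# Assumed sample value: the surname from the RFC 2253 example.
original = "Lu\u010di\u0107"
raw = original.encode("utf-8")      # b'Lu\xc4\x8di\xc4\x87'

# Under a us-ascii label, the high-bit bytes cannot be decoded at all.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Under an iso-8859-1 label, each byte is read as a separate character,
# so the two-byte sequences turn into unrelated characters.
misread = raw.decode("iso-8859-1")
```

Here misread has seven characters instead of five, and none of the
non-ASCII ones match the original name.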


> > De-UTF-8 and embed the Unicode original:
> >   SN=Lu?i? (where ? is the original character)
> >
> > The last seems like the best option to me.

I agree. But it has to be spelled out clearly, and tested.
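
As a starting point for such a test, here is a hedged Python sketch of
the "de-UTF-8" step: it turns backslash-hex pairs back into bytes and
decodes the result as UTF-8. The function name is hypothetical, and
the sketch ignores the other RFC 2253 escapes (for example an escaped
backslash followed by hex digits), so it is an illustration and not a
complete DN parser.

```python
import re

# Matches a backslash followed by exactly two hex digits, on bytes.
_HEX_PAIR = re.compile(rb"\\([0-9A-Fa-f]{2})")


def unescape_to_unicode(escaped: str) -> str:
    """Replace \\XX hex pairs with the bytes they denote, then UTF-8 decode.

    Simplified illustration only: other RFC 2253 escape forms are not
    handled here.
    """
    raw = escaped.encode("utf-8")
    raw = _HEX_PAIR.sub(
        lambda m: bytes([int(m.group(1).decode("ascii"), 16)]), raw
    )
    return raw.decode("utf-8")


decoded = unescape_to_unicode("SN=Lu\\C4\\8Di\\C4\\87")
```

The decoded string is what would be placed in the text node, leaving
the choice of document encoding (or character references) to the
serializer.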


Regards,   Martin.

Received on Tuesday, 15 May 2001 11:37:18 UTC