Re: [Moderator Action] surrogates in xml

From: Martin J. Duerst <duerst@w3.org>
Date: Tue, 10 Oct 2000 12:50:06 +0900
Message-Id: <4.2.0.58.J.20001010124950.0357fe10@sh.w3.mag.keio.ac.jp>
To: "Yves" <yves@opentag.com>, <www-international@w3.org>

At 00/10/09 23:30 -0400, Yves wrote:
>Hello Martin,
>
>Thanks, I think I understand better now:
>
>There is nothing special to do to encode surrogates for XML; we just apply
>the UTF encodings. But *once parsed*, the XML text (or tags) cannot include
>the high or low half of a surrogate pair as a single 'character'. The XML Char
>definition talks about scalar values (UCS as a coded character set), not
>encoded ones (encodings of UCS).
>
>And now I assume it also means we cannot have a surrogate pair coded as 2
>NCRs. For example: <U+D801,U+DC05> would be written "&#x10405;" not
>"&#xD801;&#xDC05;"?

Yes, exactly!
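
To make the arithmetic concrete, here is a minimal sketch in Python of how
the pair <U+D801,U+DC05> maps to a single scalar value and then to one NCR.
The function names are hypothetical and chosen only for illustration; the
formula is the standard UTF-16 surrogate-pair decoding.

```python
def surrogate_pair_to_scalar(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one Unicode scalar value."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: U+{high:04X}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: U+{low:04X}")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def to_ncr(scalar: int) -> str:
    """Format a scalar value as a hexadecimal numeric character reference."""
    return f"&#x{scalar:X};"


scalar = surrogate_pair_to_scalar(0xD801, 0xDC05)
print(to_ncr(scalar))  # &#x10405;

# In the encoded document, the same character is simply written out in the
# chosen UTF encoding; the surrogates appear only as UTF-16 code units.
print(chr(scalar).encode("utf-16-be").hex())  # d801dc05
print(chr(scalar).encode("utf-8").hex())      # f0909085
```

So the surrogates show up only as code units inside the UTF-16 byte stream,
never as characters or NCRs of their own.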

Regards,   Martin.
Received on Monday, 9 October 2000 23:50:39 GMT