Re: [Moderator Action] surrogates in xml from Martin J. Duerst on 2000-10-10 (www-international@w3.org from October to December 2000)

From: Martin J. Duerst <duerst@w3.org>
Date: Tue, 10 Oct 2000 12:50:06 +0900
To: "Yves" <yves@opentag.com>, <www-international@w3.org>
Message-Id: <4.2.0.58.J.20001010124950.0357fe10@sh.w3.mag.keio.ac.jp>

At 00/10/09 23:30 -0400, Yves wrote:
>Hello Martin,
>
>Thanks, I think I understand better now:
>
>There is nothing special to do to encode surrogates for XML; we just apply
>the UTF encodings. But *once parsed*, the XML text (or tags) cannot include
>the high or low half of a surrogate pair as a single 'character'. The XML Char
>definition talks about scalar values (UCS as a coded character set), not
>encoded ones (encodings of UCS).
>
>And now I assume it also means we cannot have a surrogate pair coded as 2
>NCRs. For example: <U+D801,U+DC05> would be written "&#x10405;" not
>"&#xD801;&#xDC05;"?

Yes, exactly!
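
For illustration, the arithmetic behind your example can be sketched as
below. This is only a minimal sketch, not taken from any particular parser:
the function names (is_xml_char, surrogates_to_scalar, ncr) are made up for
this message, and the ranges follow the Char production in XML 1.0 and the
standard UTF-16 surrogate ranges.

```python
# Hypothetical helper names; a sketch of the rule discussed above.

def is_xml_char(cp: int) -> bool:
    """True if cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF        # stops just below the surrogate block
        or 0xE000 <= cp <= 0xFFFD      # resumes just above it
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def surrogates_to_scalar(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode scalar value."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: U+{high:04X}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: U+{low:04X}")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def ncr(scalar: int) -> str:
    """Format a scalar value as a hexadecimal numeric character reference."""
    if not is_xml_char(scalar):
        raise ValueError(f"U+{scalar:04X} is not a legal XML character")
    return f"&#x{scalar:X};"


scalar = surrogates_to_scalar(0xD801, 0xDC05)
print(ncr(scalar))  # &#x10405;
```

Calling ncr(0xD801) raises ValueError in this sketch, which mirrors the
point that "&#xD801;" on its own does not refer to a legal XML character.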

Regards,   Martin.

Received on Monday, 9 October 2000 23:50:39 UTC