
Re: FW: Bare surrogates in XML - must halt and catch fire?

From: Chris Lilley <chris@w3.org>
Date: Mon, 12 Mar 2007 04:18:42 +0100
Message-ID: <973186123.20070312041842@w3.org>
To: public-xml-core-wg@w3.org
Cc: w3c-svg-wg@w3.org

Hello public-xml-core-wg,

Richard Tobin wrote:

> (a) Does XML allow unpaired surrogates in a UTF-16 (etc) document?
> 
>     No, unpaired surrogates are not legal in UTF-16 ("ill-formed"
>     according to D35 in section 3.9 of Unicode 4.0), so by 4.3.3
>     it is a fatal error because it is "determined ... to be in a
>     certain encoding and contains byte sequences that are not legal
>     in that encoding".  Presumably the wording in that section about
>     irregular UTF-8 code unit sequences is no longer required, since
>     recent versions of Unicode make it clear that these are ill-formed.

That is indeed the question; thanks for the clear answer. So it's an
ill-formed character stream, and the parser is either never invoked or
(if it's a streaming parser) halts with a fatal error.
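
To make the check concrete, here is a rough sketch of how a decoder
front end might detect an unpaired surrogate in UTF-16 input before any
XML parsing happens. It is an illustration only, not taken from the XML
or Unicode specifications; the function name find_unpaired_surrogate and
its byteorder parameter are hypothetical, and it assumes any byte order
mark has already been stripped.

```python
from typing import Optional


def find_unpaired_surrogate(data: bytes, byteorder: str = "little") -> Optional[int]:
    """Return the index (in 16-bit code units) of the first unpaired
    surrogate in UTF-16 data, or None if the sequence is well-formed.

    Hypothetical helper for illustration; assumes no BOM is present.
    """
    if len(data) % 2:
        raise ValueError("UTF-16 data must contain an even number of bytes")

    # Split the byte stream into 16-bit code units.
    units = [int.from_bytes(data[i:i + 2], byteorder)
             for i in range(0, len(data), 2)]

    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: legal only if a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return i
        if 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            return i
        i += 1
    return None
```

A processor built on such a check would report a fatal error as soon as
the function returns anything other than None, rather than passing the
offending code unit on to the application.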


-- 
 Chris Lilley                    mailto:chris@w3.org
 Interaction Domain Leader
 Co-Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
 Co-Chair, W3C Hypertext CG
Received on Monday, 12 March 2007 03:18:55 UTC
