- From: Richard Tobin <richard@inf.ed.ac.uk>
- Date: Thu, 8 Mar 2007 15:56:00 +0000 (GMT)
- To: "Grosso, Paul" <pgrosso@ptc.com>, <public-xml-core-wg@w3.org>
> What is not clear is that XML specifically forbids bare surrogates
> (ie, half of a surrogate pair). This came up in recent SVG WG
> discussions. Is the XML parser required to reject an xml document
> containing a bare surrogate? Would that be a well formedness error, or
> some other sort of error?

I'm not sure what the question means. Here are two possibilities:

(a) Does XML allow unpaired surrogates in a UTF-16 (etc) document?

No, unpaired surrogates are not legal in UTF-16 ("ill-formed" according
to D35 in section 3.9 of Unicode 4.0), so by 4.3.3 it is a fatal error
because it is "determined ... to be in a certain encoding and contains
byte sequences that are not legal in that encoding".

Presumably the wording in that section about irregular UTF-8 code unit
sequences is no longer required, since recent versions of Unicode make
it clear that these are ill-formed.

(b) Does XML allow characters whose code point is that of a surrogate?

No, because it would violate production 2.

-- Richard
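
For illustration only, the two cases can be sketched in Python. This is a minimal sketch, not taken from any real parser: the helper names is_xml_char and is_well_formed_utf16 are hypothetical, the ranges are those of production [2] (Char) in XML 1.0, and Python's strict "utf-16-be" codec is assumed as a stand-in for a parser's UTF-16 decoder.

```python
def is_xml_char(cp: int) -> bool:
    """Case (b): XML 1.0 production [2] Char, which excludes the
    surrogate block U+D800..U+DFFF (hypothetical helper name)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_well_formed_utf16(data: bytes) -> bool:
    """Case (a): True if the bytes decode as big-endian UTF-16.
    Python's strict decoder raises on unpaired surrogates; a parser
    would report the same condition as a fatal error under 4.3.3."""
    try:
        data.decode("utf-16-be")
        return True
    except UnicodeDecodeError:
        return False


# A correctly paired surrogate sequence (U+1F600) is fine.
paired = b"\xd8\x3d\xde\x00"
# A high surrogate followed by "A" is an unpaired surrogate.
unpaired_high = b"\xd8\x00\x00\x41"
# A low surrogate on its own is also unpaired.
unpaired_low = b"\xdc\x00\x00\x41"

print(is_well_formed_utf16(paired))
print(is_well_formed_utf16(unpaired_high))
print(is_well_formed_utf16(unpaired_low))
print(is_xml_char(0xD800))
```

In this sketch the byte-level check corresponds to (a), where the document is not legal UTF-16 at all, and the code point check corresponds to (b), where something such as a character reference names a code point in the surrogate range.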
Received on Thursday, 8 March 2007 15:56:17 UTC