- From: Grosso, Paul <pgrosso@ptc.com>
- Date: Thu, 8 Mar 2007 09:30:46 -0500
- To: <public-xml-core-wg@w3.org>
Comments?

paul

-----Original Message-----
From: w3c-xml-cg-request@w3.org On Behalf Of Chris Lilley
Sent: Wednesday, 2007 March 07 17:37
To: XML CG
Cc: Richard Ishida; Felix Sasaki; W3C SVG Working Group
Subject: Bare surrogates in XML - must halt and catch fire?

Hello XML CG, Richard, Felix,

In XML 4th edition:

[Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character data.]

[Definition: A character is an atomic unit of text as specified by ISO/IEC 10646:2000 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions. Consequently, XML processors MUST accept any character in the range specified for Char. ]

http://www.w3.org/TR/xml/#charsets

This makes it clear that potentially valid characters must be accepted. The character range is also clear:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

Charmod is clear about bare surrogates:

Unicode contains some code points for internal use (such as noncharacters) or special functions (such as surrogate code points).

C079 [S] Specifications SHOULD NOT allow the use of codepoints reserved by Unicode for internal use.
http://www.w3.org/TR/charmod/#C079

C078 [S] Specifications MUST NOT allow the use of surrogate code points.
http://www.w3.org/TR/charmod/#C078

What is not clear is that XML specifically forbids bare surrogates (ie, half of a surrogate pair).

This came up in recent SVG WG discussions. Is the XML parser required to reject an xml document containing a bare surrogate? Would that be a well formedness error, or some other sort of error?
--
Chris Lilley mailto:chris@w3.org
Interaction Domain Leader
Co-Chair, W3C SVG Working Group
W3C Graphics Activity Lead
Co-Chair, W3C Hypertext CG
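
For reference, the Char production quoted above can be written down as a simple range check. The following is only a minimal sketch in Python, not any parser's actual implementation: the names `CHAR_RANGES`, `is_xml_char`, and `find_non_chars` are hypothetical, and the code works on already-decoded code points. It shows that surrogate code points fall outside Char; it does not settle which class of error a processor has to report.

```python
# Ranges copied from production [2] Char as quoted in the message above.
CHAR_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml_char(code_point):
    """Return True if the code point matches the Char production."""
    return any(lo <= code_point <= hi for lo, hi in CHAR_RANGES)


def find_non_chars(text):
    """Return (index, code point) pairs for code points outside Char.

    Assumes `text` is a sequence of decoded code points, so a correctly
    paired surrogate in UTF-16 input has already become one
    supplementary code point before this check runs.
    """
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if not is_xml_char(ord(ch))]


if __name__ == "__main__":
    # A lone high surrogate (U+D800) between two ordinary letters.
    for index, cp in find_non_chars("a\ud800b"):
        print("offset %d: U+%04X is outside Char" % (index, cp))
```

Under this reading, a bare surrogate such as U+D800 is simply not a Char, while a supplementary character such as U+10000 is.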
Received on Thursday, 8 March 2007 14:31:55 UTC