- From: Grosso, Paul <pgrosso@ptc.com>
- Date: Thu, 8 Mar 2007 09:30:46 -0500
- To: <public-xml-core-wg@w3.org>
Comments?
paul
-----Original Message-----
From: w3c-xml-cg-request@w3.org On Behalf Of Chris Lilley
Sent: Wednesday, 2007 March 07 17:37
To: XML CG
Cc: Richard Ishida; Felix Sasaki; W3C SVG Working Group
Subject: Bare surrogates in XML - must halt and catch fire?
Hello XML CG, Richard, Felix,
In XML 1.0 (Fourth Edition):
[Definition: A parsed entity contains text, a sequence of
characters, which may represent markup or character data.]
[Definition: A character is an atomic unit of text as specified by
ISO/IEC 10646:2000 [ISO/IEC 10646]. Legal characters are tab,
carriage return, line feed, and the legal characters of Unicode and
ISO/IEC 10646. The versions of these standards cited in A.1
Normative References were current at the time this document was
prepared. New characters may be added to these standards by
amendments or new editions. Consequently, XML processors MUST
accept any character in the range specified for Char. ]
http://www.w3.org/TR/xml/#charsets
This makes it clear that potentially valid characters must be
accepted. The character range is also clear:
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate
blocks, FFFE, and FFFF. */
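
To make the range concrete, here is a rough sketch of production [2] as a
code point test. The function names is_xml_char and first_non_char are made
up for illustration and come from no spec or library; this only transcribes
the ranges quoted above and says nothing about what a processor must do
when the test fails.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp falls in the ranges of production [2] Char."""
    return (
        cp in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF          # up to just below the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD        # above the surrogates, excluding FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF     # supplementary planes
    )


def first_non_char(text: str) -> int:
    """Return the index of the first code point outside Char, or -1 if none."""
    for i, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            return i
    return -1
```

A Python str can hold a lone surrogate such as "\ud800", so
first_non_char("ab\ud800c") points at the offending position, while a
properly decoded supplementary character (one code point at or above
#x10000) passes.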
Charmod is clear about bare surrogates:
Unicode contains some code points for internal use (such as
noncharacters) or special functions (such as surrogate code points).
C079 [S] Specifications SHOULD NOT allow the use of codepoints
reserved by Unicode for internal use.
http://www.w3.org/TR/charmod/#C079
C078 [S] Specifications MUST NOT allow the use of surrogate
code points.
http://www.w3.org/TR/charmod/#C078
What is not clear is whether XML specifically forbids bare surrogates
(i.e., half of a surrogate pair). This came up in recent SVG WG
discussions. Is an XML parser required to reject an XML document
containing a bare surrogate? Would that be a well-formedness error, or
some other sort of error?
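
As a data point only, here is a small probe of how one existing parser
treats a surrogate code point written as a character reference. It assumes
Python's standard xml.etree.ElementTree (backed by expat); the helper name
parses is hypothetical. Whatever it reports is that parser's behaviour, not
an answer to what the spec requires.

```python
import xml.etree.ElementTree as ET


def parses(document: str) -> bool:
    """Return True if ElementTree accepts the document, False on a ParseError."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# An ordinary character reference, for comparison.
ordinary = parses("<a>&#x41;</a>")

# A reference to a high surrogate with no low surrogate following it.
bare_surrogate = parses("<a>&#xD800;</a>")
```

Comparing the two results shows whether this particular parser
distinguishes the bare surrogate from an ordinary character; other parsers
may well behave differently.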
--
Chris Lilley mailto:chris@w3.org
Interaction Domain Leader
Co-Chair, W3C SVG Working Group
W3C Graphics Activity Lead
Co-Chair, W3C Hypertext CG
Received on Thursday, 8 March 2007 14:31:55 UTC