[Bug 3164] non SGML character number 128-159

From: <bugzilla@wiggum.w3.org>
Date: Sat, 24 Mar 2007 02:04:13 +0000
CC:
To: www-validator-cvs@w3.org
Message-Id: <E1HUvc5-0002dP-F3@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3164

------- Comment #6 from cmsmcq@w3.org  2007-03-24 02:04 -------
The characters in question are indeed allowed by the XML 1.0 specification
(although in XML 1.1, I believe, they are allowed only in the form of
character references, not as literals).  Every SGML declaration I've found
that claims to represent XML in SGML terms appears to disagree with the
specification on this point: they all declare these characters as UNUSED
(i.e. as non-SGML characters, which may not appear literally).
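
To make the XML 1.0 / XML 1.1 difference concrete, here is a minimal Python
sketch.  It assumes the ranges below are an accurate transcription of the
Char production of XML 1.0 and of the Char and RestrictedChar productions of
XML 1.1; the function names are hypothetical, and the Recommendations
themselves remain the authority.

```python
# Code point ranges (inclusive), transcribed from the XML Recommendations.
# Assumption: the transcription is accurate; verify against the specs.
XML10_CHAR = [(0x9, 0xA), (0xD, 0xD), (0x20, 0xD7FF),
              (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]
XML11_CHAR = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]
# XML 1.1 characters that may appear only as character references.
XML11_RESTRICTED = [(0x1, 0x8), (0xB, 0xC), (0xE, 0x1F),
                    (0x7F, 0x84), (0x86, 0x9F)]


def _in_ranges(cp, ranges):
    """Return True if code point cp lies in any (low, high) range."""
    return any(low <= cp <= high for low, high in ranges)


def literal_ok_xml10(cp):
    """Hypothetical helper: may cp appear literally in an XML 1.0 document?"""
    return _in_ranges(cp, XML10_CHAR)


def literal_ok_xml11(cp):
    """Hypothetical helper: may cp appear literally in an XML 1.1 document?"""
    return _in_ranges(cp, XML11_CHAR) and not _in_ranges(cp, XML11_RESTRICTED)
```

Under these assumptions, code point 128 passes the XML 1.0 check but fails
the XML 1.1 one, where it would have to be written as a character reference.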

But are they allowed by XHTML 1.0?  XHTML 1.0 describes itself as a
reformulation in XML of HTML 4.  And HTML 4 includes an SGML declaration
(which I believe to be normative) which excludes these characters.

http://www.w3.org/TR/1999/REC-html401-19991224/sgml/sgmldecl.html

The relevant part of the document character set declaration in the HTML 4
SGML declaration reads:

                 127     1       UNUSED
                 128     32      UNUSED

If the character-repertoire restrictions of HTML 4 are inherited by
XHTML 1.0, then I think the validator is right to reject these characters.
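
As an illustration of what the two quoted DESCSET rows imply, the following
Python sketch reads each row as (first character number, count, description)
and flags any character whose number falls in an UNUSED row.  The names
DESCSET_ROWS, is_non_sgml and non_sgml_positions are invented for this
example, and only the two rows quoted above are modelled; this is not the
validator's actual code.

```python
# The two rows quoted from the HTML 4 SGML declaration:
# (first character number, number of characters, description).
DESCSET_ROWS = [
    (127, 1, "UNUSED"),
    (128, 32, "UNUSED"),
]


def is_non_sgml(cp, rows=DESCSET_ROWS):
    """Return True if code point cp falls inside a row marked UNUSED."""
    return any(
        desc == "UNUSED" and start <= cp < start + count
        for start, count, desc in rows
    )


def non_sgml_positions(text, rows=DESCSET_ROWS):
    """Return (index, code point) pairs for characters in UNUSED rows."""
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if is_non_sgml(ord(ch), rows)]
```

Read this way, the rows cover character numbers 127 through 159, which
includes the range 128-159 named in the bug title; 160 is the first number
above the excluded block.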

Further discussion and details of this logic may be found at
http://www.w3.org/People/cmsmcq/2007/C1.xml
Received on Saturday, 24 March 2007 02:04:20 GMT