[Bug 3164] non SGML character number 128-159

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3164

------- Comment #6 from cmsmcq@w3.org  2007-03-24 02:04 -------
The characters in question are indeed allowed by the XML 1.0 specification
(although in 1.1, I believe they are allowed only in the form of character
references, not as literals).  Every SGML declaration I've found that
claims to represent XML in SGML terms appears to diverge from the XML
specification on this point: they all declare these as UNUSED characters
(i.e. non-SGML characters, not to appear literally).

But are they allowed by XHTML 1.0?  XHTML 1.0 describes itself as a
reformulation in XML of HTML 4.  And HTML 4 includes an SGML declaration
(which I believe to be normative) which excludes these characters.

http://www.w3.org/TR/1999/REC-html401-19991224/sgml/sgmldecl.html

The relevant part of the document character set declaration in the HTML 4
SGML declaration reads:

                 127     1       UNUSED
                 128     32      UNUSED

If the character-repertoire restrictions of HTML 4 are inherited by
XHTML 1.0, then I think the validator is right to reject these characters.
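
As a rough illustration of what that restriction amounts to, here is a
minimal Python sketch that flags literal characters falling in the two
UNUSED ranges quoted above (127 for 1 character, 128 for 32 characters,
i.e. code points 127 through 159).  The names UNUSED_RANGES, is_unused and
find_unused are hypothetical and chosen for this example only; this is not
how the validator itself is implemented, and it assumes the input has
already been decoded to Unicode characters, so character references are
not considered.

```python
# Ranges marked UNUSED in the excerpt of the HTML 4 SGML declaration,
# written as (start, count) pairs exactly as they appear there.
UNUSED_RANGES = [(127, 1), (128, 32)]


def is_unused(code_point):
    """Return True if code_point falls in one of the UNUSED ranges."""
    return any(start <= code_point < start + count
               for start, count in UNUSED_RANGES)


def find_unused(text):
    """Return (index, code point) for each literal UNUSED character."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if is_unused(ord(ch))]
```

For example, find_unused("a\x80b") would report the character at index 1
with code point 128, while code point 160 (no-break space) falls outside
both ranges and is not flagged.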

Further discussion and details of this logic may be found at
http://www.w3.org/People/cmsmcq/2007/C1.xml

Received on Saturday, 24 March 2007 02:04:20 UTC