
[Bug 8245] New: [Ser] Error for illegal characters in HTML omits some control characters

From: <bugzilla@wiggum.w3.org>
Date: Mon, 09 Nov 2009 04:08:39 +0000
To: public-qt-comments@w3.org
Message-ID: <bug-8245-523@http.www.w3.org/Bugs/Public/>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=8245

           Summary: [Ser] Error for illegal characters in HTML omits some
                    control characters
           Product: XPath / XQuery / XSLT
           Version: Recommendation
          Platform: All
               URL: http://www.w3.org/TR/xslt-xquery-serialization/#HTML_CHARDATA
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Serialization
        AssignedTo: zongaro@ca.ibm.com
        ReportedBy: zongaro@ca.ibm.com
         QAContact: public-qt-comments@w3.org


According to section 7.3 of Serialization,[1] "Certain characters, specifically
the control characters #x7F-#x9F, are legal in XML but not in HTML. It is a
serialization error [err:SERE0014] to use the HTML output method when such
characters appear in the instance of the data model. The serializer MUST signal
the error."

The definition of the error in appendix B[2] repeats this with a slightly
different formulation:  "It is an error to use the HTML output method when
characters which are legal in XML but not in HTML, specifically the control
characters #x7F-#x9F, appear in the instance of the data model."

It is true that the control characters #x7F through #x9F are the only
characters permitted in XML 1.0 that are not permitted in HTML.  However, the
control characters #x1 through #x1F, excepting #x9, #xA and #xD, are also
permitted in XML 1.1 (though only as character references), but not in HTML per
the SGML declaration of HTML 4.[3]  The current wording of the error omits
them.


I suggest the following corrections:

. In the third paragraph of section 7.3, change "specifically the control
characters #x7F-#x9F, are legal in XML" to "specifically the control characters
#x1-#x8, #xB, #xC, #xE-#x1F and #x7F-#x9F, are legal in one or both versions of
XML, but not in HTML"

. In appendix B, in the definition of err:SERE0014, change "specifically the
control characters #x7F-#x9F" to "specifically the control characters #x1-#x8,
#xB, #xC, #xE-#x1F and #x7F-#x9F"
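
To make the proposed ranges concrete, here is a minimal sketch of the check a
serializer might perform under the corrected wording. It is not taken from the
specification or from any implementation; the names HTML_ILLEGAL_RANGES,
SerializationError, is_illegal_in_html and check_html_chardata are hypothetical
and chosen only for illustration.

```python
# Hypothetical sketch: the ranges come from the proposed wording above;
# all identifiers are illustrative assumptions, not spec-defined names.
HTML_ILLEGAL_RANGES = [
    (0x01, 0x08),  # #x1-#x8
    (0x0B, 0x0C),  # #xB, #xC
    (0x0E, 0x1F),  # #xE-#x1F
    (0x7F, 0x9F),  # #x7F-#x9F (the only range named in the current text)
]


class SerializationError(ValueError):
    """Stand-in for the serialization error err:SERE0014."""


def is_illegal_in_html(ch: str) -> bool:
    """Return True if the single character ch falls in one of the ranges."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in HTML_ILLEGAL_RANGES)


def check_html_chardata(text: str) -> None:
    """Raise SerializationError at the first character not permitted in HTML."""
    for offset, ch in enumerate(text):
        if is_illegal_in_html(ch):
            raise SerializationError(
                f"SERE0014: character #x{ord(ch):X} at offset {offset} "
                "is not permitted in HTML"
            )
```

Under this sketch, tab (#x9), line feed (#xA) and carriage return (#xD) pass
the check, while a character such as #x1 or #x85 causes the error to be raised.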


[1] http://www.w3.org/TR/xslt-xquery-serialization/#HTML_CHARDATA
[2] http://www.w3.org/TR/xslt-xquery-serialization/#ERRSERE0014
[3] http://www.w3.org/TR/html401/sgml/sgmldecl.html


Received on Monday, 9 November 2009 04:08:49 UTC