[Bug 16106] New: Clarify paragraph about character references in tokenization.html

https://www.w3.org/Bugs/Public/show_bug.cgi?id=16106

           Summary: Clarify paragraph about character references in
                    tokenization.html
           Product: HTML WG
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML5 spec (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: ezio.melotti@gmail.com
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org


In the tokenization.html page, in the section "8.2.4.69 Tokenizing character
references", after the table, it says:

"""
Otherwise, return a character token for the Unicode character whose code point
is that number. If the number is in the range 0x0001 to 0x0008, 0x000E to
0x001F, 0x007F to 0x009F, 0xFDD0 to 0xFDEF, or is one of 0x000B, 0xFFFE,
0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF,
0x5FFFE, 0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF,
0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF,
0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or 0x10FFFF,
then this is a parse error.
"""

As far as I understand, the character is still returned even when this is a
parse error, but the text does not make that clear.  The current wording could
be read as: the character is returned, /but/ if the number is in one of those
ranges, then it's a parse error instead (and nothing says what should be
returned in that case).
I suggest rephrasing it to state explicitly that the character corresponding
to that number is returned in both cases.
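
To make the reading I have in mind concrete, here is a minimal Python sketch.
It is only an illustration of my interpretation, not spec text: the names
is_flagged and char_for_reference are made up, and it assumes the earlier
steps of the algorithm (the replacement table and the cases handled before
"Otherwise") have already been applied.

```python
# Hypothetical helpers illustrating one reading of the quoted paragraph:
# the character is returned whether or not the number is a parse error.

def is_flagged(number):
    """Return True if the number is in the ranges listed in the quoted text."""
    if 0x0001 <= number <= 0x0008 or 0x000E <= number <= 0x001F:
        return True
    if 0x007F <= number <= 0x009F or 0xFDD0 <= number <= 0xFDEF:
        return True
    if number == 0x000B:
        return True
    # 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, ..., 0x10FFFE, 0x10FFFF
    return (number & 0xFFFE) == 0xFFFE and number <= 0x10FFFF


def char_for_reference(number):
    """Return (character, parse_error) for the "Otherwise" branch.

    Assumption: the character for the code point is always returned;
    parse_error only records whether the number was in a flagged range.
    """
    return chr(number), is_flagged(number)
```

With this reading, char_for_reference(0x41) gives the character "A" with no
parse error, while char_for_reference(0x0B) still gives U+000B but also
reports a parse error.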


Received on Friday, 24 February 2012 11:37:51 UTC