- From: <bugzilla@jessica.w3.org>
- Date: Fri, 24 Feb 2012 11:37:51 +0000
- To: public-html@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=16106
Summary: Clarify paragraph about character references in
tokenization.html
Product: HTML WG
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: HTML5 spec (editor: Ian Hickson)
AssignedTo: ian@hixie.ch
ReportedBy: ezio.melotti@gmail.com
QAContact: public-html-bugzilla@w3.org
CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
public-html@w3.org
In the tokenization.html page, in the section "8.2.4.69 Tokenizing character
references", after the table, it says:
"""
Otherwise, return a character token for the Unicode character whose code point
is that number. If the number is in the range 0x0001 to 0x0008, 0x000E to
0x001F, 0x007F to 0x009F, 0xFDD0 to 0xFDEF, or is one of 0x000B, 0xFFFE,
0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF,
0x5FFFE, 0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF,
0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF,
0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or 0x10FFFF,
then this is a parse error.
"""
As far as I understand, the character is still returned even when it is a
parse error, but the text does not make this clear. The current wording can be
read as: the character is returned, /but/ if the number is in one of those
ranges, it is a parse error instead (with nothing said about what is returned).
I suggest rephrasing it to state explicitly that the character corresponding
to that number is returned in both cases.
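To illustrate the reading described above, here is a minimal Python sketch. It
covers only the quoted paragraph and assumes the earlier steps of the section
(the table lookup and the other special cases) have already been handled. The
names PARSE_ERROR_RANGES, is_flagged and character_for are hypothetical and
not taken from the spec; only the code point values come from the quoted text.

```python
# Hypothetical helper names; only the values below come from the quoted text.
PARSE_ERROR_RANGES = (
    (0x0001, 0x0008),
    (0x000E, 0x001F),
    (0x007F, 0x009F),
    (0xFDD0, 0xFDEF),
)


def is_flagged(number):
    """Return True if the quoted paragraph calls this number a parse error."""
    if number == 0x000B:
        return True
    if any(low <= number <= high for low, high in PARSE_ERROR_RANGES):
        return True
    # 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, ..., 0x10FFFE, 0x10FFFF:
    # the last two code points of every plane.
    return (number & 0xFFFE) == 0xFFFE


def character_for(number):
    """Return (character, parse_error) under the reading suggested above.

    The character is returned in both cases; the flag only records
    whether a parse error should also be reported.
    """
    return chr(number), is_flagged(number)
```

With this reading, character_for(0x41) gives ('A', False), while
character_for(0x0B) still gives the character '\x0b', together with True for
the parse error.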
Received on Sunday, 26 February 2012 04:55:05 UTC