[Bug 15192] New: section 8.1.4 Character references; section 8.2.2.2 Character encodings In section 8.2.2.2, we say, "User agents must at a minimum support the UTF-8 and Windows-1252 encodings, but may support more." In section 8.1.4, we say, "The numeric character refere from bugzilla@jessica.w3.org on 2011-12-15 (public-html-bugzilla@w3.org from December 2011)

From: <bugzilla@jessica.w3.org>
Date: Thu, 15 Dec 2011 00:28:52 +0000
To: public-html-bugzilla@w3.org
Message-ID: <bug-15192-2486@http.www.w3.org/Bugs/Public/>

https://www.w3.org/Bugs/Public/show_bug.cgi?id=15192

           Summary: section 8.1.4 Character references; section 8.2.2.2
                    Character encodings In section 8.2.2.2, we say, "User
                    agents must at a minimum support the UTF-8 and
                    Windows-1252 encodings, but may support more." In
                    section 8.1.4, we say, "The numeric character refere
           Product: HTML WG
           Version: unspecified
          Platform: Other
               URL: http://www.whatwg.org/specs/web-apps/current-work/#top
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P3
         Component: HTML5 spec (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: contributor@whatwg.org
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org


Specification: http://www.w3.org/TR/2011/WD-html5-20110525/
Multipage: http://www.whatwg.org/C#top
Complete: http://www.whatwg.org/c#top

Comment:
section 8.1.4 Character references; section 8.2.2.2 Character encodings

In section 8.2.2.2, we say, "User agents must at a minimum support the UTF-8
and Windows-1252 encodings, but may support more."

In section 8.1.4, we say, "The numeric character reference forms described
above are allowed to reference any Unicode code point other than U+0000,
U+000D, permanently undefined Unicode characters (noncharacters), and control
characters other than space characters."

What about the characters in the range 0x80 to 0x9F, which the Windows-1252
encoding assigns to printable characters rather than to control characters?

For example, am I allowed to use a Windows-1252 byte value, "&#x80;", to
reference the Euro character, "&#x20AC;"? Does the browser have to further
interpret the resulting string after replacing character references?
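
To make the ambiguity concrete, here is a minimal Python sketch. It is only an
illustration, not a statement of what the spec requires: the variable names are
made up for this example, and html.unescape is shown as one library's handling
of the legacy reference, not as normative browser behaviour.

```python
import html

# Byte 0x80 decoded as Windows-1252 (Python codec name "cp1252") is the
# Euro sign, U+20AC.
euro_from_byte = b"\x80".decode("cp1252")

# The Unicode code point U+0080 itself is a C1 control character,
# which is a different character from the Euro sign.
c1_control = chr(0x80)

# The unambiguous reference names the Unicode code point directly.
unicode_ref = html.unescape("&#x20AC;")

# Illustration only: Python's html.unescape treats "&#x80;" as a legacy
# Windows-1252 value instead of returning the C1 control.
legacy_ref = html.unescape("&#x80;")

print(f"U+{ord(euro_from_byte):04X}", f"U+{ord(c1_control):04X}")
print(unicode_ref == euro_from_byte, legacy_ref == euro_from_byte)
```

The point of the sketch is that the number 0x80 means one thing as a
Windows-1252 byte and another as a Unicode code point, which is why the
wording in 8.1.4 matters.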

I suggest we add a note to 8.1.4 Character references:
"The numeric character references are to Unicode code points, so instead of
using character references in the range of &#x80; to &#x9F; from the
Windows-1252 encoding, use the appropriate Unicode character. Instead of using
character references in the range of &#xD800; to &#xDFFF; as surrogate pairs
from the UTF-16 encoding, use the appropriate Unicode character."
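
The surrogate half of the proposed note can be sketched the same way. The
helper names utf16_code_units and numeric_reference are hypothetical, chosen
for this example only. The sketch shows that a character outside the Basic
Multilingual Plane is two surrogate code units in UTF-16 but a single code
point in a numeric character reference.

```python
def utf16_code_units(ch: str) -> list[int]:
    """Return the UTF-16 code units used to encode one character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def numeric_reference(ch: str) -> str:
    """Build a hexadecimal character reference from the Unicode code point."""
    return f"&#x{ord(ch):X};"


face = "\U0001F600"  # a character above U+FFFF, picked as an example

# In UTF-16 this is a surrogate pair, each unit in the range D800 to DFFF.
print([f"{unit:04X}" for unit in utf16_code_units(face)])

# The character reference should name the code point, not the surrogates.
print(numeric_reference(face))
```

Under the proposed note, an author would write the single reference printed
on the last line and not one reference per surrogate code unit.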


Posted from: 96.53.31.86
User agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64;
Trident/5.0)


Received on Thursday, 15 December 2011 00:28:59 UTC