[Bug 7380] New: Suggest heuristic detection of UTF-8

http://www.w3.org/Bugs/Public/show_bug.cgi?id=7380

           Summary: Suggest heuristic detection of UTF-8
           Product: HTML WG
           Version: unspecified
          Platform: PC
               URL: http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML5 spec bugs
        AssignedTo: dave.null@w3.org
        ReportedBy: mjs@apple.com
         QAContact: public-html-bugzilla@w3.org
                CC: ian@hixie.ch, mike@w3.org, public-html@w3.org


Step 6 of the encoding detection algorithm should specifically suggest the
possibility of algorithmically detecting UTF-8. Here is some suggested wording
from the I18N WG:

"Note: The UTF-8 encoding has a highly detectable bit pattern. Documents that
contain bytes > 0x7F which match the UTF-8 pattern are very likely to be UTF-8,
while documents that do not match it definitely are not. While not full
autodetection, it may be appropriate for a user-agent to search for this common
encoding."
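The heuristic in the quoted note can be sketched as a byte-pattern check. This is a minimal Python sketch, not spec text, and it assumes the whole byte stream is available. The names `guess_utf8` and `Utf8Guess` are hypothetical. The check follows the well-formed byte-sequence table of RFC 3629, so it also rejects overlong forms, UTF-16 surrogates and code points above U+10FFFF. It shows the asymmetry the note describes. A match on bytes > 0x7F is only evidence of UTF-8. A mismatch rules UTF-8 out. A pure-ASCII document is inconclusive.

```python
from enum import Enum


class Utf8Guess(Enum):
    """Outcome of the UTF-8 sniffing heuristic (hypothetical names)."""
    ASCII_ONLY = "ascii-only"      # no bytes > 0x7F, so the test says nothing
    LIKELY_UTF8 = "likely-utf-8"   # high bytes present and all fit the pattern
    NOT_UTF8 = "not-utf-8"         # some byte sequence breaks the pattern


def _sequence_length(lead: int) -> int:
    """Expected UTF-8 sequence length for a lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1        # 0xxxxxxx: ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2        # 110xxxxx (0xC0/0xC1 could only encode overlong forms)
    if 0xE0 <= lead <= 0xEF:
        return 3        # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4        # 11110xxx (leads above 0xF4 exceed U+10FFFF)
    return 0            # stray continuation byte (10xxxxxx) or invalid lead


def guess_utf8(data: bytes) -> Utf8Guess:
    """Classify a byte stream by whether its high bytes match the UTF-8 pattern."""
    saw_high_byte = False
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        length = _sequence_length(lead)
        if length == 0:
            return Utf8Guess.NOT_UTF8
        if length == 1:
            i += 1
            continue
        saw_high_byte = True
        if i + length > n:
            return Utf8Guess.NOT_UTF8  # sequence cut off at end of input
        # The second byte's allowed range is narrower after some leads (RFC 3629).
        lo, hi = 0x80, 0xBF
        if lead == 0xE0:
            lo = 0xA0   # reject overlong three-byte forms
        elif lead == 0xED:
            hi = 0x9F   # reject surrogates U+D800..U+DFFF
        elif lead == 0xF0:
            lo = 0x90   # reject overlong four-byte forms
        elif lead == 0xF4:
            hi = 0x8F   # reject code points above U+10FFFF
        if not lo <= data[i + 1] <= hi:
            return Utf8Guess.NOT_UTF8
        # Remaining bytes must all be continuation bytes 10xxxxxx.
        for b in data[i + 2:i + length]:
            if not 0x80 <= b <= 0xBF:
                return Utf8Guess.NOT_UTF8
        i += length
    return Utf8Guess.LIKELY_UTF8 if saw_high_byte else Utf8Guess.ASCII_ONLY
```

For example, `"café".encode("utf-8")` yields `LIKELY_UTF8`. The same word in ISO-8859-1, `b"caf\xe9"`, yields `NOT_UTF8` because 0xE9 announces a three-byte sequence that never arrives. A user agent that sniffs only a prefix of the document would also need to tolerate an incomplete final sequence, which this sketch treats as a mismatch.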



Received on Thursday, 20 August 2009 07:29:24 UTC