[Bug 19931] New: Should not prefer byte order mark with UTF-8

https://www.w3.org/Bugs/Public/show_bug.cgi?id=19931

          Priority: P2
            Bug ID: 19931
          Keywords: externalComments, NE
                CC: eliotgra@microsoft.com, mike@w3.org,
                    public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org
          Assignee: eliotgra@microsoft.com
           Summary: Should not prefer byte order mark with UTF-8
        QA Contact: public-html-bugzilla@w3.org
          Severity: normal
    Classification: Unclassified
                OS: All
          Reporter: bugz.ate.my.horse@cam.n0b.org
          Hardware: All
            Status: NEW
           Version: unspecified
         Component: pre-LC1 HTML/XHTML Compat. Authoring Guide (ed: Eliot
                    Graff)
           Product: HTML WG

In the section "Specifying a Document's Character Encoding", it is stated that
polyglot markup uses UTF-8. It then says that the preferred way to indicate this
encoding is with a Byte Order Mark (BOM).

I feel this is not advisable, because: UTF-8 does not require a BOM [3]; a BOM
can cause problems with applications (apparently MSIE does or did have a
problem) and programming languages (apparently including Java [4][5]); and it
causes otherwise valid ASCII to stop being ASCII.
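To make the ASCII point concrete, here is a minimal Python sketch. The document
body and the helper names is_ascii and strip_utf8_bom are illustrative
assumptions, not taken from the guide. It shows that prepending the three BOM
bytes EF BB BF turns ASCII-only content into non-ASCII bytes, and that a decoder
which does not special-case the BOM hands U+FEFF to the application as part of
the text, which is the kind of problem the report points to for Java [4][5].

```python
import codecs

# Hypothetical ASCII-only document body, for illustration only.
body = b'<!DOCTYPE html>\n<html><head><title>t</title></head></html>\n'

# The same body with a UTF-8 BOM (bytes EF BB BF) prepended.
with_bom = codecs.BOM_UTF8 + body


def is_ascii(data: bytes) -> bool:
    """Return True if every byte is below 0x80 (i.e. the data is pure ASCII)."""
    return all(b < 0x80 for b in data)


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    print(is_ascii(body))      # True
    print(is_ascii(with_bom))  # False: the three BOM bytes are all >= 0x80
    # A plain "utf-8" decode keeps the BOM as U+FEFF at the start of the text.
    print(repr(with_bom.decode("utf-8")[0]))  # '\ufeff'
    # The "utf-8-sig" codec skips a leading BOM if one is present.
    print(with_bom.decode("utf-8-sig") == body.decode("ascii"))  # True
```

Code that reads such files therefore has to strip the BOM itself (or use a
BOM-aware codec) before treating the content as plain text.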

As such, I would swap the preferred method for indicating UTF-8 within the
document and add a note about using the BOM:

* By using <meta charset="UTF-8"/> (the HTML encoding declaration) (preferred).
* By using the Byte Order Mark (BOM) character (could cause problems in some
situations).
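A rough Python sketch of a check that follows the proposed preference
(declaration present, no BOM). The function name check_encoding_declaration, its
messages and the simplified regular expression are all hypothetical; the regex is
not the HTML encoding-sniffing algorithm. It only searches the first 1024 bytes
because HTML requires the element carrying the encoding declaration to be
serialized completely within the first 1024 bytes of the document.

```python
import codecs
import re
from typing import List

# Simplified, hypothetical pattern: matches forms such as <meta charset="UTF-8"/>
# or <meta charset=utf-8>, case-insensitively. Not a full HTML parser.
META_CHARSET_UTF8 = re.compile(
    rb'<meta\s+charset\s*=\s*["\']?utf-8["\']?\s*/?>', re.IGNORECASE
)


def check_encoding_declaration(data: bytes) -> List[str]:
    """Return hypothetical lint messages; an empty list means no issues found."""
    problems = []
    # Flag a leading UTF-8 BOM, which the report suggests should not be preferred.
    if data.startswith(codecs.BOM_UTF8):
        problems.append("starts with a UTF-8 BOM (could cause problems in some situations)")
    # Look for the HTML encoding declaration within the first 1024 bytes only.
    if not META_CHARSET_UTF8.search(data[:1024]):
        problems.append('no <meta charset="UTF-8"/> in the first 1024 bytes')
    return problems


if __name__ == "__main__":
    good = b'<!DOCTYPE html>\n<html><head><meta charset="UTF-8"/><title>t</title></head></html>\n'
    print(check_encoding_declaration(good))
    print(check_encoding_declaration(codecs.BOM_UTF8 + good))
```

Treating the BOM as a warning rather than an error mirrors the report's wording
that it "could cause problems in some situations" rather than being invalid.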


References: 
[1] https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
[2] https://en.wikipedia.org/wiki/UTF-8#Byte_order_mark
[3] http://www.unicode.org/faq/utf_bom.html#bom5
[4] http://bugs.sun.com/view_bug.do?bug_id=6378911
[5] http://bugs.sun.com/view_bug.do?bug_id=4508058


Received on Saturday, 10 November 2012 15:47:04 UTC