
[Bug 12062] UTF-8 BOM should not be forbidden in Polyglot Markup

From: <bugzilla@jessica.w3.org>
Date: Thu, 03 Mar 2011 22:47:00 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1PvHIS-0005t0-Uo@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12062

Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

--- Comment #6 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-03-03 22:46:59 UTC ---
(In reply to comment #5)

I'm very satisfied with your solution. However, I would like to add some more
issues relating to the section we are discussing. 

ISSUE 1:

Draft says: ]] polyglot markup does not use <meta content=”text/html;
charset”>[[

Please type a valid http-equiv meta element:   <meta http-equiv="Content-Type"
content="text/html; charset=utf-8" />

ISSUE 2 (which eventually cancels ISSUE 1): You could simply delete this
sentence: ]] However, because the mime-type is not necessarily text/html,
polyglot markup does not use <meta content=”text/html; charset”>.]]

Justification: I think I am the source for the above sentence. However, it
seems to me to be "over-thought" or "too smart". It only seeks to explain why
HTML5 does not permit the http-equiv="content-type" element, and the explanation
sounds logical enough. However, it is, perhaps, not for us to _speculate_ about
why it is not permitted? Do as you wish. But I think I would have deleted it.

ISSUE 3 - this is more substantial:

Draft says: ]] when polyglot markup uses UTF-16, it includes the BOM indicating
little-endian UTF-16 or big-endian UTF-16 [[

The above quote has two problems:
  PROBLEM I: you say "the BOM". Please say "a BOM".
  PROBLEM II: please say "MUST" rather than "it includes". The MUST is taken
from XML 1.0 - which says that a BOM is required when UTF-16 is used.

SOLUTION: This is the reformulation that I suggest:

]] when polyglot markup uses UTF-16, a BOM (indicating little-endian UTF-16 or
big-endian UTF-16) MUST be used. [[

(I don't know if you want to point to XML 1.0 with regard to the MUST:
http://www.w3.org/TR/REC-xml/#charencoding .)
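
To illustrate why "a BOM" fits better than "the BOM": there are two possible
UTF-16 BOMs, one per byte order. Below is a minimal Python sketch of a check
for either of them. The helper name utf16_bom and the sample document are
hypothetical and only for illustration; they come from neither the draft nor
XML 1.0.

```python
import codecs

# Hypothetical helper for illustration only; not taken from any specification.
def utf16_bom(data: bytes):
    """Report which UTF-16 BOM, if any, the byte string starts with."""
    # Note: a UTF-32 little-endian BOM also starts with FF FE; this sketch
    # assumes the input is already known to be UTF-16.
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big-endian"
    return None

sample = "<!DOCTYPE html><html></html>"        # assumed sample document

# Python's generic "utf-16" codec prepends a BOM in the platform's byte order.
with_bom = sample.encode("utf-16")
# The explicit-endianness codecs write no BOM at all.
without_bom = sample.encode("utf-16-be")

print(utf16_bom(with_bom))      # "little-endian" or "big-endian", platform dependent
print(utf16_bom(without_bom))   # None
```

Under the suggested wording, a UTF-16 document for which such a check finds
no BOM would not qualify as polyglot markup.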

Received on Thursday, 3 March 2011 22:47:02 GMT
