- From: <bugzilla@jessica.w3.org>
- Date: Mon, 01 Aug 2011 14:03:13 +0000
- To: public-i18n-core@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=13392

--- Comment #12 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-08-01 14:03:11 UTC ---

COMPROMISE PROPOSAL:

* The text "(preferred)" was Eliot's addition which, however, was quite compatible with the arguments I presented along with my original spec text proposal - as such I endorsed it/did not speak against it.
* But I am personally happy to state the facts and let the authors draw the conclusions themselves. As such, I can see that the current text - with its "(preferred)" - states a preference without proper justification within the spec.

Hence, instead of the I18N Group's proposed change, I would like to suggest the following, which does more to help the reader's understanding:

1) REPLACE: "By using the Byte Order Mark (BOM) character (preferred)."
   WITH: "By using the Byte Order Mark (BOM) character, which is an encoding signature that both XML and HTML parsers are required to support."
   <!--NOTE: the phrase 'encoding signature' stems from XML 1.0 http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding -->

2) REPLACE: "By using <meta charset="UTF-8"/> (the HTML encoding declaration)."
   WITH: "By using <meta charset="UTF-8"/> (the HTML encoding declaration) and thus, for XML parsers, relying on XML's encoding default (see above)."
   <!--NOTE: 'XML's encoding default' is explained in the spec, one para above - and was also in my original proposal, see bug 12062. -->

The above changes state the facts about each method, in a minimum amount of text.

Now, some replies to the I18N Group, to Henri and to Addison:

Reply to Comment #9 - the I18N Group:

It will be great to see the PHP test - and I don't mind putting it in the spec somewhere as long as we can also mention the problems of the <meta charset="UTF-8"/> method. For my own part, I use a PHP based CMS where I had no problems adding the BOM.
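As a rough illustration of what "using the BOM as an encoding signature" amounts to at the byte level, here is a minimal Python sketch. The function names are hypothetical and are not taken from the spec or from any CMS; the only library facility used is the standard `codecs.BOM_UTF8` constant.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM unless it is already present.

    Assumption: the caller knows that `data` is UTF-8 encoded.
    """
    if has_utf8_bom(data):
        return data
    return codecs.BOM_UTF8 + data


if __name__ == "__main__":
    page = '<!DOCTYPE html><html><head><meta charset="UTF-8"/></head></html>'
    signed = add_utf8_bom(page.encode("utf-8"))
    print(has_utf8_bom(signed))
```

Adding the BOM twice would leave a stray U+FEFF character in the content, which is why the sketch checks before prepending.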
Reply to Henri - Comment #10:

> ...it doesn't follow that Polyglot Markup should
> then promote things you like within the subset.

The polyglot facts/subset/principles say that a) the XML and HTML DOMs should be identical, and b) the syntax should be legal and necessary in HTML and XML. It follows that a feature (the BOM) that has the same effect in both XML and HTML is a stricter subset of XML and HTML than a feature that has effect only in HTML (and which needs the HTML5 spec's "permission" to appear in the XML serialization).

In reply to Addison - Comment #11:

> I don't think the argument is that BOM should be removed altogether.

Nonetheless, bug 12062, which is the basis for what the spec currently says, was titled "UTF-8 BOM should not be forbidden in Polyglot Markup". That is because, as of the 14th of February this year, the BOM was for some reason forbidden (I wonder if the I18N Group had a finger in that).

> What the I18N WG is asking for is that it not (perhaps erroneously)
> be considered the "preferred" option.

Perhaps it was an error on your part to suggest that it might have been an error? ;-) I don't see that it creates problems for anyone - not even for those tools which do not support the BOM. But if you can live with my compromise proposal, then I don't need to defend "(preferred)".

However, I would like to point out that the current spec text explicitly does *not* state that one should only use one of the - several - encoding declaration options. Instead the spec says: "… in the following ways, which may be used separately or in combination: …"

From my POV, "(preferred)" is a recommendation to use the BOM - nothing more or less. Thus it would not, as the spec stands, have been a spec violation to not use the BOM or to combine it with the visible declaration or HTTP.
(From my POV, I think I would always - at least for HTML - include both an external encoding declaration in HTTP and at least one internal one - currently that seems necessary in order to be on the safe side. But the Polyglot Spec currently does not deal with such detailed advice. Should it?)

> Leaving aside whether this or that browser or tool responds well to BOM,

We cannot completely leave that aside, when you yourself bring in (half-truth) claims about negative effects.

> the
> BOM is invisible when properly handled and a problem when visible.

Did you mean '… and visible if not properly handled.'? For the record: Opera has a bug in which it swallows the BOM even if the page is ISO-8859-1 encoded. Thus, it is also a problem to not make it visible, when it should be visible.

> Visible
> encoding declarations (when correct) make page encoding easier to
> work with for humans.

From my POV, there are enough 'visible' notifications of the encoding: browsers report the encoding in one of their menus, and editors report the encoding in a toolbar or otherwise. And they all also tend to read the BOM as an encoding declaration.

Still, I have not protested against the fact that the Polyglot spec points to the i18n group's recommendation to use visible encoding declarations. It is, as I understand it, not endorsed by the spec - it is just that the spec cites the i18n group's claim that it is helpful, and lets the author decide whether this consideration is something he or she wants to take note of. This is OK with me, despite the fact that I think it is an advantage to, when possible, only declare the encoding once. Because when something has to be declared more than once, there is always a risk that the multiple declarations get out of sync.

> Specifying which one has priority and how to interpret each is the job of
> Polyglot, but the "preferred" is unnecessary and may actually depend on the
> user's tools and environment.
The BOM is not the only feature that relies on the tools and the environment. All methods - the HTTP charset, the BOM and the meta charset element - depend on those factors. E.g. I have more than once used editors which did not understand the new, HTML5 <meta charset=charset > declaration element. Examples:

* The HTML parser inside libxml2 (try xmllint on the command line) does not understand <meta charset="UTF-8"/> but instead defaults to ISO-8859-1. *However*, if the document includes the BOM *or* the legacy <meta@content-type> encoding declaration, libxml2's HTML parser still succeeds in detecting the encoding as UTF-8.
* The iCab web browser, before it switched to using Webkit, supported the BOM, but did not support <meta charset="UTF-8"/>.
* I'm sure I could find several more browsers and tools.

Thus, the BOM sometimes has better support than the new meta charset element. This is because the BOM is both XML-compatible and HTML-compatible *and* because it is older than the HTML5 encoding declaration.

Question: Why isn't the lack of back-compatibility for the new <meta charset="UTF-8"/> a concern for you? Note that the Polyglot spec currently says that polyglot documents do not use the legacy encoding declaration (despite the fact that it is fully tolerated, without any warning, in non-polyglot HTML). Perhaps even Polyglot Markup should tolerate the legacy <meta@http-equiv> variant?

PS: I don't want to hide facts: I so far know of 3 parsers which make the BOM visible, all of them text browsers: Lynx inserts an empty paragraph at the top of the page, Elinks inserts an empty *line* at the top of the page, and Links inserts a paragraph with an <unknown> character inside. Also, the very outdated IE5.x for Mac behaves similarly to Links, but without going into quirks mode. (For the record: the text browsers netrik and w3m do not have this problem.)
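To make the "declarations out of sync" risk concrete, here is a small Python sketch that collects the encoding announced by each of the methods discussed (the BOM, <meta charset>, the legacy http-equiv meta element and the HTTP Content-Type header) and reports whether they agree. All names are hypothetical, the regular expressions are deliberately simplistic (for instance, they expect charset to be the first attribute of the meta element), and this is in no way how libxml2, a browser or the HTML5 prescan algorithm actually works.

```python
import codecs
import re
from typing import Dict, Optional

# Simplistic patterns, for illustration only; not the HTML5 prescan algorithm.
META_CHARSET = re.compile(rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9_-]+)', re.I)
LEGACY_META = re.compile(
    rb'<meta[^>]+http-equiv\s*=\s*["\']?content-type["\']?[^>]*'
    rb'charset\s*=\s*([A-Za-z0-9_-]+)',
    re.I,
)
HTTP_CHARSET = re.compile(r'charset\s*=\s*"?([A-Za-z0-9_-]+)', re.I)


def declared_encodings(data: bytes, content_type: Optional[str] = None) -> Dict[str, str]:
    """Map each declaration method found to the (lowercased) encoding it names."""
    found: Dict[str, str] = {}
    if data.startswith(codecs.BOM_UTF8):
        found["bom"] = "utf-8"
    head = data[:1024]  # only look near the start of the document
    match = META_CHARSET.search(head)
    if match:
        found["meta charset"] = match.group(1).decode("ascii").lower()
    match = LEGACY_META.search(head)
    if match:
        found["legacy meta"] = match.group(1).decode("ascii").lower()
    if content_type:
        http_match = HTTP_CHARSET.search(content_type)
        if http_match:
            found["http"] = http_match.group(1).lower()
    return found


def in_sync(found: Dict[str, str]) -> bool:
    """True when no two declarations name different encodings."""
    return len(set(found.values())) <= 1


if __name__ == "__main__":
    doc = codecs.BOM_UTF8 + b'<!DOCTYPE html><html><head><meta charset="UTF-8"/></head></html>'
    result = declared_encodings(doc, "text/html; charset=ISO-8859-1")
    print(result, in_sync(result))
```

A check along these lines only flags disagreement; which declaration wins is decided by the consuming parser, not by this sketch.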
Received on Monday, 1 August 2011 14:03:23 UTC