- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Thu, 22 Nov 2012 03:59:24 +0100
- To: www-international@w3.org
As an additional point to (II.), I would propose that you change the
title "Removing the BOM" to "Adding or removing the BOM" and rewrite
that section accordingly.

Leif H Silli

Leif Halvard Silli, Thu, 22 Nov 2012 03:33:57 +0100:
> Anne van Kesteren, Wed, 21 Nov 2012 22:04:22 +0100:
>> I saw http://www.w3.org/International/questions/new/qa-byte-order-mark-new
>> in the minutes.
>
> I have no objections to Anne’s comments. In particular, it is
> important that the BOM overrides anything else. But instead of
> removing the warnings, perhaps you could say that, as of today, not
> all HTML UAs yet let the BOM override HTTP. Also, of course, one
> should not encourage anyone to make the BOM and HTTP disagree!
>
> Here are some comments of my own:
>
> (I.) While I often speak well of the BOM, I heard a good, critical
> comment from Martin on the Unicode mailing list this summer: [1]
>
>   "The problem with the BOM in UTF-8 is that it can be quite
>   helpful (for quickly distinguishing between UTF-8 and
>   legacy-encoded files) and quite damaging (for programs that use
>   the Unix/Linux model of text processing), and that's why it
>   creates so much controversy."
>
> This informative note would be a good statement to include, directly
> or edited, e.g. where you start to describe the problems of the BOM.
> (My hunch is, as well, that the "Linux model of text processing" is
> ultimately one reason why PHP doesn't handle the BOM so well.)
>
> (II.) Positivity! The page says much about the disadvantages of the
> BOM. Could you please also describe some advantages of including the
> BOM? For the UTF-8 BOM, those advantages are:
>
>   a) It is a UTF-8 _signature_, so it prevents the page from
>      falling back to the default encoding,
>   b) It has effect in both XML/XHTML and HTML.
>   c) It is small/short,
>   d) It is very safe: Per Anne's Encoding spec - and as implemented
>      in IE (I have not tested released IE10), WebKit and (as
>      promised by Henri) upcoming versions of Firefox (and since Anne
>      wrote it, I must assume in Opera too) - it is impossible, by
>      accident or otherwise, to override the encoding of pages that
>      include the BOM. NOTE: Accidental overriding can happen as a
>      side effect of overriding the encoding of the current page,
>      since HTML browsers - to varying degrees - remember a manual
>      encoding override also for other pages that you open in the
>      same Tab/Window.
>
> If you like, you could also add that these advantages are not as
> important for XML documents, since they ultimately default to UTF-8
> anyhow.
>
> (III.) Under the subheading 'Quirks mode in Internet Explorer'
> (beneath 'Potential issues with the UTF-8 BOM' [2]), please replace
> 'Internet Explorer 6' with 'Internet Explorer 5.5'. (I verified this
> - again - today, using the fine service at http://netrenderer.de.)
> (If one follows the link to the article on 'Serving HTML & XHTML',
> you already make clear there that IE6 is _not_ affected: [3] "With
> Internet Explorer 6, however, if anything other than a byte-order
> mark appears before the DOCTYPE declaration the page is rendered in
> quirks mode." You should bring the new BOM article into alignment
> with that.)
>
> (IV.) Under the subheading 'Transcoding', it is said:
>
>   "If you change the encoding of a UTF-8 file from a Unicode encoding
>   to something else, you must ensure that the BOM is removed.
>
>   If you don't either the browser will continue to treat your
>   content as UTF-8, or you will see strange characters at the
>   beginning of the page."
>
> Remarks: To say "If you change the encoding of a UTF-8 file from a
> Unicode encoding to something else" sounds strange, for various
> reasons:
>
>   a) It is obvious that a 'UTF-8 file' is using a 'Unicode encoding'.
>   b) 'non-Unicode encoding' is better than 'something else'.
>
> Suggested reformulation: "If you change the encoding of a
> Unicode-encoded file to a non-Unicode encoding, then …".
>
> (V.) Also, regarding the sentence that goes, quote: "You should also
> be aware that, although ASCII is a subset of UTF-8, a file that
> starts with a BOM is no longer ASCII-compatible." Here I would
> propose to replace "a file that starts [etc]" with "an otherwise
> ASCII-encoded file that starts with a BOM is no longer
> ASCII-compatible".
>
> But it is tempting to add that it can also be an ADVANTAGE that the
> BOM makes the page ASCII-incompatible in this way. Just imagine: a
> simple BOM, and voila, we are in Unicode land rather than in
> ISO-8859-1 land. That is because ASCII is interpreted as ISO-8859-1
> - and friends - on the Web. (Yes, if you declare the page to be
> ASCII, the browser still interprets it as Latin-1.) Thus, an
> ASCII-encoded page on the Web is, strictly speaking, not
> ASCII-compatible! From that angle, the page would be more
> ASCII-compatible if you *added* the BOM. This matters e.g. if the
> page accepts input from the user (via a form). Thus, essentially, we
> are back at the ADVANTAGES of the BOM. Strictly speaking, if the BOM
> creates a problem with regard to ASCII-compatibility, then we are at
> the subject of *transcoding*, which should be a rare and academic
> exercise! See below.
>
> (VI.) Also, it seems that "Sometimes the encoding of a file is
> changed ('transcoded')" should be moved to right under the
> subheading 'Transcoding'.
>
> (VII.) And I think the Transcoding section would do well to
> discourage transcoding of Unicode/UTF-8 encoded documents. In that
> connection, you could add that the section on transcoding relates to
> rare/academic situations.
>
> (VIII.) Btw, the current text also seems to assume that the reader
> knows that he/she must - in addition to removing the BOM - *also*
> replace the BOM with a (correct) <meta> charset declaration etc.
> I think you should not assume that! You make too much fuss about the
> problems of the BOM here, I feel …
>
> [1] <http://www.unicode.org/mail-arch/unicode-ml/y2012-m07/0333.html>
> [2] <http://www.w3.org/International/questions/new/qa-byte-order-mark-new.en.php#problems>
> [3] <http://www.w3.org/International/articles/serving-xhtml/#declaration>
> --
> leif halvard silli
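
As an illustration of the points above about adding or removing the BOM
and about transcoding (IV., VIII.), here is a minimal Python sketch. It
is not taken from the article under discussion. The function names
(has_utf8_bom, add_utf8_bom, remove_utf8_bom, transcode_from_utf8) are
hypothetical, and ISO-8859-1 as the target encoding is only an
assumption made for the example.

```python
import codecs

# The UTF-8 BOM (signature) is the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM unless one is already present."""
    return data if has_utf8_bom(data) else UTF8_BOM + data


def remove_utf8_bom(data: bytes) -> bytes:
    """Strip a leading UTF-8 BOM if present; otherwise return data as-is."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data


def transcode_from_utf8(data: bytes, target: str = "iso-8859-1") -> bytes:
    """Transcode UTF-8 bytes to a non-Unicode encoding (assumed target).

    The 'utf-8-sig' codec drops a leading BOM while decoding, so the
    signature does not end up as stray characters in the output.
    Characters that the target encoding cannot represent raise
    UnicodeEncodeError. Note that the result no longer carries any
    encoding signature, so the page needs another declaration
    (e.g. a <meta> charset); this sketch does not add one.
    """
    text = data.decode("utf-8-sig")
    return text.encode(target)
```

For example, add_utf8_bom(b"abc") yields bytes that can no longer be
decoded as plain ASCII, which is the incompatibility discussed in (V.),
while transcode_from_utf8 covers only the "remove the BOM" half of
(VIII.); the replacement charset declaration still has to be added
separately.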
Received on Thursday, 22 November 2012 02:59:53 UTC