- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Thu, 22 Nov 2012 03:59:24 +0100
- To: www-international@w3.org
As an additional point to (II.), I would propose that you change the
title "Removing the BOM" to "Adding or removing the BOM" and rewrite
that section accordingly.
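
To make concrete what "adding or removing" means at the byte level, here
is a minimal Python sketch. The function names are my own and purely
illustrative, and it assumes the whole file fits in memory as bytes; it
is not taken from the article.

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Return data prefixed with the UTF-8 BOM (EF BB BF), without doubling it."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def remove_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other input is returned unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Both functions are idempotent, so running either one twice on the same
file does no harm.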
Leif H Silli
Leif Halvard Silli, Thu, 22 Nov 2012 03:33:57 +0100:
> Anne van Kesteren, Wed, 21 Nov 2012 22:04:22 +0100:
>> I saw http://www.w3.org/International/questions/new/qa-byte-order-mark-new
>> in the minutes.
>
> I have no objections to Anne’s comments. It is especially important
> that the BOM overrides anything else. But instead of removing the
> warnings, perhaps you could say that, as of today, not all HTML UAs
> yet let the BOM override the HTTP. Also, of course, one should not
> encourage anyone to make BOM and HTTP disagree!
>
> Here are some comments of my own:
>
> (I.) While I often speak well about the BOM, I heard a good, critical
> comment from Martin, in the Unicode mailing list this summer: [1]
>
> "The problem with the BOM in UTF-8 is that it can be quite
> helpful (for quickly distinguishing between UTF-8 and
> legacy-encoded files) and quite damaging (for programs that use
> the Unix/Linux model of text processing), and that's why it
> creates so much controversy."
>
> This informative note would be a good statement to include, directly
> or edited - e.g. when you start to describe the problems of the BOM.
> (My hunch is, as well, that the "linux model of text processing" is
> ultimately one reason why PHP doesn't handle the BOM so well.)
>
> (II.) Positivity! The page tells much about disadvantages of the BOM.
> Could you please also describe some advantages to including the BOM?
> Speaking about the UTF-8 BOM, then those advantages are
>
> a) It is a UTF-8 _signature_ - thus it prevents the page from
> defaulting to - well - the default encoding,
> b) It has effect in both XML/XHTML and HTML.
> c) It is small/short,
> d) It is very safe: Per Anne's Encoding spec - as well as implemented
> in IE (I have not tested released IE10), Webkit and (as promised by
> Henri) upcoming versions of Firefox (and since Anne wrote it, I must
> assume in Opera too), it is impossible to - by accident or otherwise -
> override the encoding of pages that include the BOM. NOTE: Accidental
> overriding can happen as a side effect of overriding the current page,
> since HTML browsers - to various degrees - remember manual encoding
> overrides also for other pages that you open in the same Tab/Window.
> If you like, you could as well add that these advantages are not as
> important for XML documents, since they ultimately default to UTF-8
> anyhow.
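
The signature and precedence points above can be sketched roughly as
follows, in Python. This assumes the three BOMs recognised by the
Encoding spec (UTF-8, UTF-16BE, UTF-16LE); the function names and the
windows-1252 fallback are illustrative assumptions, not a description of
any particular browser.

```python
import codecs
from typing import Optional

# BOM byte sequences and the encoding each one signals.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding signalled by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def pick_encoding(data: bytes, http_charset: Optional[str] = None,
                  fallback: str = "windows-1252") -> str:
    """Choose an encoding: the BOM wins, then the HTTP charset, then the fallback."""
    return sniff_bom(data) or http_charset or fallback
```

With this ordering, a page that starts with the UTF-8 BOM is read as
UTF-8 even when the HTTP header says otherwise, and a page with neither
BOM nor header falls through to the default.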
>
> (III.) Under the subheading 'Quirks mode in Internet Explorer' (beneath
> 'Potential issues with the UTF-8 BOM'[2]), please replace 'Internet
> Explorer 6' with 'Internet Explorer 5.5'. (I verified - again - today,
> using the fine service at http://netrenderer.de.) (If one follows the
> link to the article on 'Serving HTML & XHTML', you already make it
> clear there that IE6 is _not_ affected:[3] "With Internet Explorer 6,
> however, if anything other than a byte-order mark appears before the
> DOCTYPE declaration the page is rendered in quirks mode." You should
> bring the new BOM article in alignment with that.)
>
> (IV.) Under the subheading 'Transcoding', it is said:
>
> "If you change the encoding of a UTF-8 file from a Unicode encoding
> to something else, you must ensure that the BOM is removed.
>
> If you don't either the browser will continue to treat your
> content
> as UTF-8, or you will see strange characters at the beginning of
> the page."
>
> Remarks. To say "If you change the encoding of a UTF-8 file from a
> Unicode encoding to something else" sounds strange, for
> various reasons:
> a) It is obvious that a 'UTF-8 file' is using a 'Unicode encoding'.
> b) 'non-Unicode encoding' is better than 'something else'.
> Suggested reformulation: "If you change the encoding of a
> Unicode encoded file to a non-Unicode encoding, then …".
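
As an illustration of the transcoding case, here is a small Python
sketch, assuming the source really is UTF-8 and every character exists
in the target encoding. The function name and the ISO-8859-1 default are
my own choices for the example.

```python
def transcode_from_utf8(data: bytes, target: str = "iso-8859-1") -> bytes:
    """Re-encode UTF-8 bytes into a non-Unicode encoding, dropping the BOM."""
    # The "utf-8-sig" codec skips one leading BOM if present, so the
    # signature does not survive into the re-encoded output.
    text = data.decode("utf-8-sig")
    return text.encode(target)


def bom_as_mojibake(encoding: str = "iso-8859-1") -> str:
    """Show how a leftover UTF-8 BOM looks when read in a legacy encoding."""
    return b"\xef\xbb\xbf".decode(encoding)
```

If the BOM were left in place, a reader that ignores it would display
the three characters returned by bom_as_mojibake() at the top of the
page, while a reader that honours it would go on treating the content as
UTF-8. The sketch does not update any in-document encoding declaration;
that remains a separate step.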
>
> (V.) Also, regarding the sentence that goes, quote: "You should also
> be aware that, although ASCII is a subset of UTF-8, a file that starts
> with a BOM is no longer ASCII-compatible." Here I would propose to
> replace "a file that starts [etc]" with "an otherwise ASCII-encoded file
> that starts with a BOM is no longer ASCII-compatible".
>
> But it is tempting to add that it can also be an ADVANTAGE that the
> BOM this way makes the page ASCII-incompatible. Just imagine: A simple
> BOM, and voila, we are in Unicode land rather than in ISO-8859-1 land.
> Because ASCII is interpreted as ISO-8859-1 - and friends - on the Web.
> (Yes, if you declare the page to be ASCII, the browser still interprets
> it as Latin-1.) Thus, an ASCII-encoded page on the Web is, strictly
> speaking, not ASCII-compatible! But the page would - from that
> angle - be more ASCII-compatible if you *added* the BOM. This matters,
> e.g., if the page accepts input from the user (via a form). Thus,
> essentially, we are back at the ADVANTAGES of the BOM. Strictly
> speaking, if the BOM creates a problem with regard to
> ASCII-compatibility, then we are at the subject of *transcoding*, which
> should be a rare and academic exercise! See below.
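
The narrow sense in which a BOM breaks ASCII-compatibility can be shown
with a short Python sketch. The helper name is hypothetical, and
"ASCII-compatible" is taken here to mean only "decodes cleanly with a
strict ASCII decoder".

```python
import codecs


def is_strict_ascii(data: bytes) -> bool:
    """Return True if every byte in data is valid 7-bit ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


plain = b"Hello, world"
with_bom = codecs.BOM_UTF8 + plain  # the same text, prefixed with EF BB BF
```

Here is_strict_ascii(plain) is true and is_strict_ascii(with_bom) is
false, because the three BOM bytes are all above 0x7F. Both byte strings
still decode to the same text with the "utf-8-sig" codec.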
>
> (VI.) Also, it seems like "Sometimes the encoding of a file is changed
> ('transcoded')" should be moved to right under the subheading
> 'Transcoding'.
>
> (VII.) And I think the Transcoding section would do well to advise
> against transcoding Unicode/UTF-8 encoded documents. And
> thus, in that connection, you could add that the section on transcoding
> relates to rare/academic situations.
>
> (VIII.) Btw, the current text also seems to presume that the reader
> knows that he/she must - in addition to removing the BOM - *also*
> replace the BOM with a (correct) <meta> charset declaration etc. I
> think you should not presume that! You make too much fuss about the
> problems of the BOM here, I feel …
>
> [1] <http://www.unicode.org/mail-arch/unicode-ml/y2012-m07/0333.html>
> [2]
> <http://www.w3.org/International/questions/new/qa-byte-order-mark-new.en.php#problems>
> [3]
> <http://www.w3.org/International/articles/serving-xhtml/#declaration>
> --
> leif halvard silli
Received on Thursday, 22 November 2012 02:59:53 UTC