
Re: For review: The byte-order mark (BOM) in HTML

From: Asmus Freytag <asmusf@ix.netcom.com>
Date: Tue, 18 Dec 2012 14:57:31 -0800
Message-ID: <50D0F4DB.3000206@ix.netcom.com>
To: Richard Ishida <ishida@w3.org>
CC: www International <www-international@w3.org>
The text says


          What is a byte-order mark?
          <http://www.w3.org/International/questions/new/qa-byte-order-mark-new.en.php#bomwhat>

    At the beginning of a page that uses a Unicode
    <http://www.w3.org/International/articles/definitions-characters/Overview#unicode>
    character encoding
    <http://www.w3.org/International/articles/definitions-characters/Overview#charsets>
    you may find some bytes that represent the Unicode code point U+FEFF
    ZERO WIDTH NO-BREAK SPACE (ZWNBSP). This combination of bytes is
    known as a byte-order mark (BOM).

    The BOM, when correctly used, is invisible.
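
As a rough illustration of the "combination of bytes" the quoted text refers to, the sketch below encodes U+FEFF in three Unicode encodings. The dictionary name `bom_bytes` is only an illustrative choice, not something taken from the article.

```python
import codecs

U_FEFF = "\ufeff"  # the code point discussed in the quoted text

# Encode the single code point in each encoding to see its byte form.
bom_bytes = {
    "utf-8": U_FEFF.encode("utf-8"),
    "utf-16-le": U_FEFF.encode("utf-16-le"),
    "utf-16-be": U_FEFF.encode("utf-16-be"),
}

for encoding, raw in bom_bytes.items():
    print(encoding, raw.hex(" "))
```

The standard library exposes the same byte sequences as constants such as `codecs.BOM_UTF8`.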

For a while now, there has been a formal name alias defined for the byte
order mark; actually two, if you count the abbreviation. (See:
http://www.unicode.org/Public/UNIDATA/NameAliases.txt)

FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation

Section 4.8 of the Unicode Standard explains that these aliases are 
designed (like the original character names) to be used as identifiers 
(e.g. in specifications, regular expressions etc.).
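
To make the identifier use concrete, here is a minimal sketch, assuming Python 3.3 or later (where name lookups accept formal aliases). The names `LEADING_BOM` and `strip_leading_bom` are hypothetical helpers for illustration only.

```python
import re
import unicodedata

# The formal alias and the original name resolve to the same code point.
BOM = unicodedata.lookup("BYTE ORDER MARK")
ZWNBSP = unicodedata.lookup("ZERO WIDTH NO-BREAK SPACE")

# Identifier-style use in a regular expression: the \N{...} escape is
# resolved by the string literal, so the pattern matches a leading U+FEFF.
LEADING_BOM = re.compile("^\N{BYTE ORDER MARK}")


def strip_leading_bom(text: str) -> str:
    """Return text without a single leading U+FEFF, if one is present."""
    return LEADING_BOM.sub("", text, count=1)
```

Writing the alias in the pattern says what the character is there for, which the original name does not.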

With the introduction of U+2060 WORD JOINER, there's no longer a need to
ever use U+FEFF for its ZWNBSP effect, so from that point on, and with the
availability of a formal alias, the name ZERO WIDTH NO-BREAK SPACE just
represents baggage.
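
One way to act on that, sketched under the assumption that a U+FEFF at the very start of the text is a BOM and any later occurrence was meant as a no-break joiner. The function name `replace_interior_feff` is hypothetical, and whether existing text should be rewritten at all is a separate decision.

```python
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER
U_FEFF = "\ufeff"       # U+FEFF, treated as a BOM only at position 0


def replace_interior_feff(text: str) -> str:
    """Keep a leading U+FEFF as-is; replace later ones with U+2060."""
    head = U_FEFF if text.startswith(U_FEFF) else ""
    body = text[len(head):]
    return head + body.replace(U_FEFF, WORD_JOINER)
```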

I recommend that the original name, if mentioned, be relegated to the 
status of a historical footnote.

A./





On 12/18/2012 10:09 AM, Richard Ishida wrote:
> Comments are requested on the following proposed update of the article 
> The byte-order mark (BOM) in HTML[1] prior to final publication. NOTE 
> THAT the article is in a temporary location, and will be moved to its 
> final location after the review.
>
> The majority of the article has been rewritten, with the aim of 
> reducing the previous warnings against using the BOM for UTF-8 
> documents. Also taken into account is the change to the HTML5 spec 
> that raises the precedence of the BOM versus the HTTP header in terms 
> of character encoding declaration.
>
> Please send any comments over the next two weeks to this list 
> (www-international).
>
> We hope to publish a final version at the beginning of the New Year.
>
>
> [1] 
> http://www.w3.org/International/questions/new/qa-byte-order-mark-new.en.php
>
Received on Tuesday, 18 December 2012 22:58:02 GMT
