Re: [css-syntax] ISSUE-329: @charset has no effect on stylesheet?? from Simon Pieters on 2014-01-24 (www-style@w3.org from January 2014)

From: Simon Pieters <simonp@opera.com>
Date: Fri, 24 Jan 2014 09:58:53 +0100
To: "www-style list" <www-style@w3.org>, "www International" <www-international@w3.org>, "Tab Atkins Jr." <jackalmage@gmail.com>, "Zack Weinberg" <zackw@panix.com>
Message-ID: <op.w96okftridj3kv@simons-macbook-pro.local>
On Thu, 23 Jan 2014 21:35:28 +0100, Zack Weinberg <zackw@panix.com> wrote:

> #### Summary of how style sheet encoding is determined
>
>> This section is non-normative.
>
> [UTF-8]() is the default character encoding for CSS.

I think this is a confusing statement. It sounds as if you always get  
utf-8 when you don't specify an encoding.

> The use of
> [UTF-8]() for new style sheets is mandated by [[ENCODING]]().  When
> legacy requirements dictate the use of some other encoding, either for
> the style sheet or some or all of its referring documents, authors may
> set the encoding as follows:

The list applies to utf-8 as well. Also, 'may' is inappropriate in a  
non-normative section.

>  * The network protocol (e.g. HTTP) may supply an encoding for the
>    style sheet as metadata; when available, use of this mechanism
>    is preferred.  New content encoded in [UTF-8]() should be marked as
>    such using this mechanism.

Why is this the preferred mechanism for utf-8 (but not for other  
encodings)?

>  * ASCII-compatible encodings may also be declared in-band by use of
>    an [@charset directive]().  This directive is ignored if the
>    network protocol supplies an encoding as metadata.
>
>    > Warning: Although an [@charset directive]() textually resembles
>    > an [at-rule](), it is not parsed as an at-rule; only a specific
>    > byte sequence, beginning with the very first byte in the style
>    > sheet, is accepted.
>
>  * The referring document provides, explicitly or implicitly, an
>    [environment encoding]() which is assumed to apply to the style
>    sheet if neither of the above mechanisms provides an encoding.
>    Relying on the environment encoding is discouraged.

Why is it discouraged?

>  * [UTF-16]() encoding, which is not ASCII-compatible, may be declared
>    out-of-band with network data or in-band with a [byte order mark](),
>    but not with a [@charset directive]().  The use of [UTF-16]() is
>    **strongly discouraged**.

Why is it more strongly discouraged than other non-utf-8 encodings? Since  
utf-8 is already a must, I don't think it makes sense to single out other  
specific encodings for discouragement.

>    When present, a [byte order mark]() overrides any encoding set by
>    network metadata, as specified in [[ENCODING]]().
>
>  * ASCII-incompatible encodings other than [UTF-16]() may not be
>    used, as specified in [[ENCODING]]().
>
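
To make the byte order mark point concrete, here is a minimal Python sketch
of BOM sniffing as I understand it from [[ENCODING]]. The function names
sniff_bom and effective_encoding are made up for illustration, and the
fallback argument is assumed to be an encoding name that was already
determined some other way (network metadata, @charset, and so on).

```python
def sniff_bom(data):
    """Return the encoding implied by a leading byte order mark, or None."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16le"
    return None


def effective_encoding(data, fallback):
    """Hypothetical helper: a BOM, when present, wins over the fallback."""
    return sniff_bom(data) or fallback
```

With this, a sheet that starts with FF FE is treated as utf-16le even if
the fallback passed in is windows-1252.
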
> #### Algorithm for determining the fallback encoding
>
> The [decode]() algorithm takes as input a <dfn>fallback
> encoding</dfn>, which UAs shall determine as follows:
>
>> Note: The [decode]() algorithm uses the [fallback encoding]() only
>> when no [byte order mark]() is present in the input.
>
> 1. If HTTP or equivalent protocol defines an encoding (e.g. via the
>    charset parameter of the Content-Type header), [get an encoding]()
>    [[ENCODING]]() for the specified value. If that does not return
>    failure, use the return value as the fallback encoding.
>
> 1. Otherwise, check for a <dfn>@charset directive</dfn>.  If the
>    initial sequence of bytes in the byte stream, beginning with the
>    very first byte, matches the hex sequence
>
>         40 63 68 61 72 73 65 74 20 22 LL* 22 3B
>
>    where each `LL` byte must have a value between `23` and `7E`
>    hexadecimal, inclusive, then [get an encoding]() [[ENCODING]]() for
>    the sequence of `LL` bytes, interpreted as ASCII.
>
>    > Note: This byte sequence, when decoded as ASCII, is the string
>    > ‘`@charset "…";`’ where the "…" is the sequence of `LL` bytes
>    > specifying the encoding’s label.
>
>    > Note: UAs may impose an arbitrary limit upon the number of `LL`
>    > bytes scanned, as long as it is large enough to encompass all of
>    > the [labels]() defined in [[ENCODING]](); presently these are all
>    > 19 or fewer bytes long.
>
>    If the [get an encoding]() algorithm returns `utf-16be` or
>    `utf-16le`, use `utf-8` as the fallback encoding.  If it returns
>    anything else except failure, use the return value as the fallback
>    encoding.
>
>    > Note: `utf-16be` and `utf-16le` cannot possibly be correct when
>    > returned by the [get an encoding]() algorithm in this context,
>    > because they are ASCII-incompatible and the [@charset directive]()
>    > is only recognized when encoded compatibly with ASCII.
>    > This mimics the behavior of HTML `<meta>` elements when used to
>    > declare an encoding in-band.
>
> 1. Otherwise, if an [environment encoding]() is provided by the
>    referring document, use that as the fallback encoding.
>
> 1. Otherwise, use `utf-8` as the fallback encoding.
>
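
The four steps above can be sketched roughly as follows in Python. This is
only an illustration under a few assumptions: LABELS is a tiny stand-in for
the full label table in [[ENCODING]], the names get_an_encoding and
fallback_encoding are made up, environment_encoding is assumed to be an
already-resolved encoding name, and I read "Otherwise" as falling through to
the next step whenever a label lookup returns failure (the draft does not
spell that out).

```python
import re

# '@charset "' + label bytes in 0x23..0x7E + '";', anchored at byte 0.
CHARSET_RE = re.compile(rb'@charset "([\x23-\x7e]*)";')

# Tiny illustrative subset of the label table, not the full list.
LABELS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
    "shift_jis": "shift_jis",
    "utf-16": "utf-16le",
    "utf-16le": "utf-16le",
    "utf-16be": "utf-16be",
}


def get_an_encoding(label):
    """Look up a label; None stands for 'failure'."""
    return LABELS.get(label.strip(" \t\n\f\r").lower())


def fallback_encoding(data, http_charset=None, environment_encoding=None):
    # Step 1: encoding supplied by HTTP or an equivalent protocol.
    if http_charset is not None:
        enc = get_an_encoding(http_charset)
        if enc is not None:
            return enc
    # Step 2: @charset directive, matched from the very first byte.
    m = CHARSET_RE.match(data)
    if m is not None:
        enc = get_an_encoding(m.group(1).decode("ascii"))
        if enc in ("utf-16be", "utf-16le"):
            return "utf-8"
        if enc is not None:
            return enc
    # Step 3: environment encoding from the referring document.
    if environment_encoding is not None:
        return environment_encoding
    # Step 4: default.
    return "utf-8"
```

Note that re.match only matches at the start of the input, so a directive
preceded by even a single space or comment is not recognized, which is the
behaviour the warning in the summary describes.
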


-- 
Simon Pieters
Opera Software
Received on Friday, 24 January 2014 08:59:44 UTC