- From: Simon Pieters <simonp@opera.com>
- Date: Fri, 24 Jan 2014 09:58:53 +0100
- To: "www-style list" <www-style@w3.org>, "www International" <www-international@w3.org>, "Tab Atkins Jr." <jackalmage@gmail.com>, "Zack Weinberg" <zackw@panix.com>
On Thu, 23 Jan 2014 21:35:28 +0100, Zack Weinberg <zackw@panix.com> wrote:

> #### Summary of how style sheet encoding is determined
>
>> This section is non-normative.
>
> [UTF-8]() is the default character encoding for CSS.

I think this is a confusing statement. It sounds like if you don't specify
an encoding, you get utf-8.

> The use of
> [UTF-8]() for new style sheets is mandated by [[ENCODING]](). When
> legacy requirements dictate the use of some other encoding, either for
> the style sheet or some or all of its referring documents, authors may
> set the encoding as follows:

The list applies for utf-8 also. Also 'may' is inappropriate for a
non-normative section.

> * The network protocol (e.g. HTTP) may supply an encoding for the
>   character sheet as metadata; when available, use of this mechanism
>   is preferred. New content encoded in [UTF-8]() should be marked as
>   such using this mechanism.

Why is this the preferred mechanism for utf-8 (but not for other
encodings)?

> * ASCII-compatible encodings may also be declared in-band by use of
>   an [@charset directive](). This directive is ignored if the
>   network protocol supplies an encoding as metadata.
>
>   > Warning: Although an [@charset directive]() textually resembles
>   > an [at-rule](), it is not parsed as an at-rule; only a specific
>   > byte sequence, beginning with the very first byte in the style
>   > sheet, is accepted.
>
> * The referring document provides, explicitly or implicitly, an
>   [environment encoding]() which is assumed to apply to the style
>   sheet if neither of the above mechanisms provide an encoding.
>   Relying on the environment encoding is discouraged.

Why is it discouraged?

> * [UTF-16]() encoding, which is not ASCII-compatible, may be declared
>   out-of-band with network data or in-band with a [byte order mark](),
>   but not with a [@charset directive](). The use of [UTF-16]() is
>   **strongly discouraged**.

Why is it more strongly discouraged than other non-utf-8 encodings?
Since utf-8 is already a must, I think it doesn't make sense to discourage
other specific encodings.

>   When present, a [byte order mark]() overrides any encoding set by
>   network metadata, as specified in [[ENCODING]]().
>
> * ASCII-incompatible encodings other than [UTF-16]() may not be
>   used, as specified in [[ENCODING]]().
>
> #### Algorithm for determining the fallback encoding
>
> The [decode]() algorithm takes as input a <dfn>fallback
> encoding</dfn>, which UAs shall determine as follows:
>
>> Note: The [decode]() algorithm uses the [fallback encoding]() only
>> when no [byte order mark]() is present in the input.
>
> 1. If HTTP or equivalent protocol defines an encoding (e.g. via the
>    charset parameter of the Content-Type header), [get an encoding]()
>    [[ENCODING]]() for the specified value. If that does not return
>    failure, use the return value as the fallback encoding.
>
> 1. Otherwise, check for a <dfn>@charset directive</dfn>. If the
>    initial sequence of bytes in the byte stream, beginning with the
>    very first byte, matches the hex sequence
>
>        40 63 68 61 72 73 65 74 20 22 LL* 22 3B
>
>    where each `LL` byte must have a value between `23` and `7E`
>    hexadecimal, inclusive, then [get an encoding]() [[ENCODING]]() for
>    the sequence of `LL` bytes, interpreted as ASCII.
>
>    > Note: This byte sequence, when decoded as ASCII, is the string
>    > ‘`@charset "…";`’ where the "…" is the sequence of `LL` bytes
>    > specifying the encoding’s label.
>
>    > Note: UAs may impose an arbitrary limit upon the number of `LL`
>    > bytes scanned, as long as it is large enough to encompass all of
>    > the [labels]() defined in [[ENCODING]](); presently these are all
>    > 19 or fewer bytes long.
>
>    If the [get an encoding]() algorithm returns `utf-16be` or
>    `utf-16le`, use `utf-8` as the fallback encoding. If it returns
>    anything else except failure, use the return value as the fallback
>    encoding.
>
>    > Note: `utf-16be` and `utf-16le` cannot possibly be correct when
>    > returned by the [get an encoding]() algorithm in this context,
>    > because they are ASCII-incompatible and the [@charset directive]()
>    > is only recognized when encoded compatibly with ASCII.
>    > This mimics the behavior of HTML `<meta>` elements when used to
>    > declare an encoding in-band.
>
> 1. Otherwise, if an [environment encoding]() is provided by the
>    referring document, use that as the fallback encoding.
>
> 1. Otherwise, use `utf-8` as the fallback encoding.
>

--
Simon Pieters
Opera Software
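
For reference, the four quoted fallback-encoding steps can be sketched roughly as below. This is only an illustration, not the draft's normative text. The `LABELS` table is a tiny subset of the labels in [[ENCODING]], and the names `get_an_encoding`, `fallback_encoding`, `protocol_charset` and `environment_encoding` are hypothetical. The sketch also assumes that a failed label lookup in step 1 or step 2 falls through to the next step, which the quoted text does not say explicitly.

```python
import re
from typing import Optional

# Illustrative subset of Encoding Standard labels (assumption: a real UA
# would carry the full label table).
LABELS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
    "shift_jis": "shift_jis",
    "utf-16": "utf-16le",
    "utf-16le": "utf-16le",
    "utf-16be": "utf-16be",
}

# 40 63 68 61 72 73 65 74 20 22 LL* 22 3B, each LL in 0x23..0x7E.
# re.match anchors at the very first byte of the stream.
CHARSET_RE = re.compile(rb'@charset "([\x23-\x7e]*)";')


def get_an_encoding(label: str) -> Optional[str]:
    """Hypothetical stand-in for 'get an encoding'; None means failure."""
    return LABELS.get(label.strip("\t\n\f\r ").lower())


def fallback_encoding(
    data: bytes,
    protocol_charset: Optional[str] = None,
    environment_encoding: Optional[str] = None,
) -> str:
    # Step 1: encoding supplied by HTTP or an equivalent protocol.
    if protocol_charset is not None:
        enc = get_an_encoding(protocol_charset)
        if enc is not None:
            return enc

    # Step 2: @charset directive, matched as raw bytes.
    m = CHARSET_RE.match(data)
    if m:
        enc = get_an_encoding(m.group(1).decode("ascii"))
        if enc in ("utf-16be", "utf-16le"):
            # Cannot be right: the directive itself was read as ASCII.
            return "utf-8"
        if enc is not None:
            return enc

    # Step 3: environment encoding from the referring document.
    if environment_encoding is not None:
        return environment_encoding

    # Step 4: default.
    return "utf-8"
```

For example, `fallback_encoding(b'@charset "iso-8859-1"; a{}')` yields `windows-1252` under this table, while the same bytes with `protocol_charset="utf-8"` yield `utf-8`, since the protocol metadata is consulted first.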
Received on Friday, 24 January 2014 08:59:45 UTC