
[css-syntax] CR publication, Encodings and @charset

From: Simon Sapin <simon.sapin@exyr.org>
Date: Tue, 07 Jan 2014 23:31:31 +0000
Message-ID: <52CC8E53.5070200@exyr.org>
To: www-style <www-style@w3.org>
CC: 'WWW International' <www-international@w3.org>, "Tab Atkins Jr." <jackalmage@gmail.com>

Since the CSS WG has already resolved to do so[1] and we have not 
received further LC comments, I will ask to publish CSS Syntax Level 3 
as Candidate Recommendation. (The LC period ended on December 17.)

[1] http://lists.w3.org/Archives/Public/www-style/2013Dec/0403.html

I did however make some non-normative changes to the spec text, based on 
remarks that I happen to have found online:


The relevant changes are here:


Here is a detailed response:

> Reference to Encoding spec is missing from the reference section.

This document was already referenced from normative text, but I added it 
to the list of normative references.

> @charset has no effect on stylesheet??

I rephrased the note to clarify that the parsed @charset at-rule that 
shows up in CSSOM and the @charset byte sequence that provides a hint 
for the stylesheet's encoding are not the same thing.

Only the former "has no effect on stylesheets".

>>  where XXX is a sequence of bytes other than 22 (ASCII for ")
> This is unclear and looks odd. [...]

In this rephrasing, I also avoid mentioning the 0x22 ASCII character 
entirely. The details of the byte pattern are not central to this note.

> 1. Step 2 includes instructions for decoding @charset. Later on there
> is a note that says:
> "the decode algorithm lets the byte order mark (BOM) take precedence,
> hence the usage of the term "fallback" above."
> These are at odds with one another. The first few bytes in the file
> cannot be the ones described in Step 2 if there is a byte order mark
> present.

Indeed, if a BOM is present, the first few bytes of a stylesheet cannot 
match the @charset byte pattern, and any attempt to use @charset would 
be ignored.

That's OK since a BOM would take precedence anyway.
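
To make this concrete, here is a rough Python sketch of a matcher for 
the @charset byte sequence. It is not spec text: the function name is 
made up for illustration, the pattern is assumed to be `@charset "`, 
then label bytes other than 0x22, then `";`, and any limit on how many 
bytes get examined is left out.

```python
import codecs
from typing import Optional

CHARSET_PREFIX = b'@charset "'


def sniff_charset_label(data: bytes) -> Optional[str]:
    """Return the label from a leading `@charset "...";` byte sequence.

    Hypothetical helper: returns None when the very first bytes do not
    match the assumed pattern.
    """
    if not data.startswith(CHARSET_PREFIX):
        return None
    # The label runs up to the next 0x22 byte, which must be followed by ';'.
    end = data.find(b'"', len(CHARSET_PREFIX))
    if end == -1 or data[end + 1:end + 2] != b';':
        return None
    try:
        return data[len(CHARSET_PREFIX):end].decode('ascii')
    except UnicodeDecodeError:
        return None  # non-ASCII bytes cannot spell a label


plain = b'@charset "windows-1252";\nbody { margin: 0 }'
with_bom = codecs.BOM_UTF8 + plain

print(sniff_charset_label(plain))     # the label is found
print(sniff_charset_label(with_bom))  # no match: the BOM bytes come first
```

With the BOM in front, the stylesheet no longer starts with the `@` 
byte, so the matcher finds nothing and only the BOM decides.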

> Why isn't BOM handling considered to be "Step 2"?

BOM handling is already described in the Encoding spec's "decode" 
algorithm, so there is no need to duplicate it in CSS Syntax.

> 2. Various places (notably the section on the @charset rule) imply
> that whitespace may precede the @charset, but Step 2 does not allow
> for ASCII whitespace to be disregarded in finding the @charset
> token.

A deviation in whitespace may produce a valid @charset at-rule without 
having the right byte pattern to provide an encoding hint for the 
stylesheet. (This distinction is explained above.)
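
As an illustration only, the following Python sketch checks a few 
whitespace variants against a strict byte pattern. The regular 
expression and the `encoding_hint` name are assumptions made for this 
example, not text from the spec. The sketch says nothing about how the 
CSS parser treats these inputs. It only shows that none of the variants 
provides an encoding hint.

```python
import re
from typing import Optional

# Assumed pattern: exact prefix, a single space, a double quote, ASCII
# label bytes other than 0x22, a closing double quote, then a semicolon.
CHARSET_BYTES = re.compile(rb'@charset "([\x00-\x21\x23-\x7f]*)";')


def encoding_hint(data: bytes) -> Optional[str]:
    """Return the label if the bytes start with the exact pattern."""
    match = CHARSET_BYTES.match(data)
    return match.group(1).decode('ascii') if match else None


samples = [
    b'@charset "utf-8";',    # exact pattern
    b' @charset "utf-8";',   # leading space
    b'@charset  "utf-8";',   # two spaces after the keyword
    b'@charset "utf-8" ;',   # space before the semicolon
]
for sample in samples:
    print(sample, '->', encoding_hint(sample))
```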

> 3. The note "Anything ASCII-compatible will do, so using windows-1252
> is fine" is not a clear enough indicator that ONLY ASCII-compatible
> encodings are accepted for style sheets. There should be a direct
> statement about this.

This note is about the decoding of the encoding label name inside the 
@charset byte sequence, not about the decoding of the stylesheet.

I clarified with "since valid labels are all ASCII".
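
A quick way to see why the choice does not matter for the label bytes, 
sketched in Python. The codec names below are Python's own names, not 
Encoding spec labels, and are only assumed to stand in for 
"ASCII-compatible" encodings.

```python
# Python codec names, used here as stand-ins for ASCII-compatible encodings.
codec_names = ['ascii', 'cp1252', 'utf-8', 'latin-1']

# A label as it would appear between the quotes of the byte sequence.
label_bytes = b'windows-1252'
decoded = {label_bytes.decode(name) for name in codec_names}

# Every byte in the ASCII range decodes the same way under each codec.
ascii_range = bytes(range(0x80))
all_agree = all(
    ascii_range.decode(name) == ascii_range.decode('ascii')
    for name in codec_names
)

print(decoded, all_agree)
```

Since the label only contains ASCII bytes, each of these decoders 
yields the same string for it.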

> There is also mention in the section on the @charset rule that the
> byte sequence will "spell out something else entirely" if the
> character encoding isn't ASCII-compatible. Perhaps the text should be
> explicit: the only non-ASCII-compatible encodings that can be used
> for a CSS stylesheet are UTF-16 and its endian friends LE and BE.

I removed that mention, as it was not useful in explaining the 
difference between the @charset at-rule and byte pattern.

> Why refer to the 'fallback' encoding? Why not just say, "determine
> the encoding:"?

In CSS Syntax, because that's the term that the Encoding spec uses.

In Encoding, what's provided is a "fallback" because it's only used when 
no BOM is found.

> I guess this might be a question for the Encoding spec, but it's not
> clear to me why you would go to all the trouble of determining a
> fallback encoding before testing whether there is a byte order mark,
> since if there is you just throw all that work away anyway.

Implementations are free to not bother determining the fallback encoding 
when it's not gonna be used (i.e. when a BOM is found).

I removed "First," and "Then," from this part of CSS Syntax to avoid 
implying the contrary.
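
One way an implementation could take advantage of this, as a minimal 
Python sketch: pass the fallback determination as a callable and only 
invoke it when no BOM is found. The `decode_stylesheet` name, its 
signature, and the use of Python codec names are assumptions made for 
this example. Real implementations would follow the Encoding spec's 
"decode" algorithm.

```python
import codecs
from typing import Callable

# BOMs checked in this order; the names are Python codec names.
_BOMS = (
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
)


def decode_stylesheet(data: bytes, determine_fallback: Callable[[], str]) -> str:
    """Decode bytes, letting a BOM win over the (lazily computed) fallback."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            # BOM found: strip it and never ask for the fallback.
            return data[len(bom):].decode(codec_name, errors='replace')
    return data.decode(determine_fallback(), errors='replace')


calls = []


def fallback() -> str:
    calls.append('called')  # record that the work was actually done
    return 'cp1252'


with_bom = codecs.BOM_UTF8 + 'a { content: "\u00e9" }'.encode('utf-8')
without_bom = b'a { content: "\xe9" }'

text_with_bom = decode_stylesheet(with_bom, fallback)
calls_after_bom = len(calls)
text_without_bom = decode_stylesheet(without_bom, fallback)
print(calls_after_bom, len(calls))
```

The observable result is the same as computing the fallback up front, 
which is why the order of the two steps need not be prescribed.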

Simon Sapin
Received on Tuesday, 7 January 2014 23:32:35 UTC
