[css-syntax] CR publication, Encodings and @charset from Simon Sapin on 2014-01-07 (www-international@w3.org from January to March 2014)

From: Simon Sapin <simon.sapin@exyr.org>
Date: Tue, 07 Jan 2014 23:31:31 +0000
To: www-style <www-style@w3.org>
CC: 'WWW International' <www-international@w3.org>, "Tab Atkins Jr." <jackalmage@gmail.com>
Message-ID: <52CC8E53.5070200@exyr.org>
Hi,

Since the CSS WG has already resolved to do so[1] and we have not 
received further LC comments, I will ask to publish CSS Syntax Level 3 
as Candidate Recommendation. (The LC period ended on December 17.)

[1] http://lists.w3.org/Archives/Public/www-style/2013Dec/0403.html


I did however make some non-normative changes to the spec text, based on 
remarks that I happen to have found online:

http://www.w3.org/International/track/products/53


The relevant changes are here:

https://dvcs.w3.org/hg/csswg/rev/e2bac65f3d7b#l3.1
https://dvcs.w3.org/hg/csswg/rev/6c5d1704f506#l2.1
https://dvcs.w3.org/hg/csswg/rev/a067b6c20248#l2.1


Here is a detailed response:


http://www.w3.org/International/track/issues/326
> Reference to Encoding spec is missing from the reference section.

This document was already referenced from normative text, but I added it 
to the list of normative references.


http://www.w3.org/International/track/issues/329
> @charset has no effect on stylesheet??

I rephrased the note to clarify that the parsed @charset at-rule that 
shows up in CSSOM and the @charset byte sequence that provides a hint 
for the stylesheet’s encoding are not the same thing.

Only the former "has no effect on stylesheets".

http://www.w3.org/International/track/issues/306
>>  where XXX is a sequence of bytes other than 22 (ASCII for ")
> This is unclear and looks odd. [...]

In this rephrasing, I also avoid mentioning the 0x22 ASCII character 
entirely. The details of the byte pattern are not central to this note.


http://www.w3.org/International/track/issues/307
> 1. Step 2 includes instructions for decoding @charset. Later on there
> is a note that says:
>
> "the decode algorithm lets the byte order mark (BOM) take precedence,
> hence the usage of the term "fallback" above."
>
> These are at odds with one another. The first few bytes in the file
> cannot be the ones described in Step 2 if there is a byte order mark
> present.

Indeed, if a BOM is present the first few bytes of a stylesheet cannot 
match the @charset byte pattern, and any attempt to use @charset would 
be ignored.

That’s OK since a BOM would take precedence anyway.
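
To make this concrete, here is a small Python illustration. The sample
stylesheet bytes are made up for this example; the only point is that
with a UTF-8 BOM in front, the stream no longer begins with the bytes
of `@charset "`.

```python
import codecs

# Hypothetical stylesheet: a UTF-8 BOM followed by an @charset rule.
sheet = codecs.BOM_UTF8 + b'@charset "windows-1252";\nbody { color: red }'

# The byte pattern has to sit at the very start of the stream.
first_bytes = sheet[:4].hex(" ")
matches_pattern = sheet.startswith(b'@charset "')

print(first_bytes)      # ef bb bf 40
print(matches_pattern)  # False
```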

> Why isn't BOM handling considered to be "Step 2"?

BOM handling is already described in the Encoding spec’s "decode" 
algorithm, so there is no need to duplicate it in CSS Syntax.

> 2. Various places (notably the section on the @charset rule) imply
> that whitespace may precede the @charset, but Step 2 does not allow
> for ASCII whitespace to be disregarded in finding the @charset
> token.

A deviation in whitespace may produce a valid @charset at-rule without 
having the right byte pattern to provide an encoding hint for the 
stylesheet. (This distinction is explained above.)
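
A rough Python sketch of the byte-pattern side of this distinction
follows. It is a simplification, not the spec text: the function name
`charset_hint`, the 1024-byte window, and the plain "ASCII, no quote"
check on the label are assumptions made for illustration, and the
lookup of the label in the Encoding spec’s label table is left out.

```python
from typing import Optional

PREFIX = b'@charset "'
SUFFIX = b'";'


def charset_hint(data: bytes) -> Optional[str]:
    """Return the label from a leading @charset byte pattern, or None."""
    head = data[:1024]  # assumed limit on how far to look
    if not head.startswith(PREFIX):
        return None  # anything before '@', even whitespace, defeats the hint
    end = head.find(SUFFIX, len(PREFIX))
    if end == -1:
        return None
    label = head[len(PREFIX):end]
    if b'"' in label:
        return None  # the label itself may not contain a quote byte
    try:
        return label.decode("ascii")  # valid labels are all ASCII
    except UnicodeDecodeError:
        return None


print(charset_hint(b'@charset "utf-8";\nbody {}'))   # utf-8
print(charset_hint(b' @charset "utf-8";\nbody {}'))  # None
```

The second call has a single leading space: a parser may still see an
@charset at-rule there, but the bytes give no encoding hint.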


> 3. The note "Anything ASCII-compatible will do, so using windows-1252
> is fine" is not a clear enough indicator that ONLY ASCII-compatible
> encodings are accepted for style sheets. There should be a direct
> statement about this.

This note is about the decoding of the encoding label name inside the 
@charset byte sequence, not about the decoding of the stylesheet.

I clarified with "since valid labels are all ASCII".

> There is also mention in the section on the @charset rule that the
> byte sequence will "spell out something else entirely" if the
> character encoding isn't ASCII-compatible. Perhaps the text should be
> explicit: the only non-ASCII-compatible encodings that can be used
> for a CSS stylesheet are UTF-16 and its endian friends LE and BE.

I removed that mention, as it was not useful in explaining the 
difference between the @charset at-rule and byte pattern.


http://www.w3.org/International/track/issues/327
> Why refer to the 'fallback' encoding? Why not just say, "determine
> the encoding:"?

In CSS Syntax, because that’s the term that the Encoding spec uses.

In Encoding, what’s provided is a "fallback" because it’s only used when 
no BOM is found.

> I guess this might be a question for the Encoding spec, but it's not
> clear to me why you would go to all the trouble of determining a
> fallback encoding before testing whether there is a byte order mark,
> since if there is you just throw all that work away anyway.

Implementations are free to not bother determining the fallback encoding 
when it’s not going to be used (i.e. when a BOM is found).

I removed "First," and "Then," from this part of CSS Syntax to avoid 
implying the contrary.
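
One way an implementation could skip that work is sketched below in
Python. This is only an illustration under stated assumptions: the
`decode` function here is a simplified stand-in for the Encoding
spec’s algorithm (it uses Python’s codecs rather than the spec’s
decoders), and passing the fallback as a callable named `get_fallback`
is a design choice of this sketch, not something either spec requires.

```python
import codecs
from typing import Callable

# BOM signatures and the encodings they select.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def decode(data: bytes, get_fallback: Callable[[], str]) -> str:
    """Decode data; only compute the fallback when no BOM is found."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            # A BOM takes precedence, so get_fallback is never called.
            return data[len(bom):].decode(encoding, errors="replace")
    return data.decode(get_fallback(), errors="replace")


calls = []


def fallback() -> str:
    """Hypothetical fallback computation; records each invocation."""
    calls.append("called")
    return "windows-1252"


with_bom = decode(codecs.BOM_UTF8 + b"a {}", fallback)
calls_after_bom = len(calls)
without_bom = decode(b"a {}", fallback)
calls_after_plain = len(calls)
```

After the first call `calls` is still empty; the fallback function
runs only for the second, BOM-less input.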

Cheers,
-- 
Simon Sapin
Received on Tuesday, 7 January 2014 23:32:35 UTC