Re: [RC5] character-encoding-038 invalid


On Jan 16, 2011, at 3:38 AM, Ms2ger wrote:

On 01/16/2011 06:11 AM, fantasai wrote:
Hixie, got a question, see below?

{snip}

Hixie's server does not serve the original CSS up with a charset parameter,
although it does set UTF-8 for the HTML file, so I don't think we should
be serving a UTF-8 header for this. (According to CSS2.1, the style sheet
must be treated as UTF-8 even without the header: see [1].)

I do agree that the style sheet needs to be setting a red background,
though, because right now there appears to be no way for it to fail.

What I don't understand is what to do with the rest of the <p class="t*st">s,
since there doesn't seem to be anything that could potentially trigger a
failure on those, either.

1. é is 'LATIN SMALL LETTER E WITH ACUTE' (U+00E9),
   which is encoded in windows-1252, -54, -56, -57, -58 as 0xE9.
2. ้ is 'THAI CHARACTER MAI THO' (U+0E49),
   which is encoded in windows-874 as 0xE9.
3. щ is 'CYRILLIC SMALL LETTER SHCHA' (U+0449),
   which is encoded in iso-8859-5 as 0xE9.
4. ى is 'ARABIC LETTER ALEF MAKSURA' (U+0649),
   which is encoded in iso-8859-6 as 0xE9.
5. ι is 'GREEK SMALL LETTER IOTA' (U+03B9),
   which is encoded in windows-1253 as 0xE9.
6. י is 'HEBREW LETTER YOD' (U+05D9),
   which is encoded in windows-1255 as 0xE9.
7. И is 'CYRILLIC CAPITAL LETTER I' (U+0418),
   which is encoded in koi8-r as 0xE9.
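To make the list above concrete, here is a minimal sketch that decodes the single byte 0xE9 under each listed encoding using Python's standard codecs. The mapping from the encoding labels above to Python codec names (cp1252, cp874, iso8859_5, koi8_r, ...) is an assumption of this sketch, not something taken from the test itself. It also shows that the same lone byte is not a valid character in UTF-8.

```python
import unicodedata

# Encoding labels from the list above, mapped to Python's codec names.
CODECS = {
    "windows-1252": "cp1252",
    "windows-1254": "cp1254",
    "windows-1256": "cp1256",
    "windows-1257": "cp1257",
    "windows-1258": "cp1258",
    "windows-874": "cp874",
    "iso-8859-5": "iso8859_5",
    "iso-8859-6": "iso8859_6",
    "windows-1253": "cp1253",
    "windows-1255": "cp1255",
    "koi8-r": "koi8_r",
}


def decode_e9(codec: str) -> str:
    """Decode the single byte 0xE9 with the given single-byte codec."""
    return b"\xe9".decode(codec)


for label, codec in CODECS.items():
    ch = decode_e9(codec)
    print(f"{label:13} U+{ord(ch):04X} {unicodedata.name(ch)}")

# In UTF-8, 0xE9 starts a three-byte sequence; on its own it is malformed,
# and Python's "replace" error handler substitutes U+FFFD.
bad = b"\xe9".decode("utf-8", errors="replace")
print(f"utf-8         U+{ord(bad):04X}")  # prints "utf-8         U+FFFD"
```

Each line of output names a different character, which is why a page that matches only the UTF-8 interpretation can detect a browser that picked one of these legacy encodings.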

Most of these encodings are mentioned in the table of defaults at the
end of the "Determining the character encoding" section in HTML [1].

It seems unlikely that these would fail in en-US browsers, but failures are
much more plausible in other locales (even though we don't usually test those).

That helps a lot, thanks.

So the test is trying to make sure that the stylesheet gets interpreted as UTF-8 and not as any of the above-listed encodings (even though the stylesheet does not actually contain valid UTF-8 data).

I believe this answers one of my original questions: the stylesheet should NOT have an explicit character encoding sent from the server, and we're testing whether the browser defaults to UTF-8 (or does some encoding sniffing). That also means that the test does not need the http flag.
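As a rough illustration of why no header is needed, here is a simplified sketch of the CSS 2.1 (section 4.4) precedence order for choosing a style sheet's encoding: HTTP charset parameter, then BOM and/or @charset, then link metadata, then the referring document's charset, then UTF-8. The function and parameter names are hypothetical, and the sketch skips real-world details such as BOM detection, @charset parsing and label normalization. It also does not model any fallback to another encoding on malformed input; Python's decoder simply substitutes U+FFFD.

```python
from typing import Optional


def css21_stylesheet_encoding(
    http_charset: Optional[str],
    bom_or_at_charset: Optional[str],
    link_charset: Optional[str],
    referrer_charset: Optional[str],
) -> str:
    """Return the first encoding found, in CSS 2.1 priority order (hypothetical helper)."""
    for candidate in (http_charset, bom_or_at_charset, link_charset, referrer_charset):
        if candidate:
            return candidate
    # Nothing declared anywhere: assume UTF-8.
    return "utf-8"


def decode_stylesheet(data: bytes, encoding: str) -> str:
    """Decode style sheet bytes; malformed sequences become U+FFFD (Python's "replace")."""
    return data.decode(encoding, errors="replace")


# Scenario discussed above: no charset on the CSS, HTML page served as UTF-8.
enc = css21_stylesheet_encoding(None, None, None, "utf-8")
# Hypothetical selector bytes containing a lone 0xE9.
print(enc, repr(decode_stylesheet(b".t\xe9st {}", enc)))
```

Whether the UTF-8 comes from the referring document or from the final default, the result is the same here, so the test works without an HTTP header on the style sheet.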

(I suppose another possible interpretation of the test is that the stylesheet should have an explicit UTF-8 encoding sent from the server, and we're testing whether the browser sees the malformed UTF-8 and falls back to another encoding... Hixie?)

Received on Sunday, 16 January 2011 20:27:38 UTC