Re: [RC5] character-encoding-038 invalid

On 01/16/2011 06:11 AM, fantasai wrote:
> Hixie, got a question, see below?
>
> On 01/14/2011 03:22 PM, Peter Linss wrote:
>>
>> On Jan 14, 2011, at 12:58 PM, Alan Gresley wrote:
>>
>>> On 15/01/2011 7:24 AM, Peter Linss wrote:
>>>> I believe there are a few problems with this test.
>>>>
>>>> First, the only style possibilities for the test paragraphs are white
>>>> text on a green background versus white text on a green background. I
>>>> presume it's trying to test for the application of the rule in the
>>>> linked stylesheet but there would be no visible effect either way.
>>>>
>>>> Second, I'm trying to figure out if this test requires http or not (and
>>>> exactly what for that matter this test is trying to test). I'm guessing
>>>> the rule in the linked style sheet is NOT supposed to match anything?
>>>> Is it relying on the linked stylesheet being served as utf-8 (the
>>>> server's default, as there is no explicit encoding set on that file)?
>>>> Why does the title state "malformed UTF-8"? Either something's missing
>>>> here or I'm not getting it...
>>>
>>>
>>> The external stylesheet CSS [1] has this
>>>
>>> .t�st { color: white; background: green; }
>>>
>>> I presume that � is malformed CSS. Each <p> has a class of the form
>>> class="t(Unicode)st", using these Unicode characters:
>>>
>>> é ้ щ ى ι י И
>>>
>>>
>>> 1.
>>> <http://test.csswg.org/suites/css2.1/20101210/html4/support/character-encoding-038.css>
>>>
>>>
>>
>> The external stylesheet is:
>> .tést { color: white; background: green; }
>>
>> if interpreted as ISO-8859-1 encoding. Which I take to mean that the
>> stylesheet needs to be served via http with the explicit encoding of
>> utf-8, so that it does NOT match any of the content. (Meaning the
>> stylesheet is malformed utf-8, which explains the title.)
>>
>> So I presume the stylesheet should be updated to be:
>> .t�st { color: yellow; background: red; }
>>
>> and the test does in fact need the 'http' flag (which I already added).
>
> Hixie's server does not serve the original CSS up with a charset parameter,
> although it does set UTF-8 for the HTML file, so I don't think we should
> be serving a UTF-8 header for this. (According to CSS2.1, the style sheet
> must be treated as UTF-8 even without the header: see [1].)
>
> I do agree that the style sheet needs to be setting a red background,
> though, because right now there appears to be no way for it to fail.
>
> What I don't understand is what to do with the rest of the
> <p class="t*st">s, since there doesn't seem to be anything that could
> potentially trigger a failure on those, either.
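
The two readings of the stylesheet discussed above can be reproduced with a minimal Python sketch. It assumes the selector in character-encoding-038.css contains the single raw byte 0xE9 between "t" and "st" (consistent with the .tést and .t�st renderings quoted above); the variable names are illustrative only, and Python's decoder stands in for whatever a UA actually does.

```python
# Assumption: the selector in character-encoding-038.css holds the raw byte
# 0xE9 between "t" and "st"; this is inferred, not copied from the file.
selector_bytes = b".t\xe9st"

# As ISO-8859-1, byte 0xE9 is U+00E9 LATIN SMALL LETTER E WITH ACUTE.
latin1 = selector_bytes.decode("iso-8859-1")

# As UTF-8, 0xE9 begins a three-byte sequence, but the next byte ("s", 0x73)
# is not a continuation byte, so the input is malformed. errors="replace"
# substitutes U+FFFD REPLACEMENT CHARACTER for the offending byte.
utf8 = selector_bytes.decode("utf-8", errors="replace")

print(latin1)  # .tést
print(utf8)    # .t�st

# A strict decoder rejects the input outright instead of substituting.
try:
    selector_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("malformed UTF-8:", exc.reason)
```

Either way, the UTF-8 reading cannot produce any of the class names used by the test paragraphs, which is presumably why the rule is meant to match nothing.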

1. é is 'LATIN SMALL LETTER E WITH ACUTE' (U+00E9),
    which is encoded in windows-1252, windows-1254, windows-1256,
    windows-1257 and windows-1258 as 0xE9.
2. ้ is 'THAI CHARACTER MAI THO' (U+0E49),
    which is encoded in windows-874 as 0xE9.
3. щ is 'CYRILLIC SMALL LETTER SHCHA' (U+0449),
    which is encoded in iso-8859-5 as 0xE9.
4. ى is 'ARABIC LETTER ALEF MAKSURA' (U+0649),
    which is encoded in iso-8859-6 as 0xE9.
5. ι is 'GREEK SMALL LETTER IOTA' (U+03B9),
    which is encoded in windows-1253 as 0xE9.
6. י is 'HEBREW LETTER YOD' (U+05D9),
    which is encoded in windows-1255 as 0xE9.
7. И is 'CYRILLIC CAPITAL LETTER I' (U+0418),
    which is encoded in koi8-r as 0xE9.
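
The 0xE9 mappings listed above can be checked with a minimal Python sketch. It again assumes the stylesheet's selector holds the raw byte 0xE9 between "t" and "st"; Python codec names stand in for the labels used above (e.g. "cp874" for windows-874, "iso8859_5" for iso-8859-5), and the helper matching_class is hypothetical. A UA that wrongly decoded the stylesheet with one of these legacy encodings would end up with a selector matching the corresponding paragraph, so with a red-background rule that paragraph would presumably show the failure.

```python
# Each test paragraph's class is "t?st", where ? is one of the characters
# listed above. The pairing of code points with Python codec names mirrors
# the list; the codec names themselves are Python's, not the test's.
CASES = [
    (0x00E9, ["cp1252", "cp1254", "cp1256", "cp1257", "cp1258"]),
    (0x0E49, ["cp874"]),
    (0x0449, ["iso8859_5"]),
    (0x0649, ["iso8859_6"]),
    (0x03B9, ["cp1253"]),
    (0x05D9, ["cp1255"]),
    (0x0418, ["koi8_r"]),
]

# Assumed byte content of the stylesheet's selector (without the leading dot).
RAW = b"t\xe9st"


def matching_class(codec: str) -> str:
    """Hypothetical helper: the class name the selector would match if a UA
    decoded the stylesheet with the given legacy codec."""
    return RAW.decode(codec)


for code_point, codecs in CASES:
    expected = f"t{chr(code_point)}st"
    for codec in codecs:
        got = matching_class(codec)
        # Every listed encoding maps 0xE9 to the character claimed above.
        assert got == expected, (codec, got, expected)
        print(f"{codec:>9}: U+{code_point:04X} -> class {got!r}")
```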

Most of these encodings are mentioned in the table of defaults at the 
end of the "Determining the character encoding" section in HTML [1].

It seems unlikely that these would fail in en-US browsers, but they are
much more likely to fail in other locales, whose default encodings include
several of those above (even though we don't usually test those).

HTH
Ms2ger

[1] <http://www.whatwg.org/html/#determining-the-character-encoding>

Received on Sunday, 16 January 2011 11:39:41 UTC