Re: [RC5] character-encoding-038 invalid

From: Alan Gresley <alan@css-class.com>
Date: Mon, 17 Jan 2011 00:25:07 +1100
Message-ID: <4D32F1B3.60703@css-class.com>
To: Ms2ger <ms2ger@gmail.com>
CC: fantasai <fantasai.lists@inkedblade.net>, Peter Linss <peter.linss@hp.com>, Public CSS test suite mailing list <public-css-testsuite@w3.org>, Ian Hickson <ian@hixie.ch>
On 16/01/2011 10:38 PM, Ms2ger wrote:
> On 01/16/2011 06:11 AM, fantasai wrote:
>> Hixie, got a question, see below?
>>
>> On 01/14/2011 03:22 PM, Peter Linss wrote:
>>>
>>> On Jan 14, 2011, at 12:58 PM, Alan Gresley wrote:
>>>
>>>> On 15/01/2011 7:24 AM, Peter Linss wrote:
>>>>> I believe there are a few problems with this test.
>>>>>
>>>>> First, the only style possibilities for the test paragraphs are white
>>>>> text on a green background versus white text on a green background. I
>>>>> presume it's trying to test for the application of the rule in the
>>>>> linked stylesheet but there would be no visible effect either way.
>>>>>
>>>>> Second, I'm trying to figure out if this test requires http or not
>>>>> (and exactly what for that matter this test is trying to test), I'm
>>>>> guessing the rule in the linked style sheet is NOT supposed to match
>>>>> anything? Is it relying on the linked stylesheet being served as
>>>>> utf-8? (the server's default, as there is no explicit encoding set on
>>>>> that file) Why does the title state "malformed UTF-8"? Either
>>>>> something's missing here or I'm not getting it...
>>>>
>>>>
>>>> The external stylesheet CSS [1] has this
>>>>
>>>> .t�st { color: white; background: green; }
>>>>
>>>> I presume that � is malformed CSS. Each class of each <p> has a string
>>>> of class"t(Unicode)st". These are the Unicode characters.
>>>>
>>>> é ้ щ ى ι י И
>>>>
>>>>
>>>> 1.
>>>> <http://test.csswg.org/suites/css2.1/20101210/html4/support/character-encoding-038.css>
>>>>
>>>>
>>>>
>>>
>>> The external stylesheet is:
>>> .tést { color: white; background: green; }
>>>
>>> if interpreted as ISO-8859-1 encoding. Which I take to mean that the
>>> stylesheet needs to be served via http with the explicit encoding of
>>> utf-8, so that it does NOT match any of the content. (Meaning the
>>> stylesheet is malformed utf-8, which explains the title.)
>>>
>>> So I presume the stylesheet should be updated to be:
>>> .t�st { color: yellow; background: red; }
>>>
>>> and the test does in fact need the 'http' flag (which I already added).
>>
>> Hixie's server does not serve the original CSS up with a charset
>> parameter, although it does set UTF-8 for the HTML file, so I don't think
>> we should be serving a UTF-8 header for this. (According to CSS2.1, the
>> style sheet must be treated as UTF-8 even without the header: see [1].)
>>
>> I do agree that the style sheet needs to be setting a red background,
>> though, because right now there appears no way for it to fail.
>>
>> What I don't understand is what to do with the rest of the
>> <p class="t*st">s, since there doesn't seem to be anything that could
>> potentially trigger a failure on those, either.
>
> 1. é is 'LATIN SMALL LETTER E WITH ACUTE' (U+00E9),
> which is encoded in windows-1252, -54, -56, -57, -58 as 0xE9.
> 2. ้ is 'THAI CHARACTER MAI THO' (U+0E49),
> which is encoded in windows-874 as 0xE9.
> 3. щ is 'CYRILLIC SMALL LETTER SHCHA' (U+0449),
> which is encoded in iso-8859-5 as 0xE9.
> 4. ى is 'ARABIC LETTER ALEF MAKSURA' (U+0649),
> which is encoded in iso-8859-6 as 0xE9.
> 5. ι is 'GREEK SMALL LETTER IOTA' (U+03B9),
> which is encoded in windows-1253 as 0xE9.
> 6. י is 'HEBREW LETTER YOD' (U+05D9),
> which is encoded in windows-1255 as 0xE9.
> 7. И is 'CYRILLIC CAPITAL LETTER I' (U+0418),
> which is encoded in koi8-r as 0xE9.
>
> Most of these encodings are mentioned in the table of defaults at the
> end of the "Determining the character encoding" section in HTML [1].
>
> It seems unlikely that these would fail in en-US browsers, but much less
> so in foreign locales (even though we don't usually test those).
>
> HTH
> Ms2ger
>
> [1] <http://www.whatwg.org/html/#determining-the-character-encoding>
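
The byte-to-character mapping listed above can be checked with a short
Python sketch. It assumes Python's bundled codec names (cp1252, cp874,
iso8859_5 and so on) correspond to the encodings named in the list; those
codec names are an assumption and do not appear in the thread.

```python
import unicodedata

# Code points taken from the list above; the keys are Python codec names
# (an assumption: e.g. "cp874" stands in for windows-874).
EXPECTED = {
    "cp1252": 0x00E9,
    "cp1254": 0x00E9,
    "cp1256": 0x00E9,
    "cp1257": 0x00E9,
    "cp1258": 0x00E9,
    "cp874": 0x0E49,
    "iso8859_5": 0x0449,
    "iso8859_6": 0x0649,
    "cp1253": 0x03B9,
    "cp1255": 0x05D9,
    "koi8_r": 0x0418,
}


def decode_e9(encoding: str) -> str:
    """Decode the single byte 0xE9 using the given legacy encoding."""
    return b"\xe9".decode(encoding)


decoded = {enc: decode_e9(enc) for enc in EXPECTED}

# Print each codec with the code point and Unicode name it produces.
for enc, ch in decoded.items():
    print(f"{enc:<10} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

If the list is right, every entry in `decoded` should match `EXPECTED`,
which is why one selector byte can match a different <p> in each locale.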


Well, this is all very interesting. By coincidence, I logged into an 
email account that I had forgotten about, and one of the emails had 
"you�re" (? within black diamond). I presume the word is "you're".


To clarify something I noticed in Peter's reply to me, in case it was 
missed.

I wrote this in reply to Peter's initial message.


>>>> The external stylesheet CSS [1] has this
>>>>
>>>> .t�st { color: white; background: green; }

       _has ? within black diamond_


Peter replied and said this.

>>> The external stylesheet is:
>>> .tést { color: white; background: green; }

       _has Latin small letter e with acute_

and then this.

>>> So I presume the stylesheet should be updated to be:
>>> .t�st { color: yellow; background: red; }

       _has ? within black diamond_


So are Peter and I seeing different characters (Peter sees 'e' with acute 
and I'm seeing '?' within black diamond)?


In the external stylesheet on the original test on Hixie's server I see 
this.

      .tést { color: white; background: green; }

       _has Latin small letter e with acute_


So if the é (letter e with acute) is U+00E9, what is the Unicode code 
point for � (? within black diamond)?
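
One way to check this is a small Python sketch. It assumes the stylesheet
on disk is plain ASCII apart from a single 0xE9 byte, which is what
Peter's ISO-8859-1 reading implies; the variable names are only for
illustration.

```python
import unicodedata

# Assumed raw bytes of the rule: ASCII apart from one 0xE9 byte.
raw = b".t\xe9st { color: white; background: green; }"

# Read as ISO-8859-1, every byte maps directly to a character.
as_latin1 = raw.decode("iso-8859-1")

# Read as UTF-8, a lone 0xE9 is malformed (it starts a three-byte
# sequence but no continuation bytes follow), so a lenient decoder
# substitutes a placeholder character for it.
as_utf8 = raw.decode("utf-8", errors="replace")


def describe(ch: str) -> str:
    """Return the code point and Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(as_latin1, "->", describe(as_latin1[2]))
print(as_utf8, "->", describe(as_utf8[2]))
```

With Python's decoder the substituted character is U+FFFD REPLACEMENT
CHARACTER, which many fonts draw as a '?' within a black diamond. That
would explain two readers seeing different glyphs for the same byte.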


-- 
Alan http://css-class.com/

Armies Cannot Stop An Idea Whose Time Has Come. - Victor Hugo
Received on Sunday, 16 January 2011 13:25:47 GMT
