RE: For review: Character encodings in HTML and CSS

Hi CE, thanks for your comments on my comments,

CE Whitehead, Sun, 14 Feb 2010 17:37:11 -0500:
> Hi, Leif, R. I., all:

>> See http://www.w3.org/International/tutorials/tutorial-char-enc/temp

> 
> From: Leif Halvard Silli 
> Date: Wed, 10 Feb 2010 07:09:09 +0100
> 
>> The document appears thin when it comes to CSS escapes. 
> 
>>  * The explanation of what a CSS escape is, is now located under the 
>> heading "What are entities and NCRs?" 
>> <http://www.w3.org/International/tutorials/tutorial-char-enc/temp#what>. 
>> I think a separate header for CSS escapes would be better. Or, 
>> alternatively, that the existing heading should be changed to say "What 
>> are entities, NCRs and CSS escapes?". 
> Hmm, entities and NCR's are types of escapes, and the information on 
> escapes should all be together.
> I think the reason that CSS is not singled out is because these 
> escapes can be in CSS, HTML, or XML, and R. I. does not mention HTML 
> or XML in the title.  Perhaps the title could be
> "What are entities and NCRs?: Escapes in HTML, XML, and CSS."
> But you are right, the CSS escapes seem to be a special case. 

I think that "NCR" is a term that is connected to HTML and XML. This 
is reflected in R.I.'s text, which says "All NCRs begin with &# and end 
with ;". This is not how a CSS escape begins and ends. However, even a 
CSS escape is a kind of numeric character reference, so perhaps the 
term can be used for those escapes as well (like you do in your 
rewording of the heading)? I don't know. I'm in doubt. But I am not 
against it, if the editor agrees. 

>>  * There should also be a CSS escape example, the same way that there 
>> already are yellow colored examples of NCR and entities.
>>  * (One of the) CSS examples could e.g. show what it means in practice 
>> that the space character terminates the CSS escape, as this can be 
>> highly confusing for authors. This can best be shown by having a CSS 
>> selector which contains only escaped letters, or a selector consisting 
>> of 3 letters with the escaped one in the middle:
> 
>> .mål{} 
> 
>> becomes (note the space)
> 
>> .m\0000e5 l{} 
> 
> Thanks; having an example here seems to me a good idea as this is an 
> area where I am still unsure (whereas I've used entities and NCR's in 
> HTML a ton).

I think you are not alone.
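
To make the space rule concrete, here is a rough Python sketch. It is 
my own illustration, not something from the tutorial, and the function 
names are made up for the purpose. It writes non-ASCII characters as 
CSS escapes and reads them back, assuming the CSS 2.1 form of an 
escape: a backslash, 1 to 6 hexadecimal digits, and optionally one 
white space character that ends the escape. It does not check for 
code points outside the Unicode range.

```python
import re

def css_escape_non_ascii(ident: str) -> str:
    """Replace each non-ASCII character with a CSS escape (hypothetical helper)."""
    # The trailing space ends the escape, so a following letter such as
    # "a" or "e" cannot be read as one more hex digit.
    return "".join(
        ch if ord(ch) < 128 else "\\{:x} ".format(ord(ch))
        for ch in ident
    )

# Backslash, 1-6 hex digits, then at most one white space character,
# which belongs to the escape and is dropped.
_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\n\r\f])?")

def css_unescape(text: str) -> str:
    """Turn CSS hex escapes back into characters (no range checking)."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

print(css_escape_non_ascii("mål"))      # m\e5 l
print(css_unescape("m\\0000e5 l"))      # mål
# Without the space, the "a" is swallowed into the code point:
print(css_unescape("\\e5 a") == "åa")           # True
print(css_unescape("\\e5a") == chr(0xE5A))      # True (U+0E5A, not "åa")
```

The last two lines show why the space matters: "\e5 a" is "å" followed 
by "a", while "\e5a" is a single, quite different character.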

>> (3) Specification of the encoding of an external CSS file: The text 
>> currently says that 
> 
>>    ]]If your external CSS style sheet contains any non-ASCII text [ 
>> snip ] you should use the @charset rule as the first thing on the page. 
>> (It should not be used for CSS embedded in a document.)"[[
> 
>>    However, I think many authors are not aware that they may use HTTP 
>> to signal the charset of CSS files as well. Therefore I think you 
>> should mention this. (You already mentioned another alternative in that 
>> context, namely to use the BOM. BOM has issues of support you say, but 
>> HTTP works very well, AFAIK.)
> 
> I think R. Ishida did mention this--albeit briefly!  See:
> http://www.w3.org/International/tutorials/tutorial-char-enc/temp#atcharset


(You meant: 
<http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0400> 
)
 
> "For external, linked CSS style sheets the precedence rules are:
> 
> HTTP Content-Type 
> @charset rule 
> link charset attribute "

Indeed. But unless this is explained, then it kind of hangs in the air.
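
As a sketch of what such an explanation would have to convey, the 
quoted precedence list could be read roughly like the Python below. 
This is only my own illustration: the function and parameter names are 
hypothetical, and it deliberately ignores the BOM and any fallback 
rules, since the quoted list does not cover them.

```python
import re
from typing import Optional

# charset parameter of an HTTP Content-Type header value
_CHARSET_PARAM = re.compile(r'charset\s*=\s*"?([^\s;"]+)"?', re.IGNORECASE)
# @charset rule as the very first thing in the style sheet
_AT_CHARSET = re.compile(rb'^@charset "([^"]*)";')

def stylesheet_encoding(content_type: Optional[str],
                        css_bytes: bytes,
                        link_charset: Optional[str]) -> Optional[str]:
    """Pick the declared encoding of an external style sheet (sketch only)."""
    # 1. HTTP Content-Type wins if it carries a charset parameter.
    if content_type:
        m = _CHARSET_PARAM.search(content_type)
        if m:
            return m.group(1)
    # 2. Otherwise, an @charset rule at the start of the file.
    m = _AT_CHARSET.match(css_bytes)
    if m:
        return m.group(1).decode("ascii", "replace")
    # 3. Otherwise, the charset attribute on the link element, if any.
    return link_charset

css = b'@charset "ISO-8859-1"; .m\\e5 l{}'
print(stylesheet_encoding("text/css; charset=UTF-8", css, "windows-1252"))  # UTF-8
print(stylesheet_encoding("text/css", css, "windows-1252"))                 # ISO-8859-1
print(stylesheet_encoding(None, b".x{}", "windows-1252"))                   # windows-1252
```

The point for authors is the first line of output: whatever the server 
says in HTTP overrides what they wrote in the file itself.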

Speaking of using HTTP: Under the heading "What is a HTTP header" 
(<http://www.w3.org/International/tutorials/tutorial-char-enc/temp#httpheadwhat>), 
it is advised to configure Apache to serve all HTML pages as 
UTF-8:

 "AddType 'text/html; charset=UTF-8' html"

However, it might also be a good thing to mention that Apache allows us 
to override [ AKA specify file by file] the encoding very simply by 
adding charset suffixes, as I explained here:

http://lists.w3.org/Archives/Public/www-amaya/2010JanMar/0083.html


Authors will, btw, also often find that this works "out of the box". At 
least it did on my computer.

> R. I. hardly discusses the BOM for CSS either; he discusses the 
> @charset rule
> and how it might interact with the BOM (so that if you had a BOM you 
> might not want an @charset declaration) . . . where does he really, 
> in-depth, discuss the BOM as a way of declaring the character 
> encoding for CSS pages?

This sounds like a good thing to mention. But do the CSS 
specifications say that CSS interpreters may in fact use the BOM for 
detecting the encoding?

>> (4) The logics of using escapes in @style and <style> and stylesheets:
>>   * I believe many web authors think they /have/ to use escapes e.g. in 
>> CSS selectors. So I think that the document should say that they don't 
>> have to - they can often type them directly - especially if CSS and 
>> HTML are located in the same document ...
> 
> Hmm, it might not be explicit enough; it's certainly implicit; for R. 
> I. says:
> 
> "It is a good idea to always declare the encoding of external CSS 
> style sheets if you have any non-ASCII text in your CSS file."

Yes. But if a message is important, then it is often good to say it 
directly. Instead of relying on the reader to put two and two together.

>> (5) I believe that many authors are not aware that they may use 
>> character escapes inside (many) HTML attributes. Hence I think a word 
>> should be said about the fact that this is possible. (You 
>> talk about the style attribute, but @style is - or may appear - as a 
>> special case.)
> Perhaps.  But there is some mention of this too--it's pretty implicit 
> in the section:
> http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0470


On the one hand, I think that many may consider the style attribute a 
special case. On the other hand, I don't understand why @style is 
discussed in particular. (When _I_ read that info, the thing that was 
new to me was that I could use HTML entities inside the style attribute 
- whereas the point made is that it is better to use CSS escapes.)

In HTML4 there are some attributes which permit entities, and 
others that don't. E.g. the @id attribute doesn't. (This is changing in 
HTML5.) I am probably not the only one who has discovered that NCRs 
do not validate inside the ID attribute. And so I don't think I am the 
only one to have been confused about whether I can use character 
escapes inside attributes.

Of course, like many of the other things I have suggested, this 
suggestion is not without roots in my own experience/confusion. Which 
might not be universal. ;-)

> Do you think additional info about the use of escapes inside 
> attributes should be listed in "When to use escapes?"

Why not. 
-- 
leif halvard silli

Received on Monday, 15 February 2010 13:20:38 UTC