RE: For review: Character encodings in HTML and CSS

From: CE Whitehead <cewcathar@hotmail.com>
Date: Mon, 15 Feb 2010 17:25:08 -0500
Message-ID: <BLU109-W2869B34ACD9C4FBEE9D07EB34A0@phx.gbl>
To: <xn--mlform-iua@xn--mlform-iua.no>
CC: <ishida@w3.org>, <www-international@w3.org>


Hi; one or two other notes: I've rethought my comments about ie. and eg., and I now also agree with Leif about CSS escapes; they need to be part of the sub-heading.


First, Leif, thanks for your really nice reply!

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 
Date: Wed, 10 Feb 2010 07:09:09 +0100

> (5) I believe that many authors are not aware that they may use 
> character escapes inside (many) HTML attributes. Hence I think a word 
> should be said about the fact that this is possible. (You 
> talk about the style attribute, but @style is - or may appear to be - a 
> special case.)

Personally, I'd never use either an HTML entity or an escape inside an attribute, or in fact in any text that shows up only in the source code, because I don't find entities/escapes all that readable.
But I think they can be discussed more in this document if that would be helpful.


From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 
Date: Mon, 15 Feb 2010 14:20:02 +0100

> I think that "NCR" is a term that is connected to HTML and XML. This 
> is reflected in R.I.'s text, which says "All NCRs begin with &# and end 
> with ;".
This sounds correct. So some rewording of the sub-heading here would definitely be helpful;
"What are entities, NCRs and CSS escapes?", as you originally suggested, would be fine with me also.
(I'm no longer particularly tied to my own rewrite, though maybe it could be
"What are entities, NCRs and other escapes in HTML, XML, and CSS?"
I like your rewrite fine.)
We'll leave it up to the editor.
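
To make the difference between the two notations concrete, here is a minimal Python sketch. It is an illustration only, not taken from the tutorial, and the function names are made up. It writes one character as a decimal NCR, as a hexadecimal NCR, and as a CSS escape, and it assumes the six-digit form of the CSS escape followed by a space, as in Leif's .m\0000e5 l{} example.

```python
def decimal_ncr(ch: str) -> str:
    """HTML/XML numeric character reference, decimal form: &#N;"""
    return f"&#{ord(ch)};"


def hex_ncr(ch: str) -> str:
    """HTML/XML numeric character reference, hexadecimal form: &#xH;"""
    return f"&#x{ord(ch):X};"


def css_escape(ch: str) -> str:
    """CSS escape: backslash, six hex digits, then a terminating space."""
    return f"\\{ord(ch):06x} "


def escape_css_identifier(name: str) -> str:
    """Escape every non-ASCII character of a class name for use in CSS."""
    return "".join(c if ord(c) < 128 else css_escape(c) for c in name)


if __name__ == "__main__":
    a_ring = "\u00e5"  # the letter a with ring above, as in Leif's class name
    print(decimal_ncr(a_ring))
    print(hex_ncr(a_ring))
    print("." + escape_css_identifier("m" + a_ring + "l") + "{}")
```

The NCR forms begin with &# and end with ; while the CSS form begins with a backslash and ends with a space, which is why one heading for both may need the extra wording.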
> . . .
> However, it might also be a good thing to mention that Apache allows us 
> to override [ AKA specify file by file] the encoding very simply by 
> adding charset suffixes, as I explained here:

> http://lists.w3.org/Archives/Public/www-amaya/2010JanMar/0083.html
Many thanks for this; I have saved the link for my reference.
And now I have a question for you: do I name the UTF-8 version of my file (for example)

index.html.utf8.html

or index.html.utf-8.html
?

Thanks again

(And if anyone can help me better understand when a server will take my HTML pages, which have an HTML 4.01 document type declaration at the top, and serve them as Unicode/UTF-8,
I would be interested in knowing that too.)
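
One way to look into this is to inspect the Content-Type header that a server actually sends, since the encoding declared over HTTP travels in that header rather than in the document type declaration. Below is a minimal Python sketch, illustrative only: the function name and the sample header values are assumptions, and it only parses a header value that has already been obtained.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset()


if __name__ == "__main__":
    # Sample header values, made up for illustration.
    print(declared_charset("text/html; charset=UTF-8"))
    print(declared_charset("text/html"))
```

If the second case applies (no charset in the header), the browser has to fall back on other declarations or on guessing.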

Re: For review: Character encodings in HTML and CSS
From: John Cowan <cowan@ccil.org> 
Date: Wed, 10 Feb 2010 21:01:23 -0500

> Richard Ishida scripsit:


>> > For "ie." read "i.e.", and for "eg." read "e.g." throughout.
>> 
>> ie. and eg. are my preferred style.  It's enough that I have to use American
>> spelling ;-)

> I well believe that that's irritating, but "i.e." and "e.g." are standard
> abbreviations in both AmE and BrE.
I am not used to ie. and eg. except in R. Ishida's writing,
but I am all for people having their preferred style,
so long as he does not go to 'ie' without the full stop . . .
On thinking it over, I'm sorry I even tried to correct this . . .

(I'm a big fan of personal style, personal idiolect, personal language, etc., so it's not my place to correct this.)

So maybe John would say, suit yourself here.

Best,

C. E. Whitehead
cewcathar@hotmail.com
 
>-- 
> . . .                           John Cowan
 
> Date: Mon, 15 Feb 2010 14:20:02 +0100
> From: xn--mlform-iua@xn--mlform-iua.no
> To: cewcathar@hotmail.com
> CC: ishida@w3.org; www-international@w3.org
> Subject: RE: For review: Character encodings in HTML and CSS
> 
> Hi CE, thanks for your comments on my comments,
> 
> CE Whitehead, Sun, 14 Feb 2010 17:37:11 -0500:
> > Hi, Leif, R. I., all:
> 
> >> See http://www.w3.org/International/tutorials/tutorial-char-enc/temp
> > 
> > From: Leif Halvard Silli 
> > Date: Wed, 10 Feb 2010 07:09:09 +0100
> > 
> >> The document appears thin when it comes to CSS escapes. 
> > 
> >> * The explanation of what a CSS escape is, is now located under the 
> >> heading "What are entities and NCRs?" 
> >> <http://www.w3.org/International/tutorials/tutorial-char-enc/temp#what>. 
> >> I think a separate header for CSS escapes would be better. Or, 
> >> alternatively, that the existing heading should be changed to say "What 
> >> are entities, NCRs and CSS escapes?". 
> > Hmm, entities and NCR's are types of escapes, and the information on 
> > escapes should all be together.
> > I think the reason that CSS is not singled out is because these 
> > escapes can be in CSS, HTML, or XML, and R. I. does not mention HTML 
> > or XML in the title. Perhaps the title could be
> > "What are entities and NCRs?: Escapes in HTML, XML, and CSS."
> > But you are right, the CSS escapes seem to be a special case. 
> 
> I think that "NCR" is a term that is connected to HTML and XML. This 
> is reflected in R.I.'s text, which says "All NCRs begin with &# and end 
> with ;". This is not how a CSS escape begins and ends. However, even a 
> CSS escape is a kind of numeric character reference - so perhaps it 
> can be used about those escapes as well (like you do in your rewording 
> of the heading)? I don't know. I'm in doubt. But I am not against it, 
> if the editor agrees. 
> 
> >> * There should also be a CSS escape example, the same way that there 
> >> already are yellow colored examples of NCR and entities.
> >> * (One of the) CSS examples could e.g. show what it means in practise 
> >> that the space character terminates the CSS escape, as this can be 
> >> highly confusing for authors. This can best be shown by having a CSS 
> >> selector which contains only escaped letters, or a selector consisting 
> >> of 3 letters with the escaped one in the middle:
> > 
> >> .mål{} 
> > 
> >> becomes (note the space)
> > 
> >> .m\0000e5 l{} 
> > 
> > Thanks; having an example here seems to me a good idea as this is an 
> > area where I am still unsure (whereas I've used entities and NCRs in 
> > HTML a ton).
> 
> I think you are not alone.
> 
> >> (3) Specification of the encoding of an external CSS file: The text 
> >> currently says that 
> > 
> >> ]]If your external CSS style sheet contains any non-ASCII text [ 
> >> snip ] you should use the @charset rule as the first thing on the page. 
> >> (It should not be used for CSS embedded in a document.)"[[
> > 
> >> However, I think many authors are not aware that they may use HTTP 
> >> to signal the charset of CSS files as well. Therefore I think you 
> >> should mention this. (You already mentioned another alternative in that 
> >> context, namely to use the BOM. The BOM has issues of support, you say, but 
> >> HTTP works very well, AFAIK.)
> > 
> > I think R. Ishida did mention this--albeit briefly! See:
> > http://www.w3.org/International/tutorials/tutorial-char-enc/temp#atcharset
> 
> (You meant: 
> <http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0400> 
> )
> 
> > "For external, linked CSS style sheets the precedence rules are:
> > 
> > HTTP Content-Type 
> > @charset rule 
> > link charset attribute "
> 
> Indeed. But unless this is explained, then it kind of hangs in the air.
> 
> Speaking about using HTTP: Under the heading "What is a HTTP header" 
> (<http://www.w3.org/International/tutorials/tutorial-char-enc/temp#httpheadwhat>), 
> it is advised to configure Apache to serve all HTML pages encoded as 
> UTF-8:
> 
> "AddType 'text/html; charset=UTF-8' html"
> 
> However, it might also be a good thing to mention that Apache allows us 
> to override [ AKA specify file by file] the encoding very simply by 
> adding charset suffixes, as I explained here:
> 
> http://lists.w3.org/Archives/Public/www-amaya/2010JanMar/0083.html
> 
> Authors will, btw, also often find that this works "out of the box". At 
> least it did on my computer.
> 
> > R. I. hardly discusses the BOM for CSS either; he discusses the 
> > @charset rule
> > and how it might interact with the BOM (so that if you had a BOM you 
> > might not want an @charset declaration) . . . where does he really, 
> > in-depth, discuss the BOM as a way of declaring the character 
> > encoding for CSS pages?
> 
> This sounds like a good thing to mention. But do the CSS 
> specifications say that CSS interpreters may in fact use the BOM for 
> detecting the encoding?
> 
> >> (4) The logics of using escapes in @style and <style> and stylesheets:
> >> * I believe many web authors think they /have/ to use escapes e.g. in 
> >> CSS selectors. So I think that the document should say that they don't 
> >> have to - they can often type them directly - especially if CSS and 
> >> HTML are located in the same document ...
> > 
> > Hmm, it might not be explicit enough; it's certainly implicit; for R. 
> > I. says:
> > 
> > "It is a good idea to always declare the encoding of external CSS 
> > style sheets if you have any non-ASCII text in your CSS file."
> 
> Yes. But if a message is important, then it is often good to say it 
> directly. Instead of relying on the reader to put two and two together.
> 
> >> (5) I believe that many authors are not aware that they may use 
> >> character escapes inside (many) HTML attributes. Hence I think a word 
> >> should be said about the fact that this is possible. (You 
> >> talk about the style attribute, but @style is - or may appear to be - a 
> >> special case.)
> > Perhaps. But there is some mention of this too--it's pretty implicit 
> > in the section:
> > http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0470
> 
> On one side I think that many may consider the style attribute as a 
> special case. On the other side, I don't understand why @style is 
> discussed in particular. (When _I_ read that info, the thing that was 
> new to me was that I could use HTML entities inside the style attribute 
> - whereas the point made is that it is better to use CSS escapes.)
> 
> In HTML4 there are some attributes which permit entities, and 
> others that don't. E.g. the @id attribute doesn't. (This is changing in 
> HTML5.) I am probably not the only one who has discovered that NCRs 
> do not validate inside the ID attribute. And so I don't think I am the 
> only one to have been confused about whether I can use character 
> escapes inside attributes.
> 
> Of course, like many of the other things I have suggested, this 
> suggestion is not without roots in my own experience/confusion. Which 
> might not be universal. ;-)
> 
> > Do you think additional info about the use of escapes inside 
> > attributes should be listed in "When to use escapes?"
> 
> Why not. 
> -- 
> leif halvard silli
 		 	   		  
Received on Monday, 15 February 2010 22:25:41 GMT
