
Re: For review: Character encodings in HTML and CSS

From: Najib Tounsi <ntounsi@gmail.com>
Date: Fri, 19 Feb 2010 23:45:05 +0000
Message-ID: <4B7F2281.1090000@emi.ac.ma>
To: Richard Ishida <ishida@w3.org>
CC: www-international@w3.org
Hi Richard,

Please find below some feedback about "Character encodings in HTML and 
CSS".

1- § Character sets, coded character sets, and encodings
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0060)
"A coded character set is a set of characters for which a unique number 
has been assigned to each character. Units of a coded character set are 
known as code points."
I suggest adding that the value assigned to each character corresponds 
to its position in the coded character set. Indeed, later on you mention 
this position when you talk about encodings:
"the encoding is a straightforward mapping to the scalar position of the 
characters in the coded character set." (for ISO 8859-1)
...
"the first line of numbers represents the position of a character in the 
Unicode coded character set" (for Unicode)
However, a set in general has no particular order.
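A minimal Python sketch of the point above (an illustration only; the helper name `describe` is hypothetical). It shows that a character's code point is its position in the Unicode coded character set, that ISO 8859-1 maps positions below 256 straight to a byte of the same value, and that UTF-8 maps the same positions to different byte sequences.

```python
def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, UTF-8 bytes, ISO 8859-1 bytes) for one character.

    `describe` is a hypothetical helper used only for illustration.
    """
    # ord() gives the character's position (code point) in Unicode.
    code_point = f"U+{ord(ch):04X}"
    # UTF-8 turns that position into a sequence of 1 to 4 bytes.
    utf8 = ch.encode("utf-8").hex(" ")
    try:
        # ISO 8859-1 maps positions 0-255 directly to one byte of the same value.
        latin1 = ch.encode("iso-8859-1").hex(" ")
    except UnicodeEncodeError:
        # Characters above U+00FF are outside the ISO 8859-1 repertoire.
        latin1 = "n/a"
    return code_point, utf8, latin1


# "A", "é" and "€", written as escapes so the source stays ASCII.
for ch in ["A", "\u00e9", "\u20ac"]:
    print(*describe(ch))
```

For "é" (U+00E9) this gives the byte `e9` in ISO 8859-1 but `c3 a9` in UTF-8, while "€" (U+20AC) has no ISO 8859-1 byte at all.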

2- Typo
s/A character escape is an way of representing/A character escape is a way of representing/

3- § Applying an encoding to your content
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#applyingencoding)
"As a content author you need to check that your editor or scripts are 
saving text in the encoding of your choice."
I suggest
"As a content author you need to check that your editor or scripts are 
saving text in the encoding YOU EXPECT, OR LET YOU SELECT THE ENCODING 
of your choice."

4- § CSS
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#csssummary)
- Maybe s/non-ASCII/non-US-ASCII/.
- "you should use the @charset rule as the first thing on the page."
Maybe say "you should use the @charset rule as the first thing on the 
page, SET TO THE SAME ENCODING AS THE CORRESPONDING HTML PAGE."
(BTW, it would be worth testing what happens when the two declared 
encodings differ. Do all browsers agree?)

5- § What is the HTTP header?
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#httpheadwhat)
Not very important, but in the code example:
"Date: Wed, 05 Nov 2003 10:46:04 GMT"
maybe use a more recent date?

6- § MIME types and DOCTYPE switching
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0150)
- 5th <p>
"Unfortunately, Internet Explorer currently doesn't support files served 
as XML"
Is the word "currently" still accurate?
- 9th <p>
"The orange MIME-type labels are not recommended."
Maybe
"The orange MIME-type labels (the two at the bottom) are not recommended."
since the colour is lost in a black-and-white printed version :-)

7- § Pros and cons of using the HTTP header for encoding declarations
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0130)
- Advantages (end of 1st item)
"it doesn't matter that transcoders typically do not change the internal 
encoding declarations, just the document encoding."
Maybe
"it doesn't matter that transcoders typically do not change the internal 
encoding declarations, just the document encoding (AND THE HTTP 
INFORMATION WHICH GOES WITH IT)."
- Disadvantages (3rd item)
"There are potential problems for both static and dynamic documents if 
they are to be saved to a location such as a CD or hard disk."
Maybe
"There are potential problems for both static and dynamic documents if 
they are NOT READ FROM A SERVER (e.g. THEY WERE saved to a location such 
as a CD or hard disk)."
- So should I use this method? (next <p>)
"the file may be changed by an intermediary before it reaches the user 
[...], you may particularly want to consider using the HTTP declaration."
Maybe
"the file may be changed by an intermediary before it reaches the user 
[...], you may particularly want to consider using the HTTP declaration, 
SINCE IT IS CHANGED ACCORDINGLY."
- Your following remark:
"(Some people would argue that it is rarely appropriate to declare the 
encoding in the HTTP header if you are going to repeat it in the content 
of the document. In this case, they are proposing that the HTTP header 
say nothing about the document encoding. Note that this would usually 
mean taking action to disable any server defaults.)"
Maybe
"(Some people would argue that it is rarely appropriate to declare the 
encoding in the HTTP header if you are going to repeat it in the content 
of the document. In this case, they are proposing that the HTTP header 
say nothing about the document encoding, OR THAT THE DECLARATION INSIDE 
THE DOCUMENT TAKE PRECEDENCE. AFTER ALL, IT IS WHAT THE AUTHOR WANTS. 
Note that this would usually mean taking action to disable any server 
defaults.)"

8- § The Content-Type meta element
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#metacontenttype)
Typo in the 1st line
s/should used/should be used/

9- § The XML declaration
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#xmldeclaration)
- 2nd code example (...xml:lang="en" lang="en"...)
For consistency, I suggest a language tag other than "en". Readers may 
wonder why they should care about the encoding at all, since English 
text fits in US-ASCII and is thus compatible with UTF-8.

10- § CSS's @charset rule
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#atcharset)
s/non-ASCII/non-US-ASCII/

11- § Precedence rules
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0400)
Regarding the precedence rules for linked CSS style sheets: what is the 
rule if the encoding declared in the HTML document is not the same as 
the one declared in the external CSS file?

12- § What do I need to know about normalization?
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#n11nhow)
- 3rd <p>
"Most keyboards for European languages output text in NFC already, but 
this is less likely to be the case if dealing with many non-European 
languages."
Maybe add "Mostly because (pre-)composed characters are not present on 
(some) non-European keyboards" or something like that.
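A short Python sketch, using only the standard `unicodedata` module, of what this comment is about: the same visible character can be stored as one precomposed code point or as a base letter plus a combining mark, and NFC normalization turns the second form into the first. Which form a given keyboard produces is assumed here for illustration, not tested.

```python
import unicodedata

# "é" as a single precomposed code point: LATIN SMALL LETTER E WITH ACUTE.
precomposed = "\u00e9"
# The same visible character as "e" followed by COMBINING ACUTE ACCENT,
# which an input method without a precomposed key might produce.
decomposed = "e\u0301"

# The two strings look identical but differ code point by code point.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2

# NFC composes the pair into the single code point; NFD does the reverse.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```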


Regards,
Najib


Richard Ishida wrote:
> Comments are being sought on this article prior to final release. Please send any comments to this list (www-international@w3.org). We expect to publish a final version in one to two weeks.
>
> See http://www.w3.org/International/tutorials/tutorial-char-enc/temp
>
> This is an update, in a temporary location, of the tutorial Character encodings in HTML and CSS. (Please be careful about bookmarking the location, since it is only temporary. )
>
> A lot of new material was added, eg. related to the UTF-8 BOM, normalization, etc., and I rearranged the material significantly.  The rearrangement was to downplay slightly the XHTML 1.0 issues, given that that is now only relevant to IE6, but also to help readers more quickly find information they need for the format they are dealing with.
>
> The explicit distinction between XHTML 1.0 and XHTML 1.1 with regard to MIME types was removed, since the XHTML2 WG is hopefully very close to issuing a PER that enables XHTML 1.1 to be served as text/html.  
>
> The update adds information about HTML5.
>
> Where a section corresponds to an article that has been updated, those updates were also migrated to this document.
>
> Thanks,
> RI
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/International/
> http://rishida.net/
>
>   

-- 
Najib TOUNSI (tounsi at w3.org)
W3C Office in Morocco (http://www.w3c.org.ma/)
Ecole Mohammadia d'Ingénieurs, BP. 765 Agdal-RABAT Morocco
Phone : +212 (0) 537 68 71 50  Fax : +212 (0) 537 77 88 53
Mobile: +212 (0) 661 22 00 30 
Received on Friday, 19 February 2010 23:41:06 GMT
