
RE: For review: Character encodings in HTML and CSS

From: Richard Ishida <ishida@w3.org>
Date: Mon, 22 Feb 2010 15:31:22 -0000
To: <ntounsi@emi.ac.ma>
Cc: <www-international@w3.org>
Message-ID: <005301cab3d4$16dde540$4499afc0$@org>
Hi Najib,

Thanks for your comments. See below...

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/




> -----Original Message-----
> From: Najib Tounsi [mailto:ntounsi@gmail.com]
> Sent: 19 February 2010 23:45
> To: Richard Ishida
> Cc: www-international@w3.org
> Subject: Re: For review: Character encodings in HTML and CSS
> 
> Hi Richard,
> 
> Please find below some feedback about "Character encodings in HTML and
> CSS"
> 
> 1- § Character sets, coded character sets, and encodings
> (http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0060)
> "A coded character set is a set of characters for which a unique number
> has been assigned to each character. Units of a coded character set are
> known as code points."
> I suggest adding that the value assigned to each character corresponds
> to its position in the coded character set. Indeed, later on you mention
> this position when you talk about encoding:
> "the encoding is a straightforward mapping to the scalar position of the
> characters in the coded character set." for (ISO 8859-1).
> ...
> "the first line of numbers represents the position of a character in the
> Unicode coded character set" for Unicode.
> However, in a set there is no particular order in general.

I'm not sure how important that is, but I added it.
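
To make the "position" idea concrete, here is a minimal Python sketch (the variable names are mine, purely for illustration). It shows the code point of a character and how ISO 8859-1 maps that number straight to a byte, while UTF-8 maps the same code point to a different byte sequence.

```python
# A character's code point is its number (position) in the coded character set.
ch = "\u00e9"                      # LATIN SMALL LETTER E WITH ACUTE
code_point = ord(ch)
print(hex(code_point))             # 0xe9

# ISO 8859-1: the single byte has the same value as the code point.
latin1_bytes = ch.encode("iso-8859-1")
print(latin1_bytes)                # b'\xe9'

# UTF-8: the same code point is represented by a two-byte sequence.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes)                  # b'\xc3\xa9'
```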

> 
> 2- Typo
> s/A character escape is an  way of representing/A character escape is a
> way of representing/

Already fixed, thanks.


> 
> 3- § Applying an encoding to your content
> (http://www.w3.org/International/tutorials/tutorial-char-enc/temp#applyingencoding)
> "As a content author you need to check that your editor or scripts are
> saving text in the encoding of your choice."
> I suggest
> "As a content author you need to check that your editor or scripts are
> saving text in the encoding YOU EXPECT OR LET YOU SELECT THE ONE of your
> choice."
> 
> 4- § CSS
> (http://www.w3.org/International/tutorials/tutorial-char-enc/temp#csssummary)
> - Maybe s/non-ASCII/non-US-ASCII/.
> - "you should use the @charset rule as the first thing on the page."
> Maybe say "you should use the @charset rule as the first thing on the
> page, SET TO THE SAME ENCODING AS THE CORRESPONDING HTML PAGE."
> (BTW, it's worth testing what happens when the two declared encodings
> are not the same. Do all browsers agree?)

Actually, it doesn't need to be set to the same encoding as the HTML at all. See the i18n tests for examples of this: http://www.w3.org/International/tests/list-html-css#cssencoding
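
As a rough illustration, the Python sketch below reads the encoding a style sheet declares for itself, independently of any HTML page that links to it. The helper name `css_declared_encoding` and the sample style sheet are hypothetical, and the pattern is a simplification (it assumes the rule is the very first thing in the file and ignores a possible byte-order mark); it is not how any particular browser implements this.

```python
import re

# Simplified pattern: the rule must start the file, written as @charset "name";
CHARSET_RULE = re.compile(rb'^@charset "([^"]*)";')

def css_declared_encoding(css_bytes):
    """Return the encoding named by a leading @charset rule, or None."""
    match = CHARSET_RULE.match(css_bytes)
    if match is None:
        return None
    return match.group(1).decode("ascii", "replace")

# A style sheet saved as UTF-8; the linking HTML page could use another encoding.
css_bytes = '@charset "utf-8";\n.r\u00e9sum\u00e9 { color: green }'.encode("utf-8")

declared = css_declared_encoding(css_bytes)
css_text = css_bytes.decode(declared)   # decoded using its own declaration
```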


> 
> 5- § What is the HTTP header?
> (http://www.w3.org/International/tutorials/tutorial-char-enc/temp#httpheadwhat)
> Not very important, but in the script example:
> "Date: Wed, 05 Nov 2003 10:46:04 GMT"
> use a more recent date?

That involves changes to other dates too. I will do it if I have some slack time.


> 
> 6- § MIME types and DOCTYPE switching
> (http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0150)
> - 5th <p>
> "Unfortunately, Internet Explorer currently doesn't support files served
> as XML"
> is the word "currently" still accurate?

Yes.


> - 9th <p>
> "The orange MIME-type labels are not recommended."
> "The orange MIME-type labels (the two at the bottom) are not
> recommended."
> because when reading a non colored printed version :-)

Ah, yes.  Thanks for reminding me of that. Fixed. 


> 
> 7- § Pros and cons of using the HTTP header for encoding declarations /
> (http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0130)
> - Advantages (end of 1st item)
> "it doesn't matter that transcoders typically do not change the internal
> encoding declarations, just the document encoding."
> "it doesn't matter that transcoders typically do not change the internal
> encoding declarations, just the document encoding (AND THE HTTP
> INFORMATION WHICH GOES WITH IT)."

Done.

> - Disadvantages (3rd item)
> "There are potential problems for both static and dynamic documents if
> they are to be saved to a location such as a CD or hard disk."
> "There are potential problems for both static and dynamic documents if
> they are NOT READ ON A SERVER (e.g. THEY WERE saved to a location such
> as a CD or hard disk)."

Done.

> - So should I use this method? (next <p>)
> "the file may be changed by an intermediary before it reaches the user
> [...], you may particularly want to consider using the HTTP declaration."
> Maybe
> "the file may be changed by an intermediary before it reaches the user
> [...], you may particularly want to consider using the HTTP declaration,
> SINCE IT IS CHANGED ACCORDINGLY."

Done.

> - your following remark:
> "(Some people would argue that it is rarely appropriate to declare the
> encoding in the HTTP header if you are going to repeat it in the content
> of the document. In this case, they are proposing that the HTTP header
> say nothing about the document encoding. Note that this would usually
> mean taking action to disable any server defaults.)"
> maybe
> "(Some people would argue that it is rarely appropriate to declare the
> encoding in the HTTP header if you are going to repeat it in the content
> of the document. In this case, they are proposing that the HTTP header
> say nothing about the document encoding, OR THAT THE DECLARATION INSIDE
> THE DOCUMENT TAKE PRECEDENCE. AFTER ALL IT IS WHAT THE AUTHOR WANTS.
> Note that this would usually mean taking action to disable any server
> defaults.)"

I think your addition is incorrect. They don't argue that the precedence should be changed, since that would cause problems for legacy content.
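
For illustration only, here is a simplified Python sketch of the existing precedence: a charset in the HTTP Content-Type header wins, and the in-document declaration is consulted only when the header says nothing. The function names and the regular expression are my own assumptions, and the sketch deliberately ignores byte-order marks and browser heuristics.

```python
import re
from email.message import Message

# Crude scan for charset=... in the first part of the document (an assumption,
# not a real HTML parser).
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)

def http_charset(content_type):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()   # lowercased name, or None if absent

def effective_encoding(content_type, body):
    """HTTP header first; fall back to the in-document declaration."""
    from_http = http_charset(content_type) if content_type else None
    if from_http:
        return from_http
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None

page = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(effective_encoding("text/html; charset=UTF-8", page))  # utf-8
print(effective_encoding("text/html", page))                 # iso-8859-1
```

The second call corresponds to the case where the server sends no encoding information, which is what those people are proposing.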

> 
> 8- § The Content-Type meta element
> (http://www.w3.org/International/tutorials/tutorial-char-enc/temp#metacontenttype)
> Typo at 1st line
> s/should used/should be used/

Fixed.

> 
> 9- § The XML declaration
> (http://www.w3.org/International/tutorials/tutorial-char-enc/temp#xmldeclaration)
> - 2nd script example (...xml:lang="en" lang="en"...)
> To be consistent, I suggest a language tag other than "en". Readers may
> wonder why they should care about encoding, since "en" is US-ASCII and
> thus compatible with UTF-8.

Well, if they think that, they shouldn't, and perhaps this example will help them realise that encoding declarations need to be considered whatever language you are using.
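
A quick Python check makes the point; the sample sentence is invented for illustration. Ordinary English text can contain characters outside ASCII, so the encoding still matters.

```python
# English text with a curly apostrophe and a few accented letters.
text = "The caf\u00e9\u2019s na\u00efve r\u00e9sum\u00e9 policy"

print(text.isascii())   # False

# Collect the characters that ASCII cannot represent.
non_ascii = sorted({c for c in text if ord(c) > 127})

# Encoding as ASCII fails, while UTF-8 handles every character.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
utf8_bytes = text.encode("utf-8")
```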

> 
> 10- § CSS's @charset rule
>  (
> http://www.w3.org/International/tutorials/tutorial-char-enc/temp#atcharset)
> s/non-ASCII/non-US-ASCII/

We don't use US-ASCII elsewhere (since I don't think we need that level of specificity) so I don't think we should here.

> 
> 11- § Precedence rules
> (http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0400)
> Precedence rules for linked CSS style sheets. What is the rule if the
> in-document HTML encoding is not the same as the one declared in
> external CSS?

As each file is read into the browser, its own encoding declaration is used to convert the characters to Unicode internally. This then allows direct character-to-character comparisons across HTML, CSS, and other files, even if they were saved in different encodings.
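
Here is a minimal Python sketch of that idea, with made-up file contents: the same class name is stored as different bytes in the two files, but once each file is decoded with its own encoding the names compare equal.

```python
name = "r\u00e9sum\u00e9"   # class name containing a non-ASCII character

# The HTML is saved as ISO 8859-1, the external style sheet as UTF-8.
html_bytes = ('<p class="' + name + '">CV</p>').encode("iso-8859-1")
css_bytes = ("." + name + " { color: green }").encode("utf-8")

# On disk, the class name is represented by different byte sequences.
bytes_differ = name.encode("iso-8859-1") != name.encode("utf-8")

# Each file is decoded with its own declared encoding...
html_text = html_bytes.decode("iso-8859-1")
css_text = css_bytes.decode("utf-8")

# ...after which the selector and the class attribute match character for character.
names_match = name in html_text and ("." + name) in css_text
print(bytes_differ, names_match)   # True True
```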

> 
> 12- § What do I need to know about normalization?
>  (http://www.w3.org/International/tutorials/tutorial-char-enc/temp#n11nhow)
> -3rd <p>
> "Most keyboards for European languages output text in NFC already, but
> this is less likely to be the case if dealing with many non-European
> languages."
> Maybe add "Mostly because (pre-)composed characters are not present in
> (some) non-European keyboards" or something like that.

Well, it's slightly more complicated than that, but I didn't want to get into too much detail here.
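
For readers who want to see what NFC means in practice, here is a small Python sketch using the standard unicodedata module (the variable names are illustrative). The same visible character can be stored precomposed or as a base letter plus a combining mark, and the two only compare equal after normalization.

```python
import unicodedata

composed = "\u00e9"       # e with acute as one precomposed code point (NFC form)
decomposed = "e\u0301"    # e followed by COMBINING ACUTE ACCENT (NFD form)

# The two strings look the same but are different code point sequences.
print(composed == decomposed)                        # False

# Normalizing both to NFC makes them identical.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == composed)                               # True

# NFD goes the other way, splitting the precomposed character.
nfd = unicodedata.normalize("NFD", composed)
print(nfd == decomposed)                             # True
```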
> 
> 
> Regards,
> Najib
> 
Received on Monday, 22 February 2010 15:31:59 GMT