- From: Najib Tounsi <ntounsi@gmail.com>
- Date: Fri, 19 Feb 2010 23:45:05 +0000
- To: Richard Ishida <ishida@w3.org>
- CC: www-international@w3.org
Hi Richard,

Please find below some feedback on "Character encodings in HTML and CSS".

1- § Character sets, coded character sets, and encodings
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0060)

"A coded character set is a set of characters for which a unique number has been assigned to each character. Units of a coded character set are known as code points."

I suggest adding that the value assigned to each character corresponds to its position in the coded character set. Indeed, later on you mention this position when you talk about encodings:
"the encoding is a straightforward mapping to the scalar position of the characters in the coded character set." (for ISO 8859-1)
...
"the first line of numbers represents the position of a character in the Unicode coded character set" (for Unicode)
However, a set in general has no particular order.

2- Typo
s/A character escape is an way of representing/A character escape is a way of representing/

3- § Applying an encoding to your content
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#applyingencoding)

"As a content author you need to check that your editor or scripts are saving text in the encoding of your choice."
I suggest
"As a content author you need to check that your editor or scripts are saving text in the encoding YOU EXPECT OR LET YOU SELECT THE ONE of your choice."

4- § CSS
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#csssummary)

- Maybe s/non-ASCII/non-US-ASCII/.
- "you should use the @charset rule as the first thing on the page."
Maybe say
"you should use the @charset rule as the first thing on the page, SET TO THE SAME ENCODING AS THE CORRESPONDING HTML PAGE."
(BTW, it is worth testing what happens when the two declared encodings are not the same. Do all browsers agree?)

5- § What is the HTTP header?
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#httpheadwhat)

Not very important, but in the script example:
"Date: Wed, 05 Nov 2003 10:46:04 GMT"
use a more recent date?

6- § MIME types and DOCTYPE switching
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0150)

- 5th <p>
"Unfortunately, Internet Explorer currently doesn't support files served as XML"
Is the word "currently" still accurate?
- 9th <p>
"The orange MIME-type labels are not recommended."
could become
"The orange MIME-type labels (the two at the bottom) are not recommended."
for the benefit of anyone reading a non-colored printed version :-)

7- § Pros and cons of using the HTTP header for encoding declarations
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0130)

- Advantages (end of 1st item)
"it doesn't matter that transcoders typically do not change the internal encoding declarations, just the document encoding."
could become
"it doesn't matter that transcoders typically do not change the internal encoding declarations, just the document encoding (AND THE HTTP INFORMATION WHICH GOES WITH IT)."
- Disadvantages (3rd item)
"There are potential problems for both static and dynamic documents if they are to be saved to a location such as a CD or hard disk."
could become
"There are potential problems for both static and dynamic documents if they are NOT READ ON A SERVER (e.g. THEY WERE saved to a location such as a CD or hard disk)."
- So should I use this method? (next <p>)
"the file may be changed by an intermediary before it reaches the user [...], you may particularly want to consider using the HTTP declaration."
Maybe
"the file may be changed by an intermediary before it reaches the user [...], you may particularly want to consider using the HTTP declaration, SINCE IT IS CHANGED ACCORDINGLY."
- Your following remark:
"(Some people would argue that it is rarely appropriate to declare the encoding in the HTTP header if you are going to repeat it in the content of the document.
In this case, they are proposing that the HTTP header say nothing about the document encoding. Note that this would usually mean taking action to disable any server defaults.)"
maybe
"(Some people would argue that it is rarely appropriate to declare the encoding in the HTTP header if you are going to repeat it in the content of the document. In this case, they are proposing that the HTTP header say nothing about the document encoding, OR THAT THE DECLARATION INSIDE THE DOCUMENT TAKE PRECEDENCE. AFTER ALL IT IS WHAT THE AUTHOR WANTS. Note that this would usually mean taking action to disable any server defaults.)"

8- § The Content-Type meta element
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#metacontenttype)

Typo at 1st line: s/should used/should be used/

9- § The XML declaration
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#xmldeclaration)

- 2nd script example (...xml:lang="en" lang="en"...)
To be consistent, I suggest a language tag other than "en". Readers may wonder why they should care about encoding, since "en" text is US-ASCII and thus compatible with UTF-8.

10- § CSS's @charset rule
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#atcharset)

s/non-ASCII/non-US-ASCII/

11- § Precedence rules
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0400)

Precedence rules for linked CSS style sheets: what is the rule if the in-document HTML encoding is not the same as the one declared in the external CSS?

12- § What do I need to know about normalization?
(http://www.w3.org/International/tutorials/tutorial-char-enc/temp#n11nhow)

- 3rd <p>
"Most keyboards for European languages output text in NFC already, but this is less likely to be the case if dealing with many non-European languages."
Maybe add "Mostly because (pre-)composed characters are not present in (some) non-European keyboards" or something like that.

Regards,
Najib

Richard Ishida wrote:
> Comments are being sought on this article prior to final release.
> Please send any comments to this list (www-international@w3.org). We expect to publish a final version in one to two weeks.
>
> See http://www.w3.org/International/tutorials/tutorial-char-enc/temp
>
> This is an update, in a temporary location, of the tutorial Character encodings in HTML and CSS. (Please be careful about bookmarking the location, since it is only temporary.)
>
> A lot of new material was added, eg. related to the UTF-8 BOM, normalization, etc., and I rearranged the material significantly. The rearrangement was to downplay slightly the XHTML 1.0 issues, given that that is now only relevant to IE6, but also to help readers more quickly find information they need for the format they are dealing with.
>
> The explicit distinction between XHTML 1.0 and XHTML 1.1 with regard to MIME types was removed, since the XHTML2 WG is hopefully very close to issuing a PER that enables XHTML 1.1 to be served as text/html.
>
> The update adds information about HTML5.
>
> Where a section corresponds to an article that has been updated, those updates were also migrated to this document.
>
> Thanks,
> RI
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/International/
> http://rishida.net/

--
Najib TOUNSI (tounsi at w3.org)
W3C Office in Morocco (http://www.w3c.org.ma/)
Ecole Mohammadia d'Ingénieurs, BP. 765 Agdal-RABAT Morocco
Phone : +212 (0) 537 68 71 50
Fax : +212 (0) 537 77 88 53
Mobile: +212 (0) 661 22 00 30
Received on Friday, 19 February 2010 23:41:06 UTC
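
Points 1 and 12 of the message above (a code point as a character's position in the coded character set, and precomposed versus decomposed characters) can be illustrated with a minimal Python sketch. It is not taken from the tutorial or from the message; it assumes Python 3 and its standard `unicodedata` module, and the helper name `describe` and the sample character (e with acute accent) are hypothetical choices made only for illustration.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Hypothetical helper: one character's code point and its bytes in two encodings."""
    return {
        # Position of the character in the Unicode coded character set.
        "code_point": f"U+{ord(ch):04X}",
        # ISO 8859-1 maps each character straight to one byte with the same value.
        "iso-8859-1": ch.encode("iso-8859-1").hex(),
        # UTF-8 uses a variable number of bytes per character.
        "utf-8": ch.encode("utf-8").hex(),
    }


E_ACUTE = "\u00e9"          # precomposed: e with acute accent
E_PLUS_ACCENT = "e\u0301"   # decomposed: letter e + combining acute accent

info = describe(E_ACUTE)

# The two strings look identical to a reader but differ as code point sequences
# until both are brought to the same normalization form (here NFC).
same_before = E_ACUTE == E_PLUS_ACCENT
same_after = unicodedata.normalize("NFC", E_PLUS_ACCENT) == E_ACUTE

if __name__ == "__main__":
    print(info)
    print(same_before, same_after)
```

The same code point yields different byte sequences depending on the encoding, which is why the declared encoding has to match the one the file was actually saved in.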