- From: Mark Davis <mark.davis@jtcsv.com>
- Date: Thu, 25 Mar 2004 07:56:38 -0800
- To: "Richard Ishida" <ishida@w3.org>
- Cc: <www-international@w3.org>, <www-international-request@w3.org>, <www-i18n-comments@w3.org>
Nice doc. From a quick glance:

> Select an encoding that maximizes the opportunity to directly represent characters and minimizes the need to represent characters by using character escapes.

Important: you should make people aware that there are many variants of charsets like Shift-JIS, and that people are strongly recommended to also escape *all* characters that vary. Cf. XML Japanese profile

MURATA Makoto Ed., XML Japanese Profile, W3C Note. (See http://www.w3.org/TR/japanese-xml/.)

That also needs to be clarified in the section starting:

> Only use escapes for characters in exceptional circumstances - create pages using an encoding that supports all the characters you need

> user agent

Although "user agent" is glossed at first reference, it is still a rather awkward term. As a tutorial, it might be better to just use the word browser -- and say near the top that the term 'browser' is used for simplicity, but the text really applies to a broader range of so-called user-agents, including [list other examples!].

> a.. We recommend the use of XHTML wherever possible; and if you serve XHTML as text/html we assume that you are conforming to the compatibility guidelines in Appendix C of the XHTML 1.0 specification.
> a.. We recognize that XHTML served as XML is still not widely supported, and that therefore many XHTML 1.0 pages will be served as text/html.

Isn't this a pretty counter-productive recommendation? It sounds like you are saying: "we recommend that you use something that won't work on the vast majority of your users' browsers".

> Where appropriate, declare the page's character encoding by setting the charset parameter in the HTTP Content-Type header.

This 'feature' is a real pain. The advice needs to be much clearer, something like: If all those who will be posting pages can reset the charset parameter, then you can impose a default on all the pages. If not, don't.
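To make the concern about the charset parameter concrete, here is a minimal Python sketch. It assumes the usual HTML rule that a charset given in the HTTP Content-Type header takes precedence over a declaration inside the page, and it ignores BOMs and other detection steps. The function names are hypothetical and chosen only for illustration.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def effective_encoding(http_content_type: str,
                       in_page_charset: Optional[str]) -> Optional[str]:
    """Pick the encoding a browser would use under the simplified rule above."""
    # Assumption: a charset in the HTTP header overrides the in-page declaration.
    http_charset = charset_from_content_type(http_content_type)
    if http_charset is not None:
        return http_charset
    # No charset in the header: fall back to what the page itself declares.
    return in_page_charset.lower() if in_page_charset else None
```

Under this rule, a server-wide default such as `text/html; charset=ISO-8859-1` overrides a page that declares Shift_JIS internally, which is why a default is only safe when every author posting pages is able to reset it.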
> There are three characters which should always appear in content as escapes

should => must

> The following table lists Unicode characters that should not be used in a markup context, according to the W3C Note and Unicode Technical Report Unicode in XML & Other Markup Languages. You should use markup instead.

This needs to be a bit clearer. Many of these are HTML-specific. Unless the XML DTD/Schema author provided the same facilities, for example, LRE may not be available.

Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message -----
From: "Matitiahu Allouche" <matial@il.ibm.com>
To: "Richard Ishida" <ishida@w3.org>
Cc: <www-international@w3.org>; <www-international-request@w3.org>; <www-i18n-comments@w3.org>
Sent: Thu, 2004 Mar 25 00:08
Subject: Re: New Tutorial: Character sets & encodings in XHTML, HTML and CSS

>
> A few comments. Sorry that they are mostly nitpickings.
>
> 1) Section "Character escapes" mentions &#x597D; as the escape for the
> Hebrew letter Alef. I don't know how this value was obtained, U+597D is
> not a defined Unicode character. The right escape IMHO should be &#x05D0;.
>
> 2) In section "Consider using a Unicode encoding", instead of
> "Unicode encodings support many languages with a single encoding across
> all pages and forms, regardless of language."
> I suggest
> "All Unicode encodings support many languages and can accommodate all kinds
> of pages and forms containing any mixture of those languages."
>
> 3) In section "When to do this" following mention of the IANA registry,
> add "are" after "there" in "there no disadvantages".
>
> 4) In section "Precedence rules", add "is" after "it" in "since it
> likely".
>
> 5) In the title "entities and numeric charater references (ncrs)", the
> acronym should be spelled "NCRs", to be coherent with further
> occurrences, and to distinguish the plural "s" from the acronym itself.
>
> 6) In the next paragraph, "are way" should be "are ways".
>
> 7) The example for CSS escape is *not* terminated by a space, despite
> stating in the previous line that it should be.
>
> 8) In section "When to use escapes", the sentence "For example, to
> represent Chinese characters in an ISO Latin 1 document." is not a
> complete sentence, and should be an added clause to the previous sentence
> (separated by comma).
>
> 9) In the table contained in section "Other Unicode characters are OK",
> LRM and RLM are commented as "Deprecated in Unicode". I am very
> surprised. What is the basis for such a statement?
>
> 10) In section "Compatibility characters vary in appropriateness", add a
> comma before "in some other cases it denotes a property".
>
>
> Shalom (Regards), Mati
> Bidi Architect
> Globalization Center Of Competency - Bidirectional Scripts
> IBM Israel
> Phone: +972 2 5888802 Fax: +972 2 5870333 Mobile: +972 52 554160
>
>
> "Richard Ishida" <ishida@w3.org>
> Sent by: www-international-request@w3.org
> 24/03/2004 14:44
>
> To
> <www-international@w3.org>
> cc
>
> Subject
> New Tutorial: Character sets & encodings in XHTML, HTML and CSS
>
>
> The GEO task force has published its first tutorial:
>
> Character sets & encodings in XHTML, HTML and CSS
>
> At: http://www.w3.org/International/tutorials/tutorial-char-enc.html
>
>
> This tutorial has been worked on for quite some time by the GEO Task Force
> of the W3C Internationalization Working Group, and it is thought to be
> ready for publication. For an undetermined initial period we will leave
> the status as Draft to indicate that we invite feedback on the document.
>
>
> You can find links to internationalization specifications, FAQs, articles,
> tools, tests, and soon tutorials at http://www.w3.org/International/
>
> ============
> Richard Ishida
> W3C
>
> contact info: http://www.w3.org/People/Ishida/
>
> http://www.w3.org/International/
>
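As an illustration of points 1 and 7 in the quoted message, the following Python sketch derives a hexadecimal numeric character reference and a space-terminated CSS escape directly from a character's code point. The helper names are hypothetical, and the four-digit zero padding in the NCR is only a formatting choice, not a requirement.

```python
def html_ncr(ch: str) -> str:
    """Hexadecimal numeric character reference for a single character."""
    # ord() gives the Unicode code point; pad to at least four hex digits.
    return f"&#x{ord(ch):04X};"


def css_escape(ch: str) -> str:
    """CSS escape for a single character: backslash, hex digits, trailing space."""
    # The trailing space terminates the escape so that a following
    # hex digit is not read as part of the code point.
    return f"\\{ord(ch):X} "


alef = "\u05D0"          # HEBREW LETTER ALEF
print(html_ncr(alef))    # NCR for Alef
print(css_escape(alef))  # CSS escape for Alef, ending in a space
```

Computing the escape from the code point, rather than typing it by hand, is one way to avoid a mismatch between a character and its escape.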
Received on Thursday, 25 March 2004 10:56:41 UTC