- From: Richard Ishida <ishida@w3.org>
- Date: Wed, 5 Oct 2005 17:42:15 +0100
- To: "GEO" <public-i18n-geo@w3.org>
From: Deborah Cawkwell [mailto:deborah.cawkwell@bbc.co.uk]
Sent: 05 October 2005 17:31
To: ishida@w3.org
Subject: Comments on FAQ

Wed 7 Sept 2005
RE: New article for REVIEW: Upgrading from language-specific legacy encoding to Unicode encoding

ISSUE 1 - CSS & UNICODE
Frank Yung-Fong Tang Tue 8/23/2005 20:02

I think you should mention not only the charset in HTML, but also the issues with CSS and separate JavaScript files. The issue with \ Unicode in CSS is quite tricky.

DC ACTION: don't know about this. Research and/or ask Frank. No trickiness is mentioned in:
FAQ: CSS character encoding declarations
How do I declare the character encoding inside a CSS (Cascading Style Sheets) style sheet?
http://www.w3.org/International/questions/qa-css-charset

ISSUE 2 - BIDI
Jony Rosenne Tue 8/23/2005 22:18

I suggest that this article should at least mention the problems of legacy conversion to Unicode that are specific to bidi, i.e. visual order vs. logical order. HTML supports this in two ways:
1) Specifying ISO-8859-8 as the character set indicates visual order, and my recommendation is to leave these pages alone.
2) If you must upgrade, use the BDO tag. See http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2.4

ISSUES 3
Some quick feedback.

>Modern operating systems support Unicode:
This is in a funny order, in the middle of a section about fonts, without even a header to set it off.

>Unicode: the operating system or browser has fonts
Should mention that many programs will use 'fallback' mechanisms; where a font doesn't have all the glyphs, it will switch fonts.

>Page weight / download cost is not really an issue: given that a large proportion of a web page is HTML mark-up, where characters remain 1 byte,
Give an example page and its sizes in legacy and Unicode encodings.

>Characters that do not fall into the ASCII range, such as Chinese, Arabic, Russian, may use 2 or even 3 bytes.
Chinese encodings already use more than 1 byte per character with legacy encodings, where they use double bytes.
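As a rough illustration of the byte counts discussed in the comment above, the following Python sketch encodes a few short strings in a legacy encoding and in UTF-8 and compares the sizes. The sample strings, the choice of legacy encodings, and the helper name byte_sizes are assumptions made for this illustration; none of them come from the article under review.

```python
# Compare the encoded size of the same text in a legacy encoding and in UTF-8.
# The samples and the legacy encodings chosen here are illustrative assumptions.
samples = {
    "English": ("Hello", "iso-8859-1"),
    "Russian": ("Привет", "koi8-r"),
    "Arabic": ("مرحبا", "iso-8859-6"),
    "Chinese": ("你好", "gb2312"),
}


def byte_sizes(text, legacy):
    """Return (legacy_size, utf8_size) in bytes for the given text."""
    return len(text.encode(legacy)), len(text.encode("utf-8"))


for name, (text, legacy) in samples.items():
    legacy_len, utf8_len = byte_sizes(text, legacy)
    print(f"{name}: {legacy} = {legacy_len} bytes, UTF-8 = {utf8_len} bytes")
```

With these samples, the ASCII text is the same size in both encodings, the Russian and Arabic text doubles from 1 to 2 bytes per character, and the Chinese text grows from 2 to 3 bytes per character. A real page would also contain ASCII mark-up, which does not grow at all.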
Treat CJK in a separate bullet.

>rather than with a legacy encoding where the source text is not readable and uses different characters to point to code points.
?

>Server side applications
Server-side applications [Otherwise it is a side application that has to do with servers]

Suggest passing this by the UTC (unicode@unicode.org) for feedback.

-----------------------
ISSUES 4
Frank Ellermann Wed 8/24/2005 13:48

Richard Ishida wrote:
> Comments are being sought on this article

| UTF-16 is often used for the system back-end.
You have "no byte order problem" for UTF-8, so you might add a note about UTF-16LE vs. UTF-16BE below UTF-16. And another note that u+10000 etc. needs two UTF-16 "half words" (please replace with the correct term).

| Font display problems:
| Legacy code pages (eg ISO-8859-1/windows-1252)
That example isn't convincing; use something else, e.g. Latin-2 and MacRoman.

| Page weight / download cost is not really an issue [...]
| the difference between legacy encoding and Unicode encoding is quite
| negligible.
Maybe s/Unicode/UTF-8/, since you're talking about bytes later.

| HTML head, eg, <meta http-equiv="Content-Type" [...]
Maybe add a third example for XML: <?xml version="1.1" encoding="utf-8" ?>

----------------------
ISSUES 5
Frank Yung-Fong Tang Thu 8/11/2005 21:23

This is a comment on a related document, not exactly the one you pointed out.

1. Can you change the example in http://www.w3.org/International/O-HTTP-charset from
The line in the HTTP header typically looks like this: Content-Type: text/html; charset=iso-8859-1
to
The line in the HTTP header typically looks like this: Content-Type: text/html; charset=UTF-8
I know it is just an example on a different page, but some careless people just copy code from examples. I think it is better to let them copy UTF-8 instead of ISO-8859-1, even though either of them is a bad choice to hard-code.

2.
Also, in http://www.w3.org/International/O-HTTP-charset

"For Java Servlets, use the setContentType method on the ServletResponse before obtaining any object (Stream or Writer) used for output, e.g.: resource.setContentType ("text/html;charset=utf-8"); If you use a Writer, the Servlet automatically takes care of the conversion from Java Strings to the encoding selected."

I think this information is only recommended for J2EE 1.3. J2EE 1.4 changed this by adding the setCharacterEncoding(java.lang.String) method. The J2EE 1.4 ServletResponse documentation at http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/ServletResponse.html says:

"The charset for the MIME body response can be specified explicitly using the setCharacterEncoding(java.lang.String) and setContentType(java.lang.String) methods, or implicitly using the setLocale(java.util.Locale) method. Explicit specifications take precedence over implicit specifications. If no charset is specified, ISO-8859-1 will be used. The setCharacterEncoding, setContentType, or setLocale method must be called before getWriter and before committing the response for the character encoding to be used."

You should mention setCharacterEncoding(java.lang.String) there for J2EE 1.4.

Richard Ishida wrote on 8/11/2005, 1:09 PM:
> Title: Changing page encoding to UTF-8
> http://www.w3.org/International/questions/changing-encoding
>
> Comments are being sought on this article prior to final release.
> Please send any comments to www-international@w3.org. We expect to
> publish a final version in one to two weeks.
>
> The article aims to answer the question: "How do I change the encoding
> of my (X)HTML pages to UTF-8?"
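To make point 1 of Issue 5 concrete, here is a minimal Python sketch (not the Java servlet API discussed in point 2) of a response whose Content-Type header declares UTF-8 and whose body is actually encoded to match. The function names build_response and declared_charset are hypothetical and exist only for this illustration; the charset is parsed with the standard library's email.message.Message.

```python
from email.message import Message


def build_response(html):
    """Encode the page as UTF-8 and declare the same charset in the header."""
    body = html.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=UTF-8",
        "Content-Length": str(len(body)),
    }
    return headers, body


def declared_charset(content_type):
    """Extract the charset parameter (lowercased) from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


headers, body = build_response("<p>Grüße, 你好</p>")
charset = declared_charset(headers["Content-Type"])
# Decoding with the declared charset must give back the original text.
print(charset, body.decode(charset))
```

The point of the sketch is only that the declared charset and the bytes on the wire have to agree; a hard-coded value, as the comment notes, is rarely the right choice in real code.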
-----------------------
ISSUE 6
Frank Yung-Fong Tang Thu 8/11/2005 21:31

Since your document title is "FAQ: Changing page encoding to UTF-8 (Draft for review)" instead of "FAQ: Changing html page encoding to UTF-8 (Draft for review)", I recommend you also consider the slightly different case in the WS environment, e.g. the case for WSDL, XML Schema, UDDI and SOAP in WS-I Basic Profile 1. Please take a look at my study note at http://people.netscape.com/ftang/paper/WS-I-i18n.htm for details.

If you think the issues with SOAP, XML Schema, UDDI and WSDL are too complicated to mention in your document, then I suggest you change your document title to "FAQ: Changing (x)html page encoding to UTF-8" by adding "(x)html" to it.
Received on Wednesday, 5 October 2005 16:42:22 UTC