lomen@hanimail.com wrote:
>
> I am making CGI programs that print Unicode html text.
>
> If using unicode, is any problem?

Many users are still using Netscape 4.X, which has problems with Unicode when the language is one that cannot be presented using Times and Courier. See the attached message.

> For example, "0xfeff" character or "Content-type:text/html\n\n"?

Putting the "BOM" (0xFEFF) at the beginning of the HTML document (after the HTTP response headers) is a good idea. The BOM is used by the browsers to auto-detect Unicode. It is also a good idea to add the charset parameter to your Content-Type header:

Content-Type: text/html; charset=ISO-10646-UCS-2

(By the way, does anyone know the status of the UTF-16 registration?)

Keep in mind that the "Content-Type" header and all of the other HTTP response headers must be in ASCII (i.e. single byte, not double byte), even if the HTML document itself is in Unicode.

You can see an example of a Unicode page here:

http://www.fxis.co.jp/DMS/sgml/xml/charset/utf-16/utf16-be-dos.html

Try copying and pasting the above URL into my HTTP/HTML source viewer:

http://webtools.mozilla.org/web-sniffer/

Erik
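The layout described in the message above (ASCII headers, then the BOM, then the double-byte document) can be sketched roughly as follows. This is only an illustration in Python, which is an arbitrary choice since the original poster's CGI language is not stated. The function name `build_cgi_response` is hypothetical, and the sketch assumes the page contains only BMP characters, for which big-endian UTF-16 and UCS-2 produce the same bytes (Python has no separate UCS-2 codec).

```python
import codecs


def build_cgi_response(html: str, charset_label: str = "ISO-10646-UCS-2") -> bytes:
    """Return the raw bytes a CGI program would write to stdout.

    Hypothetical helper: assumes BMP-only text, so encoding as
    big-endian UTF-16 yields the same bytes as UCS-2.
    """
    # The header block is plain single-byte ASCII, terminated by a blank line.
    header = f"Content-Type: text/html; charset={charset_label}\n\n".encode("ascii")

    # The document itself is double-byte, preceded by the BOM (0xFEFF),
    # which appears on the wire as the bytes FE FF in big-endian order.
    body = codecs.BOM_UTF16_BE + html.encode("utf-16-be")

    return header + body
```

In a real CGI script the result would need to be written as bytes, for example with `sys.stdout.buffer.write(build_cgi_response(page))`, so that no text-mode encoding or newline translation is applied to the double-byte body.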
attached mail follows:
Glen Perkins wrote:
>
> I'd really like to take the Right Path of encoding the form in UTF-8 and
> having it return the form data in UTF-8, so I could have a generic solution
> of any language(s) going out and any language(s) coming back. It really does
> have to work, though, or else the people I do it for, who don't know much
> about i18n and therefore hate and oppose it, will say "See! We told you it
> was a bad idea!" Urrrgh.
>
> Do you know under what circumstances this is likely to work? Would it work,
> say, for both IE and Netscape, versions 3 or later, on Win & Macs? I'd
> certainly prefer to be more generic than that (support for unix being
> particularly near to my heart), but current browser stats indicate that
> anything that works on the above (NS/IE 3+ on Win/Mac) would cover a large
> enough percentage of the market to be worth doing. Requiring version 4
> browsers might even be tolerable now in many cases. (And I'm talking about
> the Internet at large, not an intranet.)

Netscape started supporting Unicode in the Windows version of Navigator 3.0. However, the feature was disabled by default, and could be enabled only through a special registry setting. (Mac/Unix Nav3 doesn't support Unicode.)

Navigator 4.0 supports Unicode on all of the platforms (Windows, Mac, Unix), except that the Win32 version does not support the crucial font switching (font linking in MS-speak). This means that the Win32 version of Nav4 will only use one font for Unicode documents. (Win16/Mac/Unix Nav4 supports font switching.) Moreover, in Win32, the font must be set manually by the user in the font preferences dialog.

The default fonts for Unicode documents are Times and Courier, even in the Japanese version of Navigator. So Japanese UTF-8 documents will not display correctly on the average Japanese Win32 Nav4 user's machine, since most users do not fiddle with font prefs, particularly the Unicode ones.

So I suppose you could take a look at Navigator's market share in non-Times/Courier markets such as Japan, Korea, Taiwan, China, etc, and if you think that market share is small enough to ignore those users, you could choose to use UTF-8 in your application (HTML form + CGI). If you decide that their market share is not small enough to ignore, you could support them via multiple monolingual documents in traditional charsets such as Shift_JIS, EUC-KR, Big5, GB2312, etc.

> > > In theory, if you can reliably label the charset of the HTML document
> > > containing the form (via HTTP charset and HTML META charset), then the
> > > form submission should be in that charset too. You can then simply
> > > insert that charset label in the hidden input field too, and look at
> > > that when the form submission arrives.
> >
> > Doesn't work through transcoding (incl. translation) servers. I've also
> > heard stories of old Japanese browsers that would transcode the input to the
> > platform encoding and then forget what the original was. So forms are
> > submitted in the platform encoding, regardless. Certainly broken, probably
> > mostly extinct by now, but still shows how a bad protocol can come and bite
> > you.
>
> Yes, I obviously need to add to the above IE/NS on Win/Mac specification
> that it work on all major language versions of those browsers.

I think he may have been referring to old Japanese versions of Mosaic and others, not the Japanese versions of Netscape.

> So, François, it sounds as though your hack -- returning known data from a
> hidden field to determine the encoding -- might be needed as a data
> integrity check at the very least.
>
> Now I'm wondering what such data would look like and what could be learned
> from it. If I just put a bunch of bytes up there and they're echoed back at
> me verbatim, what would that tell me? I can imagine putting up a page
> encoded in Shift-JIS with a hidden field also in Shift-JIS, using the
> ACCEPT-CHARSET="UTF-8" technique, and then testing the result to see whether
> it came back as UTF-8, unchanged, or other. If unchanged, though, would that
> mean the returned data really was Shift-JIS? It seems to me it could also be
> Big-5, Latin-1, or any of several other encodings, returned by a browser
> that used the default system encoding to encode form data.

Most browsers submit forms in the same charset as the original form. So if your form is in Shift_JIS, and the user can actually read it, then the browser must know that it is in Shift_JIS, and will submit the form in Shift_JIS.

On the other hand, if the user is viewing your form through a translator that translates Japanese to traditional Chinese, then the form might be in Big5 by the time it reaches the browser. The form submission will then also be in Big5.

One question is whether the hidden field will also have been translated. I conducted a little experiment with AltaVista yesterday, and found that the text in a Spanish submit button was not translated to English, while the rest of the document was translated. Future versions of the translator(s) may be more aggressive, however, and actually translate the text in HTML attribute values too, including hidden fields perhaps? But they wouldn't want to translate *all* HTML attribute values (e.g. align="right"), so perhaps they wouldn't translate hidden fields either.

Erik

Received on Tuesday, 8 February 2000 20:14:36 GMT
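
The hidden-field check discussed in the thread could look roughly like the sketch below: the form carries a hidden field whose text is known in advance, and the CGI tries a list of candidate charsets until the submitted bytes decode back to that known text. Everything named here is an assumption made for illustration (the sentinel string, the candidate list, and the function `guess_submission_charset`); none of it comes from the messages above. If a translator has rewritten the hidden field, no candidate will match and the function returns `None`.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical sentinel: the known text placed in the hidden form field.
SENTINEL = "\u65e5\u672c\u8a9e"  # the three characters of "Nihongo"

# Hypothetical list of charsets the application is prepared to accept.
CANDIDATES = ["utf-8", "shift_jis", "euc_jp", "iso2022_jp"]


def guess_submission_charset(raw_value, candidates=CANDIDATES):
    """Return the first candidate charset under which the percent-encoded
    hidden-field value decodes to SENTINEL, or None if none matches."""
    data = unquote_to_bytes(raw_value)  # undo %XX escapes, keep raw bytes
    for encoding in candidates:
        try:
            if data.decode(encoding) == SENTINEL:
                return encoding
        except UnicodeDecodeError:
            # These bytes are not valid in this charset; try the next one.
            continue
    return None
```

Because the expected text is known, a match says more than "the bytes came back unchanged": it identifies a charset under which the returned bytes mean what the form author intended. It still cannot tell apart two charsets that happen to encode the sentinel identically, so the sentinel should use characters that differ across the candidates.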