- From: Drazen Kacar <Drazen.Kacar@public.srce.hr>
- Date: Thu, 5 Dec 1996 00:09:09 +0100 (MET)
- To: erik@netscape.com
- Cc: Alan_Barrett/DUB/Lotus.LOTUSINT@crd.lotus.com, www-international@w3.org, bobj@netscape.com, wjs@netscape.com, Chris.Lilley@sophia.inria.fr, Ed_Batutis/CAM/Lotus@crd.lotus.com
Erik van der Poel wrote:

> How about using a more compact representation of Accept-Charset. E.g.
> bit masks corresponding to the number in the charset registry. This
> would omit the "q" parameter, but I'm not sure this is needed in the
> Accept-Charset case anyway. (It's probably needed for Accept-Language.)

I'd say it's mandatory for accept-language with transparent negotiation (unless it's 1.0, which is the default). Whether it's needed for accept-charset or not is up to the information provider to decide. I'd say yes.

> > (1) If the user, though the UI, says they want to "Request Multi-Lingual
> > Documents" then the browser should send:-
>
> I don't think we should have UI for the Accept-Charset. Think about
> novice users. Will they understand it?

Yes. Perhaps people who live in the Latin 1 world won't, but everything works for them anyway. I live in the Latin 2 world and I have a reasonable technical background, so I can hardly be called a novice user. But I can tell you how it looks to novice users. I'll take Usenet as an example; the web is even more confusing, because there is no interaction with the person who set up the page.

My native language needs 5 letters from the Latin 2 code page; the rest is US-ASCII. Latin 2 on Unix means ISO 8859-2, and Unix hosts were connected before anything else. Some people used ISO 8859-2 and some used ASCII approximations. Then Windows came. Latin 2 on Windows today means the windows-1250 code page; at the time, that CP was not registered with IANA and didn't mean anything on the Internet. 1250 has the characters we need, but some of them are not at the same positions as in ISO 8859-2. NSN 2.0 was unaware of this and was showing iso-8859-2 documents with the 1250 code page. So our novice user sees some very strange characters and is completely clueless about what's going on.
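The q-value negotiation discussed at the top of this message can be sketched in code. This is a minimal, hypothetical server-side sketch, not anything from the original discussion: parse an Accept-Charset header, honoring "q" quality parameters (a missing q means 1.0), and pick the best charset the server can produce. Function names and the fallback policy are illustrative.

```python
def parse_accept_charset(header):
    """Return {charset: q} parsed from an Accept-Charset header value."""
    prefs = {}
    for item in header.split(","):
        parts = item.strip().split(";")
        charset = parts[0].strip().lower()
        q = 1.0  # per HTTP, an absent q parameter means q=1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs[charset] = q
    return prefs

def choose_charset(header, available):
    """Pick the available charset the client rates highest.

    "*" acts as a wildcard; charsets the client never mentioned
    (and no wildcard) get q=0 and are never chosen.
    """
    prefs = parse_accept_charset(header)
    best, best_q = None, 0.0
    for charset in available:
        q = prefs.get(charset, prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = charset, q
    return best

print(choose_charset("iso-8859-2, windows-1250;q=0.8, *;q=0.1",
                     ["iso-8859-1", "iso-8859-2", "windows-1250"]))
# iso-8859-2
```

With q values the server can tell "I strongly prefer iso-8859-2 but can live with windows-1250" apart from a flat list; a bit-mask representation would lose exactly that distinction.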
Then he tries to post something with the national characters in his local configuration and gets a flame or two back, because Usenet is 7 bit and he did not send a charset parameter in the Content-Type header, nor did he encode his post. At this point we have a frustrated novice user.

A slight digression here. It didn't start with flaming. Unix people were helpful at first, trying to explain the problems and trying to find solutions. As a result, some of the Windows people installed ISO 8859-2. The majority didn't. Those that didn't were unable to read or write the ISO code page. Their GUIs were useless. Even if they telneted to a Unix host, they still had 1250 on the terminal emulator, so it was the same.

They had their own experience, of which Unix people did not know much. National characters were used long before it dawned on someone at Microsoft that they were needed. People usually used the 7 bit ISO 646 code page. There were screen fonts, printer fonts, keyboard drivers, and everything worked. Then Microsoft decided to ship the IBM 852 code page with DOS. Some people switched, but not many. Then came Windows, and Microsoft shipped CP1250. Again some people switched. There was incompatibility on the same platform, and people were actively fighting for "their standard".

Computer magazines were pretty bad, and they still are. The situation, as they understood it, called for one standard which had to be enforced upon everyone. With their background on Microsoft platforms, they were completely unaware of MIME, of the little addition to Content-Type in HTTP, and of the localization already present on Unix. Unix people, on the other hand, got tired of the same questions, of flames from those who were reading about "Unix lunatics", of people who would not RTFM when kindly directed to it. The war started.

ISO 8859-2 won on Usenet. On the web, CP1250 won the majority of pages, largely because there were no authoring tools which knew about the difference between ISO 8859-2 and CP1250.
There are languages that need the Latin 2 code page but don't suffer from this. People who use them might be ignorant of code page problems. But those who have problems know about charset issues. There can't be novice users. Not after the first post with the wrong charset, or after viewing the first web page in another charset.

Now, why is accept-charset needed? Take NSN 3.0, which can translate ISO 8859-2 to windows-1250. What happens when there is no Latin 2 font available? NSN uses ISO 8859-1. Do you know how that looks? If you can understand any Latin 1 language that needs accented characters, you can get a picture: take a page in French, German or whichever, and display it with the Latin 2 code page. That's how it looks. You can read it if you really have to, but it's not pleasant in the least.

Should I mention search engines? Or proxies? Better not. :)

Some servers convert code pages on the fly. This takes resources, but it's the only way to ensure that the information is readable. Not nicely presented, not cool, just plain readable without a headache. That's why.

-- 
Life is a sexually transmitted disease.
	dave@fly.cc.fer.hr
	dave@zemris.fer.hr
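P.S. The on-the-fly conversion mentioned above can be sketched in a few lines. This is a minimal illustration, not any particular server's implementation; the function name and the replace-with-"?" fallback policy are my own choices.

```python
def transcode(data, source="iso8859_2", target="cp1250"):
    """Re-encode a document's bytes from one charset to another.

    Characters with no equivalent in the target code page are
    replaced with "?" rather than rejected, so the page stays
    readable (if not pretty).
    """
    text = data.decode(source)
    return text.encode(target, errors="replace")

# The five Croatian letters beyond US-ASCII. Some of them sit at
# different byte positions in ISO 8859-2 and windows-1250 (the
# incompatibility described above), so a byte-for-byte copy would
# display the wrong characters; transcoding fixes the positions.
latin2_bytes = "\u010d\u0107\u017e\u0161\u0111".encode("iso8859_2")
print(transcode(latin2_bytes).decode("cp1250"))
# čćžšđ
```

A server doing this per request pays the CPU cost on every hit, which is exactly the resource trade-off mentioned above.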
Received on Wednesday, 4 December 1996 18:09:44 UTC