
Re: Using unicode or MBCS characters in forms

From: Gavin Nicol <gtn@ebt.com>
Date: Fri, 21 Jun 1996 05:16:10 GMT
Message-Id: <199606210516.FAA02611@wiley.EBT.COM>
To: erik@netscape.com
Cc: JMHX.DSKPO33C@dskbgw1.itg.ti.com, www-international@w3.org

>You need 2.02 or higher. Look for "httpAcceptLanguage" in the *.ad file.
>That is the X resource that you need to set. We haven't created a GUI
>for this yet.

How about Windows? This alone would help the sniffing logic a great
deal. In fact, the combination of Accept-Charset and Accept-Language
would provide enough information to get close to the results of
correct labelling (I do not say they should replace it!).
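Concretely, the pair of headers in play here (the header names are from the HTTP drafts of the period; the request line and values are made up for illustration) would look something like:

```
GET /search-form HTTP/1.0
Accept-Language: ja, en
Accept-Charset: iso-2022-jp, euc-jp, iso-8859-1
```

With both present, the server can pick a form encoding the client handles and make a far better guess at the encoding of the data the client later submits.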

>We want to avoid cluttering the UI with options that the average user
>doesn't understand. I suppose we could put it in a config file.

I would recommend putting language, encoding, and other such settings
in the options GUI. If users are expected to understand proxies, they
should be able to understand this.

>> What happens if you have, on a single site, many different forms in
>> many different encodings? What happens if the forms are dynamically
>> generated, where you do not know a priori what the encoding of the
>> form is/was?
> 
>I guess we haven't come across many cases like this.

One reason is that it's hard to do now, as is building truly
multilingual sites. This *must* change.

Such cases *will* increase. There is a definite trend toward managing
web data in databases, *especially* in large corporate
intranets. There's a business tip for you.

>>Data sniffing would also be simplified if a single encoding was
>>chosen for each language.
> 
>The fact is that more than one encoding is used for some languages. It
>would be nice if we could get the servers to use fewer encodings without
>affecting the client users (customers).

Sure. Initially, I was under the impression that Navigator sent
ISO-2022-JP to the server (which kind of makes sense...), and I would
have preferred that to the current scheme.
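For what it's worth, here is a sketch of why ISO-2022-JP would have been friendlier to the server side (modern Python codec names, used purely for illustration): it is pure 7-bit and marks its character-set switches with escape sequences, so a receiver can recognize it from the bytes alone rather than guessing.

```python
wire = "日".encode("iso2022_jp")  # the kanji for "day/sun", U+65E5

# ESC $ B switches to JIS X 0208; ESC ( B switches back to ASCII.
assert wire == b"\x1b$BF|\x1b(B"

# Every byte is below 0x80, so the data also survives 7-bit paths intact.
assert all(b < 0x80 for b in wire)
```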

The current scheme *can* be made to work, even fairly well, but as the
number of possible encodings for data sent to the server increases,
the chance of getting meaningful results from the sniffing logic
drops.
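The degradation is easy to see in a sketch. Below is a minimal sniffing routine (modern Python, purely illustrative, and nothing like any vendor's actual code): the longer the candidate list, the more often an unlabelled byte sequence is valid under several encodings at once, at which point the guess is arbitrary.

```python
# Candidate encodings the server might expect; the list is illustrative.
CANDIDATES = ["ascii", "iso2022_jp", "euc_jp", "shift_jis"]

def sniff(data: bytes) -> list[str]:
    """Return every candidate encoding under which `data` decodes cleanly."""
    valid = []
    for enc in CANDIDATES:
        try:
            data.decode(enc)
            valid.append(enc)
        except UnicodeDecodeError:
            pass
    return valid

# The EUC-JP bytes for hiragana "a" (0xA4 0xA2) are also valid Shift_JIS
# (two half-width katakana), so sniffing alone cannot decide between them.
print(sniff("あ".encode("euc_jp")))
```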

Because the vendors almost uniformly do not do the correct thing,
anyone attempting to create a truly multilingual site has to
implement thousands of lines of code they would otherwise not need,
and it will *still* not always work. This is what I have long wanted
to see fixed.

One word: interoperability.

>Extended_UNIX_Code_Fixed_Width_for_Japanese is not the same as EUC-JP.
>That is the wchar_t form. What you're thinking of is
>Extended_UNIX_Code_Packed_Format_for_Japanese.

My mistake.
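For readers hitting this in the archive: the packed format is what the name EUC-JP refers to in practice, and it is what Python's `euc_jp` codec (used here purely for illustration) implements. The fixed-width form is the 32-bit wchar_t layout some Unix systems use internally, not an interchange encoding.

```python
# Packed EUC-JP is variable width: ASCII stays one byte, and JIS X 0208
# characters become two bytes with the high bit set on each.
assert "A".encode("euc_jp") == b"A"
assert "日".encode("euc_jp") == b"\xc6\xfc"  # JIS 0x467C with 0x80 added per byte
```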
Received on Friday, 21 June 1996 01:18:15 GMT
