- From: Albert Lunde <Albert-Lunde@nwu.edu>
- Date: Mon, 26 Dec 1994 20:19:55 -0600 (CST)
- To: James Gosling <jag@scndprsn.eng.sun.com>
- Cc: gtn@ebt.com, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> I agree wholeheartedly that Unicode is The Answer. [...]
>
> > In order to make multilingual support as painless as possible, it is
> > proposed that all HTTP servers for multilingual documents *should* be
> > able to convert documents from the local character set encoding to
> > UCS-2, UTF-8, and UTF-7 (16, 8 and 7 bit encodings of Unicode). It is
> > also proposed that all HTTP clients *should* be able to parse UCS-2,
> > UTF-8 and UTF-7. It is *recommended* that browsers allow the data to be
> > saved as UTF-7, UTF-8, or UCS-2 (similar to the current ftp
> > interface). If possible, a browser *should* also allow the data to be
> > saved in the local character set encoding, but that might not always
> > be possible (for example, saving a document containing Arabic on an
> > ASCII based system). Documents sent from servers would then use a
> > content type of:
> >
> >   Content-Type: text/...; charset=UNICODE-1-1-UTF-7
> >   Content-Type: text/...; charset=UNICODE-1-1-UTF-8
> >   Content-Type: text/...; charset=UNICODE-1-1-UCS-2
> >
> > Though UTF-8 and UCS-2 will need some additional encoding applied to
> > them in order to be strictly MIME compliant. An alternative is to use
> > an application/* type specifier instead.
>
> But http isn't strictly MIME compliant. In particular, the full 8 bit
> nature of UTF8 fits in well with http. While UTF7 makes sense in the
> MIME mail world of corrosive transport mechanisms, it is not needed
> in http. For simplicity I'd recommend a really limited set of
> allowed encodings: ISO-8859-1 and UTF8.

I'd suggest that (to interoperate well with the MIME software community)
HTTP should allow the same ISO-8859-X codes as MIME (as I recall, X=1 to 9).
Still, advocating something like UTF8 as a preferred transport for stuff
that would otherwise be in multiple character sets seems reasonable.

(I get the impression that Unicode is not universally loved ... would
anyone care to comment more on the down side?)

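As a rough illustration of the transport point in the quoted text, here is a minimal Python sketch (Python is not used anywhere in this thread). The sample string and the helper name `is_seven_bit_clean` are made up for the example. Python has no codec literally named UCS-2, so `utf-16-be` stands in for it, on the assumption that the two produce identical bytes for the characters used here.

```python
# Hypothetical sample: ASCII text followed by an Arabic word.
SAMPLE = "abc \u0645\u0631\u062d\u0628\u0627"


def is_seven_bit_clean(data: bytes) -> bool:
    """Return True if no byte has its high bit set (safe on 7-bit transports)."""
    return all(b < 0x80 for b in data)


utf7 = SAMPLE.encode("utf-7")
utf8 = SAMPLE.encode("utf-8")
# Assumption: utf-16-be matches big-endian UCS-2 for these characters.
ucs2 = SAMPLE.encode("utf-16-be")

# UTF-7 output is plain ASCII, so it survives 7-bit mail transports.
print(is_seven_bit_clean(utf7))   # True
# UTF-8 uses bytes with the high bit set, so it needs an 8-bit path such as HTTP.
print(is_seven_bit_clean(utf8))   # False
# UCS-2 embeds NUL bytes for ASCII characters, which text-oriented transports may mangle.
print(b"\x00" in ucs2)            # True

# Saving to a pure ASCII local character set is not possible for this text.
try:
    SAMPLE.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)                   # False
```

All three Unicode encodings decode back to the same text; the difference is only in which byte values appear on the wire.
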
At what point do char set issues get handed off to the HTML standard? It seems like some of the alternative glyph rendering issues could get mixed in with things like font changes and presentation control (which are possible, though controversial, issues for HTML). Of course text/html is not the only text/* type we might have to transport....
Received on Monday, 26 December 1994 18:21:13 UTC