- From: Koen Holtman <koen@win.tue.nl>
- Date: Sun, 8 Dec 1996 00:28:51 +0100 (MET)
- To: Klaus Weide <kweide@tezcat.com>
- Cc: masinter@parc.xerox.com, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com, www-international@w3.org
Klaus Weide:
> [...using feature negotiation to negotiate on UTF-8....]
>Maybe it is the most practical way.  But no mechanism is in place yet,
>while overloading the language header (and associated inventiveness with
>new HTML tags) can be done now...

Overloading an HTTP header and adding HTML tags will take _much_ more
time than waiting for feature negotiation to be in place.

But skimming the UTF-8 specification, I gather that UTF-8 is an
encoding mechanism, not a character set.  HTTP offers the
Accept-Encoding/Content-Encoding headers to negotiate on this.  Or does
using Accept-Encoding only shift the problem to negotiating which part
of UCS you can render?

When we reviewed the Accept-* header definitions for HTTP/1.1 early
this year, we did not discuss the particular problem of character sets
which could only be partially rendered, as would often be the case
with Unicode stuff.  It is certainly possible that HTTP/1.1 cannot
solve this problem, and maybe HTTP/1.1 + feature negotiation also can't
solve it.

However, in the http-wg, we are very reluctant to do things like
overload the language header; it is felt that adding more
special-purpose complexity will decrease the useful lifetime of the
HTTP/1.x protocols.  The feature negotiation framework exists to keep
negotiation complexity out of the main protocol, so if the choice is
between overloading headers and using feature negotiation, we will
want to use feature negotiation, even if the feature tags look a bit
strange.

>Come to think of it, putting 'particular subsets of ISO-10646' under
>feature tag registration wouldn't work.  Other protocols like mail
>presumably will also need a way to say "this is Latin42 characters
>encoded with UTF-8".

Other protocols can use registered feature tags if they need to say
the same things.  HTTP borrowed media types from MIME mail, and MIME
mail can borrow feature tags from HTTP.  It has already been recognised
that feature tags could be useful for other protocols (and for
conditional HTML).

> I don't think that a HTTP/HTML/Web specific
>feature tag registration can take over the IANA charset registry's
                              ^^^^^^^^^
>function.

We are not aiming to take over any existing IANA registry.

>BTW It seems those drafts specifically exclude "MIME type, charset,
>and language" from the new feature tags.  Probably because they are
>too essential.

I don't know what you mean by `too essential', but "MIME type,
charset, and language" were excluded because we don't want to
duplicate existing IANA registries.  The registration draft does allow
you to use feature tags to negotiate on (new) charset-type things _if_
these new things cannot be handled by the existing mechanisms.

>For all practical purposes Hebrew characters encoded
>as UTF-8 (or raw 16-bit) *is* a different charset from Greek characters
>encoded the same way.

So you could say:

   Content-Type: text/html;charset=<hebrew>
   Content-Encoding: utf-8

and if you have a mixed language document:

   Content-Type: text/html;charset=<hebrew>;charset=<latin-x>
   Content-Encoding: utf-8

On the other hand, using feature tags, you could say:

   Content-Type: text/html;charset=utf-8
   Content-Features: utf-8-cs="<hebrew>" utf-8-cs="<latin-x>"

> Klaus

Koen.
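
As a rough illustration of the feature-tag variant above, the sketch below shows how a client might read the subsets a response declares and compare them with what it can render. The Content-Features syntax and the utf-8-cs tag are taken only from the example in this message, not from any finished specification; the function names and the dictionary-of-headers representation are assumptions made for the sketch, and <hebrew> and <latin-x> remain placeholders.

```python
import re

# Matches one utf-8-cs="..." tag, in the form used in the message above.
# This syntax is illustrative; it is not taken from a published spec.
_TAG = re.compile(r'utf-8-cs="([^"]*)"')


def required_subsets(headers):
    """Return the UCS subsets named by utf-8-cs tags in Content-Features."""
    return _TAG.findall(headers.get("Content-Features", ""))


def can_render(headers, renderable):
    """Return True if every declared subset is in the `renderable` set."""
    return all(subset in renderable for subset in required_subsets(headers))


# Hypothetical response headers, copied from the last example above.
response = {
    "Content-Type": "text/html;charset=utf-8",
    "Content-Features": 'utf-8-cs="<hebrew>" utf-8-cs="<latin-x>"',
}

if __name__ == "__main__":
    print(required_subsets(response))
    print(can_render(response, {"<hebrew>"}))
    print(can_render(response, {"<hebrew>", "<latin-x>"}))
```

A response with no Content-Features header declares no subsets, so can_render treats it as renderable; whether that is the right default is a design choice the message does not address.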
Received on Saturday, 7 December 1996 15:36:10 UTC