- From: Klaus Weide <kweide@tezcat.com>
- Date: Sat, 7 Dec 1996 14:51:53 -0600 (CST)
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com, www-international@w3.org
> # If there is a need for a client to express "I can understand UTF-8,
> # but can only display some of the 10646 characters: ..." - and I
> # definitely think there is such a need - I do not see a way to implement
> # this cleanly.

On Sat, 7 Dec 1996, Larry Masinter wrote:
> I think this kind of communication is along the same lines as: "I can
> implement all of HTML 3.5 tables, except I don't know anything about
> the 'border' parameter".
>
> That is, there may be a need to communicate special subset
> capabilities, but usually those limitations are transient and too
> fine-grained to actually matter in real communication.

That does not look like a fair comparison. Whatever HTML 3.5 tables
are, understanding the border parameter looks like a minor thing, as
you say. But not being able to say what characters I can understand
would matter a lot in real communication.

Saying "I can understand 10646" or "I can understand UTF-8" practically
just means that I can decode that character encoding. That is on the
same level as saying "I can understand 8-bit character sets" without
specifying which. If anything more detailed is too fine-grained to
really matter, then I don't see why anybody should currently bother to
use Accept-Charset: ISO-8859-2 etc.

> In general, in the web, we've avoided catering to fine-grained
> differentiation of client capabilities. Yes, you can say "I speak
> postscript" or not, but there's no good way to say "I can take
> postscript files but don't give me any that won't look good on little
> pieces of paper".

But whether that text is readable for me or appears as complete garbage
(because I couldn't tell the server about my character repertoire) is a
bit more significant than whether something looks good or bad.

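A minimal sketch of the point about Accept-Charset labels, assuming
Python's standard codecs as a stand-in for a server's view of a
client's repertoire. The helper name fits_repertoire and the sample
strings are hypothetical, chosen only for illustration. A legacy label
such as iso-8859-2 or iso-8859-3 implicitly tells the server which
characters the client can show; utf-8 accepts everything and so tells
it nothing about repertoire.

```python
def fits_repertoire(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample texts: one needs Latin-2 letters, one Latin-3 letters.
polish = "Łódź"
maltese = "Ħaġar Qim"

# For each label a client might send in Accept-Charset, report which
# sample text falls inside the repertoire that the label implies.
for charset in ("iso-8859-2", "iso-8859-3", "utf-8"):
    print(charset, fits_repertoire(polish, charset),
          fits_repertoire(maltese, charset))
```

With the 8-bit labels the two texts are told apart; with utf-8 both are
"acceptable" whether or not the client has glyphs for them.
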
If I move from sending (say)
   Accept-Charset: iso-8859-3
to
   Accept-Charset: utf-8
(because my browser now understands that character encoding), then I
*lose* the capability to express what is more important for the human
user: what characters I can actually see.

And the overloading of Accept-Language with character repertoire
meaning seems to show that there is a perceived need to express
character repertoire capabilities.

With the given structure of the MIME "charset" parameter (and therefore
the Accept-Charset header), the logical thing to at least preserve what
currently can be expressed w.r.t. repertoire would be to register lots
of additional charsets: we'd then have
   ISO-10646-Unicode-Latin2,
   ISO-10646-Unicode-Latin3,
   ISO-10646-Unicode-Latin4,
and so on. Well, I can see why that isn't very inviting; it looks like
a big can of worms...

What I cannot understand is how the loss of existing expressive
capability for negotiation (of something *essential*) can be seen as a
step forward.

> There _is_ a proposal for allowing profiles of capabilities to be
> expressed and negotiated, and the proposal is elaborated in internet
> drafts:
>     draft-holtman-http-negotiation-04.txt
>     draft-ietf-http-feature-reg-00.txt
> and related topics in:
>     draft-mutz-http-attributes-02.txt
>     draft-goland-http-headers-00.txt
> from your nearby internet drafts directory. Perhaps 'support for
> particular subsets of ISO-10646' might fit into this category.

I am rather thinking about 'need for..' than 'support for..'.

Maybe it is the most practical way. But no mechanism is in place yet,
while overloading the language header (and associated inventiveness
with new HTML tags) can be done now...

Come to think of it, putting 'particular subsets of ISO-10646' under
feature tag registration wouldn't work. Other protocols like mail
presumably will also need a way to say "this is Latin42 characters
encoded with UTF-8".
I don't think that an HTTP/HTML/Web specific feature tag registration
can take over the IANA charset registry's function.

BTW, it seems those drafts specifically exclude "MIME type, charset,
and language" from the new feature tags. Probably because they are too
essential. For all practical purposes Hebrew characters encoded as
UTF-8 (or raw 16-bit) *is* a different charset from Greek characters
encoded the same way.

   Klaus
Received on Saturday, 7 December 1996 12:58:15 UTC