Re: Accept-Charset support

Klaus Weide:
[...using feature negotiation to negotiate on UTF-8....]

>Maybe it is the most practical way.  But no mechanism is in place yet,
>while overloading the language header (and associated inventiveness with
>new HTML tags) can be done now... 

Overloading an HTTP header and adding HTML tags will take _much_ more time
than waiting for feature negotiation to be in place.

But skimming the UTF-8 specification, I gather that UTF-8 is an encoding
mechanism, not a character set.  HTTP offers the
Accept-Encoding/Content-Encoding headers to negotiate on this.  Or does
using Accept-Encoding only shift the problem to negotiating which part
of UCS you can render?
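
A minimal sketch of that distinction, in Python (my own illustration, not
taken from any draft; the helper name utf8_hex is made up).  UTF-8 serializes
code points from any script in the same way, so the encoding label alone does
not tell the recipient which repertoire it has to be able to render:

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 byte sequence of a single character as a hex string."""
    return ch.encode("utf-8").hex()


# Two characters from different scripts, serialized by the same mechanism.
for ch in ("\u05d0", "\u03b1"):  # HEBREW LETTER ALEF, GREEK SMALL LETTER ALPHA
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
```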

When we reviewed the Accept-* header definitions for HTTP/1.1 early this
year, we did not discuss the particular problem of character sets which
could only be partially rendered, as would often be the case with Unicode
text.  It is certainly possible that HTTP/1.1 cannot solve this problem,
and maybe HTTP/1.1 + feature negotiation also can't solve it.

However, in the http-wg, we are very reluctant to do things like overload the
language header; it is felt that adding more special-purpose complexity will
decrease the useful lifetime of the HTTP/1.x protocols.  The feature
negotiation framework exists to keep negotiation complexity out of the main
protocol, so if the choice is between overloading headers and using feature
negotiation, we will want to use feature negotiation, even if the feature
tags look a bit strange.

>Come to think of it, putting 'particular subsets of ISO-10646' under
>feature tag registration wouldn't work.  Other protocols like mail
>presumably will also need a way to say "this is Latin42 characters
>encoded with UTF-8".

Other protocols can use registered feature tags if they need to say the same
things.  HTTP borrowed media types from MIME mail, and MIME mail can borrow
feature tags from HTTP.  It has already been recognised that feature tags
could be useful for other protocols (and for conditional HTML).

>  I don't think that a HTTP/HTML/Web specific
>feature tag registration can take over the IANA charset registry's

We are not aiming to take over any existing IANA registry.

>BTW It seems those drafts specifically exclude "MIME type, charset, 
>and language" from the new feature tags.  Probably because they are
>too essential.

I don't know what you mean by `too essential', but "MIME type, charset, and
language" were excluded because we don't want to duplicate existing IANA
registries.  The registration draft does allow you to use feature tags to
negotiate on (new) charset-type things _if_ these new things cannot be
handled by the existing mechanisms.

>  For all practical purposes Hebrew characters encoded
>as UTF-8 (or raw 16-bit) *is* a different charset for Greek characters
>encoded the same way.

So you could say:

 Content-Type: text/html;charset=<hebrew>
 Content-Encoding: utf-8

and if you have a mixed language document:

 Content-Type: text/html;charset=<hebrew>;charset=<latin-x>
 Content-Encoding: utf-8

On the other hand, using feature tags, you could say:

 Content-Type: text/html;charset=utf-8
 Content-Features: utf-8-cs="<hebrew>" utf-8-cs="<latin-x>"
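
Either form has to convey which subsets of UCS a document actually uses, so
that the recipient can tell whether it can render all of them.  A rough
sketch of that check, in Python (the function names are hypothetical, and
using the first word of each Unicode character name as a stand-in for a
registered subset name is purely my assumption, not anything in the drafts):

```python
import unicodedata


def scripts_used(text):
    """Approximate the scripts in `text` by the first word of each letter's
    Unicode character name (e.g. 'HEBREW', 'GREEK', 'LATIN')."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def fully_renderable(text, renderable):
    """True if every script found in `text` is in the `renderable` set."""
    return scripts_used(text) <= set(renderable)
```

A mixed Hebrew/Latin document would yield both 'HEBREW' and 'LATIN' here, and
a recipient that can only render one of them would see the partial-rendering
problem discussed above.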

>  Klaus

