Re: Default charsets for text media types [i20]

At 02:04 08/03/27, Frank Ellermann wrote:
>
>Martin Dürst wrote:

>>|    <META HTTP-EQUIV="Content-Type"
>>|     CONTENT="text/html; charset=ISO-2022-JP">
>>| 
>>|   This is not foolproof, but will work if the encoding
>>|   scheme is such that ASCII-valued octets stand for 
>>|   ASCII characters only at least until the META element
>>|   is parsed.
> 
>> [This is very, very widely used. As far as it's HTML,
>>  it's nothing HTTP should be concerned with, but it is highly
>>  relevant for HTTP because it is dead straight against
>>  any default on the charset parameter in HTTP.]
>
>Wait a moment, it is dead straight against any default that
>is *NOT* ASCII, or rather against a default not containing
>ASCII as a proper subset.
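To make the quoted condition concrete: the META trick only works if the
octets before the META element can be read as ASCII. Below is a minimal
Python sketch of a hypothetical, simplified prescan (not the HTML5
algorithm and not any browser's actual code; `prescan_charset` and its
regex are illustrative assumptions). It finds the label in an
ISO-2022-JP page, whose markup is plain ASCII, but not in a UTF-16 page,
where ASCII characters are not single ASCII-valued octets.

```python
import re
from typing import Optional

# Hypothetical, simplified prescan: look for charset=... inside a <meta ...> tag.
# Bytes are decoded as Latin-1 so that every octet maps to one character; the
# search only succeeds when ASCII-valued octets stand for ASCII characters.
_META_CHARSET = re.compile(
    r'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def prescan_charset(raw: bytes, limit: int = 1024) -> Optional[str]:
    """Return the charset label found in a META element, or None."""
    text = raw[:limit].decode("latin-1")
    match = _META_CHARSET.search(text)
    return match.group(1) if match else None


page = (
    '<html><head><META HTTP-EQUIV="Content-Type" '
    'CONTENT="text/html; charset=ISO-2022-JP"></head>'
    '<body>日本語</body></html>'
)

print(prescan_charset(page.encode("iso2022_jp")))  # ISO-2022-JP
print(prescan_charset(page.encode("utf-16")))      # None
```

A default that contains ASCII as a subset leaves this scan intact; a
UTF-16-style default would not.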

I think we have to be careful about what an HTTP default means.
What a US-ASCII default on the HTTP level means is essentially
that whenever I get something like:
Content-Type: text/foo
I should treat this exactly as if I got:
Content-Type: text/foo; charset=US-ASCII

Now the latter means: this is US-ASCII, nothing less and nothing
more. Given that, the browser won't look inside the document
for any additional information. But this is not what
happens in practice, and not what we want.
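The strict reading can be sketched in a few lines of Python. This is a
minimal illustration under stated assumptions: the helper names are
hypothetical, and the Content-Type parsing is deliberately simplified
(no full MIME quoting rules). A bare `text/foo` is rewritten into
`text/foo; charset=US-ASCII`, after which the header label is
authoritative and an in-document declaration is never consulted.

```python
from typing import Optional


def parse_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if present."""
    # Simplified parsing: split on ';' and strip surrounding quotes only.
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "charset" and value:
            return value.strip().strip('"')
    return None


def apply_strict_default(content_type: str, default: str = "US-ASCII") -> str:
    """Strict reading: a missing charset means exactly charset=default."""
    if parse_charset(content_type) is None:
        return f"{content_type}; charset={default}"
    return content_type


def effective_charset_strict(content_type: str, in_document: Optional[str]) -> str:
    # in_document (e.g. a META label) is deliberately ignored: under the
    # strict reading the header always carries a label after defaulting.
    label = parse_charset(apply_strict_default(content_type))
    assert label is not None
    return label


print(apply_strict_default("text/foo"))                      # text/foo; charset=US-ASCII
print(effective_charset_strict("text/html", "ISO-2022-JP"))  # US-ASCII
```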

It could be that by "default" above you meant something
to fall back on if *all* else fails. That would mean that
the "default" is only applied if no other information about
the encoding is available anywhere.
If that's what we want to say, simply saying "default"
is definitely not good enough. And I doubt it is actually
what happens in practice, because if no information is
found, it's usually the browser menu setting (based
on the browser's user interface language or the user's
choice) that kicks in before a final default has any
chance to be used.
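The fallback order described here can be sketched as a simple chain.
This is a hypothetical, simplified model in Python: the function name is
illustrative, the ordering follows the description above rather than any
specification or browser source, and the ISO-8859-1 final default is only
a placeholder echoing the HTTP/1.1 text/* default under discussion.

```python
from typing import Optional


def resolve_charset(
    header_charset: Optional[str],
    in_document: Optional[str],
    user_setting: Optional[str],
    final_default: str = "ISO-8859-1",
) -> str:
    """Return the first available encoding source, in priority order."""
    # Assumed priority: explicit HTTP label, in-document declaration,
    # browser menu/locale setting, and only then a final default.
    for candidate in (header_charset, in_document, user_setting):
        if candidate:
            return candidate
    return final_default


# A bare "Content-Type: text/html" header plus a META declaration:
print(resolve_charset(None, "ISO-2022-JP", "Shift_JIS"))  # ISO-2022-JP
# No information anywhere except the browser's setting:
print(resolve_charset(None, None, "Shift_JIS"))           # Shift_JIS
# Only when everything else is missing does the final default apply:
print(resolve_charset(None, None, None))                  # ISO-8859-1
```

In this model the final default is almost never reached, which is the
point being made: the browser setting usually wins first.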


>Arguably it also tells us that the "default" does not mean
>much for HTTP.  It is interesting for HTTP header fields.

If "default" doesn't mean much, then we shouldn't call
it default.

>For the text/* [i20] issue we might be free to pick ASCII
>instead of Latin-1 if that's better for MIME compatibility,
>especially for text/plain, naturally for text/xml, and no
>problem for text/html.

Many people (including me) advise against text/xml, because,
at least according to the books, the US-ASCII default for
text/xml is supposed to be a real default, i.e. an internal
encoding declaration has no chance of taking effect for
text/xml.
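A minimal Python sketch of that difference, assuming the rules of
RFC 3023 (the XML media types registration of the time): for text/xml a
missing charset parameter means US-ASCII and the XML encoding declaration
is ignored, while for application/xml the document's own declaration is
used. The function names are hypothetical, and BOM detection from the XML
specification is omitted for brevity.

```python
import re
from typing import Optional

# Simplified match for encoding="..." in a leading XML declaration.
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def xml_declared_encoding(body: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if any."""
    match = _XML_DECL_ENCODING.match(body)
    return match.group(1).decode("ascii") if match else None


def xml_charset(media_type: str, charset_param: Optional[str], body: bytes) -> str:
    if charset_param:
        return charset_param
    if media_type == "text/xml":
        # Assumed RFC 3023 rule: no charset parameter means US-ASCII,
        # regardless of what the document declares internally.
        return "US-ASCII"
    # application/xml: use the document's declaration, else UTF-8
    # (simplified; BOM-based detection is not modelled here).
    return xml_declared_encoding(body) or "UTF-8"


doc = b'<?xml version="1.0" encoding="Shift_JIS"?><root/>'
print(xml_charset("text/xml", None, doc))         # US-ASCII
print(xml_charset("application/xml", None, doc))  # Shift_JIS
```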


>The main problem I have with the "Latin-1 default" is that
>it blocks a future "UTF-8 default" (talking about HTTP/1.1)

There is no need for such a default. Practice already works
without a default.

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     

Received on Thursday, 27 March 2008 03:07:55 UTC