RE: Bug 7381 (default encoding selection)

Richard suggested:

> Otherwise, return an implementation-defined or user-specified default
> character encoding, with the confidence tentative. In controlled
> environments, the more comprehensive UTF-8 encoding is recommended. For
> the wider Web, the default may be set according to the expectations and
> predominant content encodings for a given demographic or audience. For
> example, windows-1252 is recommended as the default encoding for Western
> European language environments. Other encodings may also be used. For
> example, "windows-949" might be an appropriate default in a Korean
> language runtime environment.

I agree with this suggested text, although honestly, the "In controlled environments..." clause makes me shake my head. For me, dropping the UTF-8 recommendation sentence altogether makes more sense than shrouding it in mysterious circumstances.

> 
> [1] We could add to the end ", and UTF-8 would be an appropriate default
> for scripts in many developing regions."  I suggest this, not because I
> want to see utf-8 go for world wide web domination or because I see it as
> a global panacea, but because I think it helps for certain demographics
> or audience. The situation in these regions is often mired in competing
> encodings each with a non-majority user base, that impede general
> interoperability, and use of utf-8 tends to provide a way forward - not
> only by superseding other encoding schemes, but also typically by
> providing useful features that support the use of the script itself.  I
> just don't want it to sound as if you should try to find a local encoding
> for the default in every circumstance.

+1

> 
> [2] I think it may also be worthwhile noting that the default encoding
> may also be that explicitly set by users in some applications (eg.
> Firefox and IE allow you to change the default encoding).

I think that is what the phrase "user-specified" is intended to mean.

Based on the above, how about:

--
Otherwise, return an implementation-defined or user-specified default character encoding, with the confidence tentative. This default should be set according to user expectations and predominant content encodings for a given demographic or audience. For example, "windows-1252" is recommended as the default encoding for Western European language environments, while "windows-949" might be an appropriate default in a Korean language runtime environment. 

UTF-8 is sometimes an appropriate default, even when the document is known not to be UTF-8. This is especially the case for languages or scripts that are less common on the Web (often from developing regions) because that encoding provides the most likely support for the content as compared to legacy or font-based encodings.
--
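
To make the proposed wording concrete, here is a rough sketch of how an implementation might pick its fallback. Everything in it is an assumption for illustration: the function name, the table, and the choice of language tags are hypothetical, and only the "windows-1252" and "windows-949" pairings come from the text above. Falling back to UTF-8 when no legacy default is listed is one possible reading of the second paragraph, not something the text requires.

```python
# Hypothetical table: primary language subtag -> legacy default encoding label.
# Only the Western European and Korean pairings come from the proposed text;
# the specific language tags chosen here are illustrative assumptions.
LEGACY_DEFAULTS = {
    "en": "windows-1252",
    "fr": "windows-1252",
    "de": "windows-1252",
    "es": "windows-1252",
    "ko": "windows-949",
}


def default_encoding(locale_tag, user_override=None):
    """Return (encoding_label, confidence) for the final fallback step.

    A user-specified default wins; otherwise the locale's predominant
    legacy encoding is used, with UTF-8 when no legacy default is listed.
    The confidence is always "tentative", as in the proposed text.
    """
    if user_override:
        return user_override, "tentative"
    # Accept both "ko-KR" and "ko_KR" style tags; keep only the language part.
    primary = locale_tag.replace("_", "-").split("-")[0].lower()
    return LEGACY_DEFAULTS.get(primary, "UTF-8"), "tentative"
```

For instance, `default_encoding("ko-KR")` would give the Korean default, while a tag with no entry in the table (say, `"km-KH"`) would fall through to UTF-8.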

Addison

Received on Thursday, 20 August 2009 14:57:04 UTC