RE: HTML5 Issue 11 (encoding detection): I18N WG response...

Hello Ian,

Thank you for the response. This reply, except where it uses the personal pronoun "I", is on behalf of the Internationalization Core WG. [1]

> > --
> > In controlled environments or in cases where the encoding of
> > documents
> > can be prescribed, the UTF-8 character encoding is recommended.
> > --
> Ok, I've changed "non-legacy" to something more like the above.

When will this appear in the published editor's copy so we can look at it?

> This seems to be two problems:
> - "Western demographics" not being very clear for implementors. In
> practice, I think implementors understand this pretty well, so I'm
> not convinced that's a problem.

I feel that the terminology is not very useful as written. It appears to give normative guidance, yet it conveys no useful information about what a "Western demographic" might be. While the major browser implementers probably understand what you're getting at, future readers of this text will have to interpret it for themselves, and their understanding may or may not match that of current implementers.

> - The slippery slope of needing to define this for all demographic.

We disagree. The slippery slope isn't so much the problem as the fact that "demographic" is the wrong way to address it. Furthermore, there is no need for HTML5 to busy itself defining default encodings for each demographic. The specification leaves many other choices up to the implementer, and this one is no different. The normative text here exists to permit these choices.

> I
> would actually like to include details for other major demographies,
> but I
> don't think there's a slippery slope here, given that in the years
> of this
> text being present, we have not added requirements for other
> demographies.

In my opinion, implementers are already permitted to do what they do today, which is to assign specific default encodings to specific browser configurations or localizations. As a result, there is little pressure to improve the text. The longevity of the text in the draft should not, however, be seen as an impediment to its improvement.
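
To illustrate the kind of per-localization default described above, here is a minimal sketch in Python. The table contents, the generic fallback, and the names `FALLBACK_BY_LOCALE`, `GENERIC_FALLBACK`, and `default_encoding` are assumptions made for illustration. They are not taken from the HTML5 draft or from any particular browser.

```python
# Hypothetical table: both the locale tags and the encodings chosen for them
# are illustrative assumptions, not values prescribed by any specification.
FALLBACK_BY_LOCALE = {
    "ja": "shift_jis",
    "ko": "euc-kr",
    "ru": "windows-1251",
    "zh-cn": "gb18030",
    "zh-tw": "big5",
}

# Assumed generic fallback used when no locale-specific entry matches.
GENERIC_FALLBACK = "windows-1252"


def default_encoding(locale_tag: str) -> str:
    """Pick a fallback encoding from a locale tag such as 'ja_JP' or 'zh-TW'."""
    # Normalize case and separator so 'ja_JP' and 'ja-jp' are treated alike.
    tag = locale_tag.lower().replace("_", "-")
    # Try the full tag first, then progressively shorter prefixes.
    while tag:
        if tag in FALLBACK_BY_LOCALE:
            return FALLBACK_BY_LOCALE[tag]
        tag = tag.rpartition("-")[0]
    return GENERIC_FALLBACK
```

The lookup is keyed on the browser's localization rather than on any notion of "demographic", which is the distinction at issue in this thread.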

> > audience. Browsers in the main do the right thing here, keying
> off
> > system locale or browser localization.
> I've changed "recommended" to "suggested".

See my last comments here. We don't think this change will be sufficient.

> > If browsers don't do full chardet, they may still get some
> > utility by including the UTF-8 sniff. I'll dig up an appropriate
> > reference if you prefer.
> If you have a reference for this, that would be preferable, yes.
> Thanks.

Martin provided a couple, which were, in fact, the ones I had in mind.
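
For readers unfamiliar with the idea, a UTF-8 sniff can be sketched as follows in Python. This is an illustration under stated assumptions, not the algorithm from the references Martin provided or from any browser. The function name `sniff_utf8` is hypothetical, and treating pure-ASCII input as "no evidence" is a design choice made for this sketch.

```python
import codecs


def sniff_utf8(prefix: bytes) -> bool:
    """Return True if `prefix` has non-ASCII bytes and all of them form valid UTF-8."""
    # Pure ASCII is valid in UTF-8 and in most legacy encodings alike,
    # so it gives no evidence either way.
    if all(b < 0x80 for b in prefix):
        return False
    # An incremental decoder with final=False tolerates a multi-byte sequence
    # cut off at the end of the buffer, which happens when only the first
    # few kilobytes of a document are examined.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        decoder.decode(prefix, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

The check is useful because text in a legacy single-byte or double-byte encoding rarely happens to be well-formed UTF-8, so a positive result is strong evidence, while a negative result simply leaves the usual fallback in place.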

> > > >
> > > > "Clarify default encoding wording and add some examples for
> non-
> > > > latin locales."
> > >
> > > Thanks. I will get to these in due course.
> >
> > Thanks. Please let I18N WG know if we can assist you with this. I
> think
> > that the text suggested further down the thread marks a useful
> > improvement both on the existing text and on the original
> proposal.
> This bug is currently awaiting elaboration from the reporter.

This email thread contains, in the opinion of the Internationalization Core WG, the necessary elaboration. We think you should adopt verbatim either the text Richard proposed in:

or the slightly modified version I proposed in:

... both of which reference this bug.

Please let us know your thoughts on how to resolve this bug.

Regards (for I18N),



Addison Phillips
Globalization Architect -- Lab126
Chair -- W3C Internationalization WG

Internationalization is not a feature.
It is an architecture.

Received on Thursday, 8 October 2009 02:48:11 UTC