RE: HTML 5 defaults to Windows-1252, where charmod requires UTF-8/UTF-16

Hi Dan,

Please send questions like this to the public-i18n-core list, so that the
i18n WG can reply.

It's not clear to me from a quick look that there's a conflict.  CharMod
says that, if a specification requires a default encoding, it must define
one or both of UTF-8 and UTF-16 as *a default*, whereas HTML5, as I read
it, defines a minimum set of encodings that must be supported rather than
a default.  CharMod doesn't proscribe recognition of other encodings.

I think the appropriate CharMod criterion for the HTML5 text in section
8.2.2.2 is http://www.w3.org/TR/charmod/#C026: "If the unique encoding
approach is not chosen, specifications MUST designate at least one of the
UTF-8 and UTF-16 encoding forms of Unicode as admissible character encodings
and SHOULD choose at least one of UTF-8 or UTF-16 as required encoding forms
(encoding forms that MUST be supported by implementations of the
specification)."  I think section 8.2.2.2 of HTML5 satisfies that, since it
requires user agents to support UTF-8.

From my reading, the 'defaults to Windows-1252' behaviour applies only when
the user specifies that a page is in ISO Latin-1 (ISO-8859-1), i.e. it
assumes that people don't know the difference between those two.  It's not
a general default.  I don't see where HTML5 specifies what to default to if
the encoding is completely unknown.  According to CharMod, that is where you
should choose UTF-8 or UTF-16.  (There may be something about that later in
HTML5.)

Does that make sense?

Cheers,
RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)
 
http://www.w3.org/International/
http://rishida.net/blog/
http://rishida.net/


> -----Original Message-----
> From: Dan Connolly [mailto:connolly@w3.org] 
> Sent: 29 October 2007 17:22
> To: Richard Ishida
> Cc: www-archive; Chris Wilson
> Subject: HTML 5 defaults to Windows-1252, where charmod requires
> UTF-8/UTF-16
> 
> Richard,
> 
> These conflict:
> 
> "C027   [S]  Specifications that require a default encoding MUST define
> either UTF-8 or UTF-16 as the default, or both if they define suitable
> means of distinguishing them."
>  -- http://www.w3.org/TR/charmod/#C027
> 
> "User agents must at a minimum support the UTF-8 and Windows-1252
> encodings, but may support more." -- 8.2.2.2. Character encoding
> requirements http://www.w3.org/html/wg/html5/
> 
> I don't think that aspect of the HTML 5 spec is going to change; it's
> already ubiquitously deployed:
> 
>  "Many web browsers treat the MIME charset ISO-8859-1 as Windows-1252"
> -- http://en.wikipedia.org/wiki/Windows-1252 
> 
> Any suggestions on what to do about the conflict? It's not clear to me
> why C027 is a MUST. Which WG(s) should we be talking to?
> 
> p.s. note the cc to www-archive; i.e. feel free to copy/cite/forward
> anywhere.
> 
> --
> Dan Connolly, W3C http://www.w3.org/People/Connolly/
> gpg D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
> 

Received on Monday, 29 October 2007 17:39:50 UTC