
RE: HTML 5 defaults to Windows-1252, where charmod requires UTF-8/UTF-16

From: Richard Ishida <ishida@w3.org>
Date: Mon, 29 Oct 2007 17:42:18 -0000
To: "'Dan Connolly'" <connolly@w3.org>
Cc: "'www-archive'" <www-archive@w3.org>, "'Chris Wilson'" <Chris.Wilson@microsoft.com>
Message-ID: <00c501c81a53$0d77c560$6601a8c0@rishida>

Hi Dan,

Please send questions like this to the public-i18n-core list, so that the
i18n WG can reply.

It's not clear to me from a quick look that there's a conflict.  CharMod
says that you must define one or both of UTF-8 and UTF-16 as *a default*,
and HTML5 is defining a minimum set of encodings that must be supported,
rather than a default (as I read it).  CharMod doesn't proscribe recognition
of other encodings.

I think the appropriate charmod criterion for the html5 text in section is
http://www.w3.org/TR/charmod/#C026 "If the unique encoding
approach is not chosen, specifications MUST designate at least one of the
UTF-8 and UTF-16 encoding forms of Unicode as admissible character encodings
and SHOULD choose at least one of UTF-8 or UTF-16 as required encoding forms
(encoding forms that MUST be supported by implementations of the
specification)." - which I think section of html5 supports.

From my reading, the 'defaults to win1252' bit applies only if the user
specifies that a page is in ISO Latin-1, i.e. it assumes that people don't
know the difference between those two.  It's not a general default.  I don't
see where html5 specifies what to default to if the encoding is completely
unknown.  According to charmod, this is when you should choose utf-8 or
utf-16.  (There may be something about that later in html5.)
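
To make the distinction concrete, here is a minimal sketch in Python of the
two cases as described above.  It is only an illustration under stated
assumptions: the names effective_encoding, decode_page, LABEL_OVERRIDES and
FALLBACK_ENCODING are hypothetical and come from neither HTML5 nor CharMod,
and the UTF-8 fallback for an unknown encoding reflects the charmod
recommendation, not anything confirmed in the html5 text.

```python
# Hypothetical illustration; not an algorithm taken from HTML5 or CharMod.

# A declared label of ISO-8859-1 is treated as Windows-1252 (the behaviour
# described for pages that claim to be Latin-1).
LABEL_OVERRIDES = {"iso-8859-1": "windows-1252"}

# Assumption: with no declaration at all, fall back to UTF-8, as CharMod
# recommends for specifications that define a default.
FALLBACK_ENCODING = "utf-8"


def effective_encoding(declared_label=None):
    """Return the encoding name to decode with for a declared label."""
    if declared_label is None:
        return FALLBACK_ENCODING
    label = declared_label.strip().lower()
    return LABEL_OVERRIDES.get(label, label)


def decode_page(data, declared_label=None):
    """Decode raw bytes using the effective encoding for the label."""
    return data.decode(effective_encoding(declared_label))


# Bytes 0x93 and 0x94 are curly quotes in Windows-1252 but C1 control
# characters in ISO-8859-1, which is where the two encodings differ.
sample = b"\x93quoted\x94"
as_declared = sample.decode("iso-8859-1")       # strict Latin-1 reading
as_treated = decode_page(sample, "ISO-8859-1")  # Windows-1252 reading
```

In this sketch the override applies only when the Latin-1 label is actually
declared; a page with no declared encoding takes the separate fallback path.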

Does that make sense?


Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)


> -----Original Message-----
> From: Dan Connolly [mailto:connolly@w3.org] 
> Sent: 29 October 2007 17:22
> To: Richard Ishida
> Cc: www-archive; Chris Wilson
> Subject: HTML 5 defaults to Windows-1252, where charmod
> requires UTF-8/UTF-16
>
> Richard,
>
> These conflict:
>
> "C027   [S]  Specifications that require a default encoding MUST define
> either UTF-8 or UTF-16 as the default, or both if they define suitable
> means of distinguishing them."
>  -- http://www.w3.org/TR/charmod/#C027
>
> "User agents must at a minimum support the UTF-8 and Windows-1252
> encodings, but may support more."
>  -- Character encoding requirements http://www.w3.org/html/wg/html5/
>
> I don't think that aspect of the HTML 5 spec is going to change; it's
> already ubiquitously deployed:
>
>  "Many web browsers treat the MIME charset ISO-8859-1 as Windows-1252"
>  -- http://en.wikipedia.org/wiki/Windows-1252
>
> Any suggestions on what to do about the conflict? It's not clear to me
> why C027 is a MUST. Which WG(s) should we be talking to?
>
> p.s. note the cc to www-archive; i.e. feel free to copy/cite/forward
> anywhere.
>
> --
> Dan Connolly, W3C http://www.w3.org/People/Connolly/
> gpg D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Monday, 29 October 2007 17:39:50 UTC
