- From: Dan Connolly <connolly@w3.org>
- Date: Mon, 29 Oct 2007 12:51:17 -0500
- To: Richard Ishida <ishida@w3.org>, public-i18n-core@w3.org
- Cc: 'www-archive' <www-archive@w3.org>, 'Chris Wilson' <Chris.Wilson@microsoft.com>
On Mon, 2007-10-29 at 17:42 +0000, Richard Ishida wrote:
> Hi Dan,
>
> Please send questions like this to public-i18n-core list, so that the i18n
> WG can reply.

OK. done.

> It's not clear to me from a quick look that there's a conflict. CharMod
> says that you must define one or both of UTF-8 and UTF-16 as *a default*,
> and HTML5 is defining minimum set of encodings that must be supported,
> rather than a default (as I read it). CharMod doesn't proscribe recognition
> of other encodings.
>
> I think the appropriate charmod criterion for the html5 text in section
> 8.2.2.2 is http://www.w3.org/TR/charmod/#C026 "If the unique encoding
> approach is not chosen, specifications MUST designate at least one of the
> UTF-8 and UTF-16 encoding forms of Unicode as admissible character encodings
> and SHOULD choose at least one of UTF-8 or UTF-16 as required encoding forms
> (encoding forms that MUST be supported by implementations of the
> specification)." - which I think section 8.2.2.2 of html5 supports.
>
> From my reading, the 'defaults to win1252' bit comes only if the user
> specifies that a page is in ISO latin1 - ie. Assume that people don't know
> the difference between those two. It's not a general default. I don't see
> where html5 specifies what to default to if the encoding is completely
> unknown.

I suppose it's in 8.2.2.1. Determining the character encoding:

  "Otherwise, return an implementation-defined or user-specified
  default character encoding, with the confidence tentative. Due to
  its use in legacy content, windows-1252 is recommended as a default
  in predominantly Western demographics. In non-legacy environments,
  the more comprehensive UTF-8 encoding is recommended instead. Since
  these encodings can in many cases be distinguished by inspection, a
  user agent may heuristically decide which to use as a default."

> According to charmod, this is when you should choose utf-8 or
> utf-16. (There may be something about that later in html5.)
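The fallback in the 8.2.2.1 text, together with the ISO-8859-1-as-Windows-1252 treatment discussed in this thread, can be sketched in Python. This is a minimal illustration under stated assumptions, not the HTML 5 algorithm itself: the function names and the alias set are hypothetical, and the "inspection" is reduced to a single check (bytes that decode cleanly as UTF-8 are taken as UTF-8; anything else falls back to windows-1252).

```python
# Hypothetical sketch: pick a default encoding for bytes with no declared
# charset, and map legacy Latin-1 labels onto windows-1252 as browsers do.

# Assumption: an illustrative subset of labels treated as windows-1252.
LATIN1_ALIASES = {"iso-8859-1", "latin1", "us-ascii"}


def sniff_default_encoding(data: bytes) -> str:
    """Guess UTF-8 vs windows-1252 by inspection (simplified heuristic)."""
    try:
        data.decode("utf-8")  # strict decoding: any malformed sequence raises
    except UnicodeDecodeError:
        return "windows-1252"  # legacy fallback for non-UTF-8 byte streams
    return "utf-8"  # pure ASCII also lands here, since ASCII is valid UTF-8


def resolve_label(label: str) -> str:
    """Normalise a declared charset label, treating Latin-1 as windows-1252."""
    normalised = label.strip().lower()
    if normalised in LATIN1_ALIASES:
        return "windows-1252"
    return normalised
```

For example, `"café".encode("latin-1")` yields `b"caf\xe9"`, which is not valid UTF-8, so the sketch falls back to windows-1252; the same word encoded as UTF-8 is recognised as UTF-8.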
>
> Does that make sense?

I suppose so; I'm happy with any conclusion that says I don't need
to do more work. ;-)

> Cheers,
> RI
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/International/
> http://rishida.net/blog/
> http://rishida.net/
>
> > -----Original Message-----
> > From: Dan Connolly [mailto:connolly@w3.org]
> > Sent: 29 October 2007 17:22
> > To: Richard Ishida
> > Cc: www-archive; Chris Wilson
> > Subject: HTML 5 defaults to Windows-1252, where charmod
> > requires UTF-8/UTF-16
> >
> > Richard,
> >
> > These conflict:
> >
> > "C027 [S] Specifications that require a default encoding MUST define
> > either UTF-8 or UTF-16 as the default, or both if they define
> > suitable means of distinguishing them."
> > -- http://www.w3.org/TR/charmod/#C027
> >
> > "User agents must at a minimum support the UTF-8 and
> > Windows-1252 encodings, but may support more."
> > -- 8.2.2.2. Character encoding requirements
> > http://www.w3.org/html/wg/html5/
> >
> > I don't think that aspect of the HTML 5 spec is going to
> > change; it's already ubiquitously deployed:
> >
> > "Many web browsers treat the MIME charset ISO-8859-1 as
> > Windows-1252"
> > -- http://en.wikipedia.org/wiki/Windows-1252
> >
> > Any suggestions on what to do about the conflict? It's not
> > clear to me why C027 is a MUST. Which WG(s) should we be talking to?
> >
> > p.s. note the cc to www-archive; i.e. feel free to
> > copy/cite/forward anywhere.

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
gpg D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
Received on Monday, 29 October 2007 17:50:08 UTC