- From: Richard Ishida <ishida@w3.org>
- Date: Mon, 29 Oct 2007 17:42:18 -0000
- To: "'Dan Connolly'" <connolly@w3.org>
- Cc: "'www-archive'" <www-archive@w3.org>, "'Chris Wilson'" <Chris.Wilson@microsoft.com>
Hi Dan,

Please send questions like this to the public-i18n-core list, so that the i18n WG can reply.

It's not clear to me from a quick look that there's a conflict. CharMod says that you must define one or both of UTF-8 and UTF-16 as *a default*, whereas HTML5 is defining a minimum set of encodings that must be supported, rather than a default (as I read it). CharMod doesn't proscribe recognition of other encodings.

I think the appropriate CharMod criterion for the HTML5 text in section 8.2.2.2 is http://www.w3.org/TR/charmod/#C026:

"If the unique encoding approach is not chosen, specifications MUST designate at least one of the UTF-8 and UTF-16 encoding forms of Unicode as admissible character encodings and SHOULD choose at least one of UTF-8 or UTF-16 as required encoding forms (encoding forms that MUST be supported by implementations of the specification)."

I think section 8.2.2.2 of HTML5 satisfies that criterion.

From my reading, the 'defaults to win1252' bit comes into play only if the user specifies that a page is in ISO Latin-1, i.e. it assumes that people don't know the difference between those two encodings. It's not a general default.

I don't see where HTML5 specifies what to default to if the encoding is completely unknown. According to CharMod, this is when you should choose UTF-8 or UTF-16. (There may be something about that later in HTML5.)

Does that make sense?

Cheers,
RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/blog/
http://rishida.net/

> -----Original Message-----
> From: Dan Connolly [mailto:connolly@w3.org]
> Sent: 29 October 2007 17:22
> To: Richard Ishida
> Cc: www-archive; Chris Wilson
> Subject: HTML 5 defaults to Windows-1252, where charmod requires UTF-8/UTF-16
>
> Richard,
>
> These conflict:
>
> "C027 [S] Specifications that require a default encoding MUST define either UTF-8 or UTF-16 as the default, or both if they define suitable means of distinguishing them."
> -- http://www.w3.org/TR/charmod/#C027
>
> "User agents must at a minimum support the UTF-8 and Windows-1252 encodings, but may support more."
> -- 8.2.2.2. Character encoding requirements http://www.w3.org/html/wg/html5/
>
> I don't think that aspect of the HTML 5 spec is going to change; it's already ubiquitously deployed:
>
> "Many web browsers treat the MIME charset ISO-8859-1 as Windows-1252"
> -- http://en.wikipedia.org/wiki/Windows-1252
>
> Any suggestions on what to do about the conflict? It's not clear to me why C027 is a MUST. Which WG(s) should we be talking to?
>
> p.s. note the cc to www-archive; i.e. feel free to copy/cite/forward anywhere.
>
> --
> Dan Connolly, W3C http://www.w3.org/People/Connolly/
> gpg D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
>
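Dan's message quotes the observation that many browsers treat the charset label ISO-8859-1 as Windows-1252. In practice the two encodings differ only in bytes 0x80 to 0x9F: ISO-8859-1 maps them to C1 control characters, while Windows-1252 maps most of them to printable characters such as curly quotes and the euro sign. The following is a minimal Python sketch of that label aliasing, assuming a hypothetical alias table and a hypothetical function name `decode_with_browser_label`; it is not taken from the HTML5 draft or from CharMod. Note that Python's `cp1252` codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined and raises on them, so this only approximates browser behaviour.

```python
# Hypothetical alias table (assumption): the label "iso-8859-1" is decoded
# as Windows-1252, mirroring the browser behaviour quoted in the message.
ALIASES = {"iso-8859-1": "windows-1252", "latin1": "windows-1252"}


def decode_with_browser_label(data: bytes, label: str) -> str:
    """Hypothetical helper: decode bytes under a declared charset label,
    substituting Windows-1252 for labels listed in ALIASES."""
    codec = ALIASES.get(label.strip().lower(), label)
    return data.decode(codec)


# 0x93, 0x80, 0x94 are a left quote, euro sign and right quote in Windows-1252.
sample = bytes([0x93, 0x80, 0x94])

strict = sample.decode("iso-8859-1")  # yields C1 control characters
aliased = decode_with_browser_label(sample, "ISO-8859-1")

print([hex(ord(c)) for c in strict])  # ['0x93', '0x80', '0x94']
print(aliased == "\u201c\u20ac\u201d")  # True
```

Bytes outside 0x80 to 0x9F decode identically under both encodings, which is why the aliasing is usually invisible to authors who label Windows-1252 content as ISO-8859-1.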
Received on Monday, 29 October 2007 17:39:50 UTC