- From: Sam Ruby <rubys@intertwingly.net>
- Date: Thu, 15 Jul 2010 14:42:51 -0400
- To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- CC: Anne van Kesteren <annevk@opera.com>, Richard Ishida <ishida@w3.org>, public-html@w3.org
On 07/15/2010 02:20 PM, Leif Halvard Silli wrote:
> Anne van Kesteren, Thu, 15 Jul 2010 19:53:50 +0200:
>
>>> Do you agree?
>>
>> Not entirely, but mostly.
>
> Maciej, in the past, once treated a similar comment (about an
> accessibility topic) as un-collegial. (Before he became co-chair, I
> gather.) Full explanation and openness is appreciated.

Restoring the original question:

> UTF-16 encoded XML documents, on the other hand, must start with a
> BOM, see http://www.w3.org/TR/REC-xml/#charencoding When the doc is
> treated as XML, however, the meta element is ignored.
>
> Do you agree?

I disagree that UTF-16 encoded XML documents must start with a BOM. See:

http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info

My personal experience (which may now be dated) is that a number of XML
parsers choke in the presence of a BOM.

But even if we ignore such parsers, there are still several ways to go
given this set of information. It turns out that <meta charset="utf-16"/>
will always be ignored, yet the content will be processed correctly as
long as it is correctly encoded as utf-16. I gather that Richard would
prefer that such elements not be treated as conformance errors, whereas
Ian would prefer that such elements be treated as conformance errors.

We could also go a different way entirely, and say that polyglot
documents are a subset of both HTML5 and XHTML5, and that the subset we
select is utf-8 only. I mention this because it is my personal
recommendation on the matter, but I can live with either of the other
two alternatives mentioned above.

- Sam Ruby
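
To illustrate the point about documents without a BOM: the section of the XML spec linked above (a non-normative appendix) lists byte patterns by which a parser can recognise UTF-16 from the leading "<?" alone. What follows is only a minimal sketch of that idea in Python, not how any particular parser works. The function name sniff_xml_encoding and the strings it returns are made up for this example, and UTF-32/UCS-4 and EBCDIC detection are left out.

```python
import codecs


def sniff_xml_encoding(data: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML entity.

    Hypothetical helper for illustration only; UTF-32/UCS-4 and EBCDIC
    cases are deliberately omitted.
    """
    # Cases with a byte order mark.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be (BOM)"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le (BOM)"
    # Cases without a BOM: the "<?" of the XML declaration gives it away.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be (no BOM)"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le (no BOM)"
    # Anything else is assumed to be UTF-8 unless an encoding
    # declaration (not parsed here) says otherwise.
    return "utf-8 (default)"


doc = '<?xml version="1.0" encoding="utf-16"?><p/>'
print(sniff_xml_encoding(doc.encode("utf-16-le")))
print(sniff_xml_encoding(codecs.BOM_UTF16_BE + doc.encode("utf-16-be")))
```

Note that this kind of detection relies on the document starting with an XML declaration; a BOM-less UTF-16 document that begins directly with an element would fall through to the default branch in this sketch.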
Received on Thursday, 15 July 2010 18:43:25 UTC