- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 28 Jul 2010 20:11:02 +0300
- To: Henri Sivonen <hsivonen@iki.fi>
- Cc: public-html <public-html@w3.org>, Eliot Graff <eliotgra@microsoft.com>, public-i18n-core@w3.org
Henri Sivonen, Mon, 26 Jul 2010 11:29:59 +0300:
> On Jul 23, 2010, at 01:32, Leif Halvard Silli wrote:
>
>> Hm. According to ... XML 1.0, fifth edition:
snip
>> Thus, inferring from the above quotations, it seems like any encoding
>> is possible, provided one avoids the XML (encoding) declaration and
>> instead relies on external encoding information, typically HTTP headers.
>>
>> Do you see any fallacy in this conclusion?
>
> The conclusion is correct, but it requires defining "polyglot"
> broadly enough to include the charset parameter of the content type
> as part of the polyglot data that doesn't vary.

Both XML and HTML5 "include" HTTP. Thus HTTP is polyglot.

Next question: if one can specify any encoding via HTTP, why forbid
any encoding inside <meta charset='*'/>?

And then: why allow any encoding inside <meta charset='*'/> but not
allow the XML (encoding) declaration?

> There's one catch though: The pure XML processing model doesn't treat
> the original encoding of the document as part of the meaningful data
> encoded by the document. Thus, if the document includes non-ASCII
> characters in URL query strings, the URL resolves differently in pure
> XML tooling and in HTML5-compliant UAs. However, if only valid
> documents are considered, this isn't a problem, because non-ASCII in
> query strings is already declared non-conforming if the encoding of
> the document isn't UTF-8 or UTF-16.

Thanks for pointing this out. Thus, in a way, non-UTF-8 and non-UTF-16
documents become a subset of Polyglot Markup, with its own rules.

Btw, what about non-ASCII chars in query strings in UTF-32 encoded
documents? Shouldn't that work? (UTF-32 is recommended against, but
still permitted, in HTML5.)
-- 
leif halvard silli
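
PS: A rough illustration of the query-string catch quoted above. The sketch percent-encodes the same non-ASCII query value once in a legacy document encoding and once in UTF-8. It assumes, as a simplification, that an HTML5 UA encodes the query in the document's encoding while pure XML tooling uses UTF-8. The helper name `encode_query`, the parameter name `q` and the sample value are made up for the example.

```python
from urllib.parse import urlencode


def encode_query(value: str, doc_encoding: str) -> str:
    """Hypothetical helper: percent-encode a query value in a given encoding."""
    # urlencode converts the value to bytes in doc_encoding, then
    # percent-escapes every byte that is not URL-safe.
    return urlencode({"q": value}, encoding=doc_encoding)


# Assumption: the HTML5 UA uses the document encoding (here ISO-8859-1).
html5_view = encode_query("blåbær", "iso-8859-1")
# Assumption: pure XML tooling ignores the document encoding and uses UTF-8.
xml_view = encode_query("blåbær", "utf-8")

print(html5_view)
print(xml_view)
print(html5_view == xml_view)
```

Under these assumptions the two strings differ, so the same link would send a different query to the server depending on which toolchain resolved it. With a UTF-8 document both sides would produce the same bytes.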
Received on Thursday, 29 July 2010 12:51:51 UTC