- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Sat, 24 Apr 2010 21:09:58 +0200
- To: Sam Ruby <rubys@intertwingly.net>
- Cc: Eliot Graff <eliotgra@microsoft.com>, Adrian Bateman <adrianba@microsoft.com>, "public-html@w3.org" <public-html@w3.org>, "tag@w3.org" <tag@w3.org>, Tony Ross <tross@microsoft.com>, Paul Cotton <Paul.Cotton@microsoft.com>, "mjs@apple.com" <mjs@apple.com>, "plh@w3.org" <plh@w3.org>
Sam Ruby, Wed, 21 Apr 2010 19:14:16 -0400:
> On 04/21/2010 06:15 PM, Eliot Graff wrote:
>> Today, I uploaded an EARLY draft version of a polyglot spec,
>> "HTML/XHTML Compatibility Authoring Guidelines." [1]
>
> A few QUICK comments:
>
>> If a polyglot document uses an encoding other than UTF8 or UTF16
>
> UTF-16 is not valid for HTML5. I would recommend being more
> prescriptive: simply recommend (or even require) utf-8 as it is the
> only encoding guaranteed to be supported by all HTML and XML parsers.

Regarding the META element. The draft says:

]]
You SHOULD use the HTML meta tag to specify character encoding in the document.
[[

Depending on what kind of specification this is going to be, the META should be a MUST, for round-tripping reasons.

For example, take a random page from your blog, Sam: http://intertwingly.net/blog/2010/04/22/Restoring-floatflt-sty

It doesn't use META. And, alas, despite the correct MIME type, some UAs/tools, even XHTML-compatible ones (such as the WebKit-based iCab, but not Safari), save your page to disk with '.html' as the suffix. (Opera goes "overboard" and saves it as .xml.) Regardless of how it happens, once the file is on disk as .html, tools/UAs *may* fall back to the locale's default encoding, as specced in HTML5. I have seen this problem in SeaMonkey's Composer and, now and then, in its cousin, KompoZer.

If both UTF-8/UTF-16 *and* the META charset element were required, there would seldom be encoding problems.

But on the other hand:

]]
4. Namespaces

The following guidelines apply to namespaces used in polyglot documents.

* The <html> element must have the namespace declaration xmlns="http://www.w3.org/1999/xhtml".

[etc]
[[

Firstly, a nit: why say "guidelines" and then subsequently say "MUST have namespace declaration"?

Secondly: can we expect any *benefits* inside text/html from using these namespaces?
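The round-tripping failure described above (a UTF-8 page saved to disk as .html without a META charset) can be illustrated with a small Python sketch. It assumes a local file with no HTTP headers to consult, and it is a deliberately simplified stand-in for HTML5's encoding sniffing: it checks for a byte order mark, then looks for `<meta charset=...>` within a fixed window at the start of the file (1024 bytes here), and otherwise falls back to a locale default. The function name `sniff_encoding` and the use of windows-1252 as the "locale default" are assumptions for illustration; the real prescan algorithm also handles `http-equiv`, comments and other cases.

```python
import re

# Simplified sniffing window; the real HTML prescan rules are more involved.
PRESCAN_LIMIT = 1024
# Only matches the short form <meta charset=...>; http-equiv is ignored here.
META_CHARSET = re.compile(rb"""<meta\s+charset\s*=\s*["']?([A-Za-z0-9_-]+)""", re.IGNORECASE)


def sniff_encoding(data: bytes, locale_default: str = "windows-1252") -> str:
    """Hypothetical helper: pick an encoding for a saved .html file."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"  # UTF-8 byte order mark
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"  # UTF-16 byte order mark (either byte order)
    match = META_CHARSET.search(data[:PRESCAN_LIMIT])
    if match:
        return match.group(1).decode("ascii").lower()
    return locale_default  # the locale-dependent fallback


page = "<p>Blåbærsyltetøy</p>".encode("utf-8")
with_meta = b'<meta charset="UTF-8"/>' + page

print(page.decode(sniff_encoding(page)))            # mojibake: UTF-8 read as windows-1252
print(with_meta.decode(sniff_encoding(with_meta)))  # decoded correctly as UTF-8
```

Without the META element, the UTF-8 bytes are decoded with the locale fallback and the Norwegian letters turn into mojibake; with it, the same file is sniffed as UTF-8. That is the round-tripping argument for making the META charset a MUST rather than a SHOULD.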
*If* a particular namespace, or something else, counted as a "polyglot document identifier", then failing to treat such a document as a UTF-8 or UTF-16 file could be considered an error. Such a flag could also be used to prevent another common problem today: tools and UAs which "normalize" XHTML syntax into HTML4-compatible syntax.

Real-world experience: the Gecko-based SeaMonkey Composer reads Sam's blog as XHTML, but converts the syntax to HTML4 syntax *and* inserts a META charset (without "/>") with the correct encoding into the document. The same thing happens when opening a saved copy of Sam's page, regardless of .xhtml or .html suffix.

Another tool, the Gecko-based KompoZer, opens the online version of Sam's page fine, and saves it as .xhtml (well, honestly, it is quite ad hoc about what it does w.r.t. the suffix). Subsequently, it refuses to re-open the document, because it has converted it to HTML4 syntax: it simply shows an alert saying "This is not a HTML document". This happens even though its preferences are set to retain the source code.

DOCTYPE gotcha in KompoZer: for a 'file.html' with an XHTML1 doctype, KompoZer does NOT "normalize" "/>" to ">". But if the doctype is <!DOCTYPE html>, then it *does* do that. (SeaMonkey Composer does it regardless.)

Such silent conversion from XHTML syntax to HTML4 syntax is a common problem. I have occasionally run into it in W3C's own Amaya too, though Amaya has tools for converting between syntaxes, so it is much less of a problem there.

To put the above another way: we are looking to create a spec which requires XHTML tools to produce "Appendix 5"-compatible XHTML. Effectively, XHTML tools must learn a new dialect of XHTML. But could we also flag these files in such a way that even text/html tools are *required* not to "normalize" the code of such files to HTML4-compatible syntax? I.e., could we require text/html tools to know two dialects of HTML syntax? So, what do we need?
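The "normalization" problem can likewise be sketched in Python. The check below is hypothetical (no spec or tool defines it): it treats a document as a polyglot candidate if it parses as well-formed XML and its root is an `<html>` element in the XHTML namespace, i.e. the namespace-as-identifier idea raised above. `naive_normalize` is an assumed, crude imitation of the "/>" to ">" rewrite that SeaMonkey Composer and KompoZer perform; it is not their actual code.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def is_polyglot_candidate(markup: str) -> bool:
    """Hypothetical check: well-formed XML whose root is an XHTML-namespaced <html>."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return False  # not well-formed, so an XML tool cannot read it
    return root.tag == XHTML_NS + "html"


def naive_normalize(markup: str) -> str:
    """Assumed imitation of an HTML4-style rewrite: every '/>' becomes '>'."""
    return re.sub(r"\s*/>", ">", markup)


doc = ('<!DOCTYPE html>'
       '<html xmlns="http://www.w3.org/1999/xhtml">'
       '<head><meta charset="UTF-8"/><title>t</title></head>'
       '<body><p>Hello<br/>world</p></body></html>')

print(is_polyglot_candidate(doc))                   # the original parses as XHTML
print(is_polyglot_candidate(naive_normalize(doc)))  # the normalized copy does not
```

After the rewrite, `<meta charset="UTF-8">` and `<br>` are no longer closed, so an XML parser rejects the file. That is the kind of breakage that would keep an XHTML-expecting tool from re-opening a document it has itself "normalized".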
A new DOCTYPE which requires text/html user agents, not necessarily to save well-formed XHTML, but at least not to "normalize" the syntax into HTML4-ish HTML? Or can the XHTML namespace talisman serve this purpose? Or must we simply give up?

The "Appendix 5" spec could emphasize that it is an error to save application/xhtml+xml-served pages with the file suffix .html, no? But on the other hand, if it is a polyglot spec, why should it require that pages be saved one way or the other?

--
leif halvard silli
Received on Saturday, 24 April 2010 19:10:36 UTC