- From: Aryeh Gregor <ayg@aryeh.name>
- Date: Fri, 19 Aug 2011 11:42:13 -0400
- To: Ian Jacobs <ij@w3.org>, David Carlisle <davidc@nag.co.uk>, Richard Ishida <ishida@w3.org>
- Cc: Karl Dubost <karl+w3c@la-grange.net>, Doug Schepers <schepers@w3.org>, Spec Prod <spec-prod@w3.org>, Philippe Le Hegaret <plh@w3.org>
On Thu, Aug 18, 2011 at 11:20 PM, Ian Jacobs <ij@w3.org> wrote:
> I had understood "conforms to http://www.w3.org/TR/html-polyglot/"
>
> For XML processors.

Polyglot is not targeted at XML processors. The idea of a polyglot document is that the same file should work the same in a *browser* whether it's served as text/html or an XML MIME type. In practice, however, this isn't useful, because all browsers support text/html, so there's no need to serve with two MIME types.

If we're concerned about non-browser XML processors, we shouldn't need polyglot. All we should need is to make an XML serialization of the spec available, or just make a text/html-to-XML converter available. Then existing XML toolchains could process the document by just adding one extra conversion step. If you have html5lib installed, a text/html-to-XML converter should take fewer than 10 lines to write and a negligible amount of time to run, less than fetching the file from the network.

The key difference here is that a polyglot document tries to be equivalent text/html and XML in the *same file*, *and* to produce the same DOM (or almost) when parsed either way. This is actually very nontrivial, and it's not necessary if we only want to support XML processing.

On Fri, Aug 19, 2011 at 7:09 AM, David Carlisle <davidc@nag.co.uk> wrote:
> What may (or may not?) be needed are content model restrictions on using
> or not using new "html5" structural features. Could a normative version
> of the spec use canvas for example?

This question is not specific to the HTML markup. A spec could also conceivably use CSS or JavaScript that's not supported by all browsers, like localStorage or such. It could even use features that are in RECs but aren't universally supported. For instance, you could write a page that works perfectly in any browser that supports HTML 4.01 and CSS 2.1, but which is totally unreadable in IE6 and 7. That's about 13% of browsers by market share that can't read the page (using Wikimedia's statistics). Likewise, HTML5 uses some Unicode characters that display as boxes on my computer -- that doesn't break any standard, but it's arguably a bad idea anyway, and certainly would be if it were confusing.

I think we have to be pragmatic here and judge on a case-by-case basis, based on real-world UA behavior rather than nominal maturity levels. The goal of a specification is to be read and understood, after all. As long as the markup used is such that it will be clearly and accurately understood by pretty much any CSS-supporting browser people are going to use -- say without JavaScript or plugins -- that should be okay.

So if the spec author wants to include an example, which is clearly marked as an example, which uses <canvas> and says "If your browser supports <canvas>, you'll see a smiley face here:", such that if the browser doesn't support <canvas> it instead displays fallback text like "Your browser does not support canvas :(", then I think that's not a problem. Depending on <canvas> (or any other JS) for normative text is obviously a non-starter, and also a bad idea if it's not really clear what's happening in non-supporting browsers.

But all this is only realistically decidable on a case-by-case basis. It should just be a corollary of "specifications have to be clearly written". I think it's quite a separate question from what formats we should allow to begin with. Obviously W3C specs should be published in HTML+CSS+JS, not PDF or Flash or anything, nor using nonstandard extensions. But I don't see a reason to restrict the exact versions used, provided they're standard or being standardized and the features work in practice.

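For illustration, the text/html-to-XML converter mentioned in the reply to Ian above might look roughly like the following. This is only a minimal sketch, assuming html5lib is installed; the function name `html_to_xml` is made up for this example, and the sketch does not handle edge cases such as attribute names or characters that are legal in HTML but not in XML.

```python
import xml.etree.ElementTree as ET

import html5lib  # third-party; assumed to be installed


def html_to_xml(html_text):
    """Parse text/html with html5lib and re-serialize the tree as XML.

    Hypothetical helper for illustration only. html5lib builds an
    ElementTree whose elements live in the XHTML namespace, so the
    standard library serializer can write it out as well-formed XML.
    """
    root = html5lib.parse(html_text, treebuilder="etree",
                          namespaceHTMLElements=True)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Tag-soup input: no <html>, <head> or <body>, unclosed <p>, bare <br>.
    print(html_to_xml("<!DOCTYPE html><title>t</title><p>a<br>b"))
```

An existing XML toolchain could then consume the returned string as its input, which is the "one extra conversion step" described above.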
On Fri, Aug 19, 2011 at 7:25 AM, Richard Ishida <ishida@w3.org> wrote:
> [1] there are additional rules for polyglot documents to ensure that the
> document works as XML and HTML (for example, no XML declaration allowed,
> therefore encoding can only be utf-8 (or utf-16 but that was excluded from
> polyglot)). So it's not just xml well-formedness. Having said that, I don't
> think there are many additional rules to worry about. That's what the
> polyglot spec describes: http://www.w3.org/TR/html-polyglot/

It's actually very hard to produce real polyglot documents automatically. For instance, there is no markup that will produce a script tag with a single Text child that contains < or & that will work in both text/html and XML. <script><</script> works in text/html, but is not XML. <script>&lt;</script> works in XML, but produces a different DOM when parsed as text/html ("&lt;" is treated as four literal characters instead of one entity). In practice you have to use hacks like <script>/*<![CDATA[*/</*]]>*/</script> that more or less work the same but don't actually produce the same DOM.

So we should not be talking about polyglot unless we *really* mean polyglot, rather than just "let's make a text/html-to-XML converter available".

> [2] there are features of HTML5 that are not yet widely supported. I think
> that what's needed is a defined subset of HTML5 for editors to use that
> reflects what is currently supported on major browsers. That subset should
> imo be revised as soon as new features become supported by major
> browsers, eg. the dir=auto value will hopefully be supported soon, but it
> isn't yet. It also assumes a decision that we are happy that people may
> struggle with 'non-major' browsers that may not yet support html5 features,
> and may have to view with a different browser. It also requires defining
> what constitutes a 'major' browser.

As noted, this is not specific to HTML5 -- it even applies to things that are in CSS2.1 and haven't changed since CSS2. I don't think we can make a precise list; it should be more like guidelines whose interpretation can change over time.

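The <script> cases discussed above can be checked with a small experiment. The sketch below uses only the Python standard library: `html.parser` stands in for a text/html parser (it treats script content as raw text, as browsers do, but it is not a full HTML5 parser) and `xml.etree.ElementTree` stands in for an XML parser. The class and function names are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ScriptText(HTMLParser):
    """Collect the text found inside <script> elements (illustrative helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def script_text_as_html(markup):
    """Text content of <script> when the markup is read as text/html."""
    parser = ScriptText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def script_text_as_xml(markup):
    """Text content of <script> when read as XML, or None if not well-formed."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


if __name__ == "__main__":
    for markup in ("<script><</script>",
                   "<script>&lt;</script>",
                   "<script>/*<![CDATA[*/</*]]>*/</script>"):
        print(repr(markup))
        print("  as text/html:", repr(script_text_as_html(markup)))
        print("  as XML:      ", repr(script_text_as_xml(markup)))
```

For the first markup the XML side reports None (not well-formed), and for the second the text/html side keeps the four characters `&lt;` while the XML side yields a single `<`. With the CDATA hack the two parses still give different text, which matches the point that it only more or less works the same.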
Received on Friday, 19 August 2011 15:43:06 UTC