Re: Publication of specifications as HTML5 from Leif Halvard Silli on 2011-08-19 (spec-prod@w3.org from July to September 2011)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Fri, 19 Aug 2011 20:11:56 +0200
To: Aryeh Gregor <ayg@aryeh.name>
Cc: Ian Jacobs <ij@w3.org>, David Carlisle <davidc@nag.co.uk>, Richard Ishida <ishida@w3.org>, Karl Dubost <karl+w3c@la-grange.net>, Doug Schepers <schepers@w3.org>, Spec Prod <spec-prod@w3.org>, Philippe Le Hegaret <plh@w3.org>
Message-ID: <20110819201156498064.45cb6cba@xn--mlform-iua.no>
Aryeh Gregor, Fri, 19 Aug 2011 11:42:13 -0400:
> On Thu, Aug 18, 2011 at 11:20 PM, Ian Jacobs <ij@w3.org> wrote:
>> I had understood "conforms to http://www.w3.org/TR/html-polyglot/"
>> 
>> For XML processors.
> 
> Polyglot is not targeted at XML processors.

+1 

>  The idea of a polyglot
> document is that the same file should work the same in a *browser*
> whether it's served as text/html or an XML MIME type.  In practice,
> however, this isn't useful, because all browsers support text/html, so
> there's no need to serve with two MIME types.

Virtually all browsers support application/xhtml+xml as well. IE before 
version 9 is covered too, as long as you use .html as the file suffix 
(and do not use "Cool URIs", but offer the .html as part of the URI), 
because it then sniffs the document as HTML. Serving it that way means 
that, instead of ditching application/xhtml+xml, you can ditch 
text/html. Just for the record.
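
A minimal sketch of that setup, assuming Python's standard http.server 
is used as a local test server (the names XhtmlHandler and serve are 
made up for this illustration, and nothing here reflects how w3.org is 
actually configured): files ending in .html are labelled as 
application/xhtml+xml instead of text/html.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class XhtmlHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory, labelling .html as XHTML."""

    # Copy the stock table so the base class is left untouched, then
    # override the Content-Type used for the .html suffix.
    extensions_map = dict(SimpleHTTPRequestHandler.extensions_map)
    extensions_map[".html"] = "application/xhtml+xml"


def serve(port=8000):
    """Hypothetical helper: run the test server until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), XhtmlHandler).serve_forever()
```

Calling serve() and opening a polyglot .html file from that server in 
several browsers is one way to try the claim above.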

To say that 'application/xhtml+xml' isn't needed in practice means 
that you rule out inline SVG and MathML for legacy browsers (though 
one may use JavaScript, to some degree and with some performance 
loss).

> If we're concerned about non-browser XML processors
> [...] just make a text/html-to-XML converter available. [...]

> The key difference here is that a polyglot document tries to be
> equivalent text/html and XML in the *same file*, *and* they try to
> produce the same DOM (or almost) when parsed either way.  This is
> actually very nontrivial, and it's not necessary if we only want to
> support XML processing.

What is nontrivial depends on for whom. Polyglot Markup defines the 
syntax, and if you follow it, it is trivial to reach Polyglot 
Markup's goals. Some of the polyglot markup rules can also be seen as 
smart from other angles - e.g. the preference for external JavaScript.

> On Fri, Aug 19, 2011 at 7:09 AM, David Carlisle <davidc@nag.co.uk> wrote:
>> What may (or may not?) be needed are content model restrictions on using
>> or not using new "html5" structural features. Could a normative version
>> of the spec use canvas for example?
  [...]
>  The goal of a specification is to be read and understood,
> after all.  As long as the markup used is such that it will be clearly
> and accurately understood by pretty much any CSS-supporting browser
> people are going to use -- say without JavaScript or plugins -- that
> should be okay.

Can't see that this rules out 'application/xhtml+xml' - on the contrary.

  [...]
> But all this is only realistically decidable on a case-by-case basis.
> It should just be a corollary of "specifications have to be clearly
> written".

There are definitions - at least in general - of how to write clearly. 
Some of those can be "if you choose to write ... then you should also 
include ... ". One could have rules for how and when to use SVG, for 
instance.

>  I think it's quite a separate question from what formats we
> should allow to begin with.  Obviously W3C specs should be published
> in HTML+CSS+JS, not PDF or Flash or anything, nor using nonstandard
> extensions.

Until now, JavaScript support hasn't been necessary to read specs. 
Rules for how, and for what, to use JS might be in order.

>  But I don't see a reason to restrict the exact versions
> used, provided they're standard or being standardized and the features
> work in practice.
> 
> On Fri, Aug 19, 2011 at 7:25 AM, Richard Ishida <ishida@w3.org> wrote:
  [...]
> It's actually very hard to produce real polyglot documents
> automatically.  For instance, there is no markup that will produce a
> script tag with a single Text child that contains < or & that will
> work in both text/html and XML.  <script><</script> works in
> text/html, but is not XML.  <script>&lt;</script> works in XML, but
> produces a different DOM as text/html ("&lt;" is treated as four
> literal characters instead of one entity).  In practice you have to
> use hacks like <script>/*<![CDATA[*/</*]]>*/</script> that more or
> less work the same but don't actually produce the same DOM.  So we
> should not be talking about polyglot unless we *really* mean polyglot,
> rather than just "let's make a text/html-to-XML converter available".

CDATA is not permitted in polyglot markup. [*] '<' is also not 
permitted. The '&' is not permitted either, so <script>&lt;</script> 
naturally is not permitted. Web browsers that support XHTML show a 
fatal error when such content is included, so it is easy to discover.

[*] The reason CDATA is not permitted in polyglot markup is that 
HTML5 forbids CDATA sections everywhere except in foreign content: 
http://www.w3.org/TR/html5/syntax.html#cdata-sections There is a bug to 
clarify this: http://www.w3.org/Bugs/Public/show_bug.cgi?id=13604
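
To illustrate the <script> examples quoted above, here is a rough 
sketch that feeds each snippet to an HTML tokenizer and to an XML 
parser and compares the script text each one sees. It assumes Python's 
html.parser as a stand-in for a browser's text/html parser (it is not 
a full HTML5 parser), and the helper names are invented for this 
illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ScriptText(HTMLParser):
    """Collect the character data found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def script_text_as_html(markup):
    """Script text as seen by the (approximate) text/html tokenizer."""
    parser = ScriptText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def script_text_as_xml(markup):
    """Script text as seen by an XML parser, or None if not well-formed."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


for snippet in ("<script><</script>",
                "<script>&lt;</script>",
                "<script>/*<![CDATA[*/</*]]>*/</script>"):
    print(repr(script_text_as_html(snippet)),
          repr(script_text_as_xml(snippet)))
```

In each of the three cases the two parsers disagree: the first snippet 
is rejected as XML, the second yields the four characters "&lt;" as 
HTML but a single "<" as XML, and the third keeps the CDATA markers as 
script text in HTML only.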


>> [2] there are features of HTML5 that are not yet widely supported.  I think
>> that what's needed is a defined subset [...]
> 
> As noted, this is not specific to HTML5 -- it even applies to things
> that are in CSS2.1 and haven't changed since CSS2.

+1
  
> I don't think we
> can make a precise list, it should be more like guidelines whose
> interpretation can change over time.

If we, as a basis, require polyglot markup, then some things are 
already taken care of, such as IE's need for an explicitly typed <body> 
element (and <head> too, I think) in order not to - in some situations 
- place new elements inside the <head>. Thus we don't need to go into 
that particular subject w.r.t. IE, but can concentrate on more useful 
rules - or guidelines.
-- 
Leif Halvard Silli
Received on Friday, 19 August 2011 18:12:28 UTC