Re: The non-polyglot elephant in the room

Noah Mendelsohn <nrm@arcanedomain.com>, 2013-01-21 23:15 -0500:

> On 1/21/2013 9:47 AM, Michael[tm] Smith wrote:
> >So people can already determine that with the validator just by manually
> >running their documents through it twice: once with the HTML option
> >selected, and then again with the XHTML option selected.
> 
> Right, but I think polyglot is a bit more limited than the intersection: I
> believe that the intention with polyglot is to avoid constructs that are
> valid per each spec separately, but that are interpreted incompatibly (e.g.
> for purposes of DOM building and scripting).

True, agreed. But the fact that such constructs exist is exactly why it's
a bad idea to try to serve an XML document as text/html to begin with; it's
error-prone and very easy to get wrong. And I don't think the existence of
the Polyglot spec is in practice going to do much to ensure that people get
it right at any kind of scale -- any more than Appendix C (the XHTML 1.0
HTML-compatibility guidelines) actually did.
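
To make Noah's point concrete: a table written without an explicit tbody is
the classic case of markup that's acceptable in both serializations on its
own but gets a different DOM from each parser, which is exactly the kind of
thing a polyglot profile has to rule out. A minimal sketch of the
difference, assuming the third-party html5lib package for the HTML-parsing
side (xml.etree is in the Python stdlib):

    # Same markup, two parsers, two different trees.
    import xml.etree.ElementTree as ET
    import html5lib

    doc = "<table><tr><td>cell</td></tr></table>"

    # XML parse: exactly the elements that were written.
    print([el.tag for el in ET.fromstring(doc).iter()])
    # ['table', 'tr', 'td']

    # HTML parse: the tree-construction rules insert an implied tbody.
    html_root = html5lib.parse(doc, treebuilder="etree",
                               namespaceHTMLElements=False)
    print([el.tag for el in html_root.iter()])
    # ['html', 'head', 'body', 'table', 'tbody', 'tr', 'td']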

> I (personally and with TAG hat on) am in favor of publishing the polyglot
> spec, but I doubt that effective validation can be achieved with just
> running the two validators as they are.

I agree that complete conformance to the Polyglot spec isn't going to be
achieved that way. So it comes back to the question of the effort needed to
implement full Polyglot conformance checking in a validator, the effort
needed to support that over the long term, and whether the benefits are
worth all that effort.
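
Just to be clear about the scale of that effort: running the two validators
only tells you the document is conforming in each serialization separately.
Even the crudest check for Noah's "interpreted incompatibly" problem would
additionally have to parse the same bytes with an XML parser and with an
HTML parser and compare the resulting trees, and a real Polyglot checker
would need far more than that. A rough sketch of that idea, again assuming
html5lib, and glossing over attributes, text nodes, namespaces, and
everything else a real checker would have to compare:

    import xml.etree.ElementTree as ET
    import html5lib

    def element_names(root):
        # Strip any namespace prefix so the two trees are comparable.
        return [el.tag.split('}')[-1] for el in root.iter()]

    def same_element_structure(doc):
        as_xml = ET.fromstring(doc)
        as_html = html5lib.parse(doc, treebuilder="etree",
                                 namespaceHTMLElements=False)
        return element_names(as_xml) == element_names(as_html)

Feed that the table example above and it comes back False; feed it
something genuinely polyglot and it comes back True -- and even then it
wouldn't catch more than a fraction of what the Polyglot spec actually
requires.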

Personally, I think it's not worth the effort. And Henri's made it very
clear that he doesn't think it's worth the effort either. So unless
somebody else comes along and takes the time to build a validator that
actually implements a fully conformant Polyglot-validation option, we are
not likely to have a validator that provides it.

A markup specification that doesn't have validation support is a markup
specification that not very many people are going to be able to conform to
successfully. So I think the low likelihood of Polyglot validation support
ever materializing is a strong argument against making the Polyglot
specification normative. (I think it's even a strong argument against the
specification existing in the HTML WG in any form at all, but that's not an
option being discussed at this point.)

> FWIW: I think there is a non-trivial and interesting pile of software that
> consumes XML and that is unlikely to be modified to use an HTML5 parser.

Sure, of course. Clearly there are plenty of uses of XML that have nothing
at all to do with consuming HTML, and tools that nobody actually uses to
process HTML don't need the additional cost and complexity of carrying an
HTML parser in any form.

> I think it's reasonable to set down some guidelines for authors pointing
> out the subset of HTML5 that's likely to be interpreted appropriately as
> XML and HTML.

I think it might be reasonable if there were much likelihood that authors
could actually do that in practice without a significant number of them
running into problems. I think enough of them will in fact run into
problems doing it -- with or without the Polyglot spec -- that the W3C
should be promoting a best practice of telling authors not to try it at
all, rather than advocating for a specification that implicitly condones it
as a good idea.
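
For one concrete example of how authors get bitten even when they think
they're following the guidelines: self-closing a non-void element such as
div is perfectly fine as XML, but an HTML parser ignores the trailing
slash, leaves the element open, and swallows whatever follows into it.
Again a sketch assuming html5lib:

    import xml.etree.ElementTree as ET
    import html5lib

    doc = "<body><div/><p>hi</p></body>"

    # XML: div and p are siblings.
    print([el.tag for el in ET.fromstring(doc)])   # ['div', 'p']

    # HTML: the slash is ignored, so p ends up inside the still-open div.
    html_root = html5lib.parse(doc, treebuilder="etree",
                               namespaceHTMLElements=False)
    div = html_root.find("body/div")
    print([el.tag for el in div])                  # ['p']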

> Having a validator for that subset would be nice, but seems to me not
> essential to justifying the polyglot spec.

See my comments above: I think the lack of validator support for it in fact
argues very strongly against a specification for it being normative, or
even existing at all.

> If I were, say, in a corporation and doing a project that required our HTML
> content to be processed by existing XML tools that aren't easily modified
> with HTML5 parsers, then having a polyglot spec to point to would be very
> helpful.

That's at least a couple of pretty big Ifs. And I certainly don't think
the HTML WG and the TAG should be optimizing their decisions for such a
scenario.

  --Mike

-- 
Michael[tm] Smith http://people.w3.org/mike

Received on Tuesday, 22 January 2013 05:48:34 UTC