
Re: NU's polyglot possibilities (Was: The non-polyglot elephant in the room)

From: Alex Russell <slightlyoff@google.com>
Date: Thu, 24 Jan 2013 17:14:36 -0500
Message-ID: <CANr5HFXykkDkwNBvQmioGN3+-ABdR-ZsD3h_LWtd0ok6hN-MMQ@mail.gmail.com>
To: "Michael[tm] Smith" <mike@w3.org>
Cc: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, public-html WG <public-html@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
I find myself asking (without an obvious answer): who benefits from the
creation of polyglot documents?

If it's a closed ecosystem in which it's clear that all documents are XML
(but which might be sent to the "outside" as HTML), then I don't understand
why that ecosystem doesn't protect its borders by transforming HTML
documents (via an HTML parser->DOM->XML serialization) to XML.
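[That border-protection step is, concretely: parse as HTML, build a tree, re-serialize as XML. A minimal illustrative sketch in Python, stdlib only — the tag-soup handling here is deliberately naive, and a real pipeline would use a full HTML5 parser such as html5lib:]

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take end tags in HTML.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}

class HtmlToXml(HTMLParser):
    """Naive HTML -> tree builder; illustrative only, not an HTML5 parser."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = None
        self.stack = []

    def handle_starttag(self, tag, attrs):
        el = ET.Element(tag, {k: (v or "") for k, v in attrs})
        if self.stack:
            self.stack[-1].append(el)
        else:
            self.root = el
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Pop until the matching open tag, implicitly closing anything left open.
        while self.stack:
            if self.stack.pop().tag == tag:
                break

    def handle_data(self, data):
        if not self.stack:
            return
        parent = self.stack[-1]
        children = list(parent)
        if children:
            children[-1].tail = (children[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data

parser = HtmlToXml()
parser.feed("<html><body><p>Hi<br>there</body></html>")
print(ET.tostring(parser.root, encoding="unicode"))
# -> <html><body><p>Hi<br />there</p></body></html>
```

[Note how the unclosed <p> and the void <br> come out the far side as well-formed XML — which is the whole point of re-serializing at the border.]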

Other possible users/producers seem even less compelling: if there's an
open ecosystem of documents that admit both HTML and XML, then it's always
going to be necessary for consuming software to support HTML parsing (and
likely also XML parsing). If it's a world of HTML consumers that would like
to access XML documents...well, just publish as (legacy) XHTML, no?

What am I missing? Under what conditions can the expectations of producers
and consumers of polyglot documents be simplified by the addition of
polyglot markup to their existing world/toolchain?


On Thu, Jan 24, 2013 at 4:57 AM, Michael[tm] Smith <mike@w3.org> wrote:

> Leif Halvard Silli <xn--mlform-iua@målform.no>, 2013-01-24 01:23 +0100:
>
> > Michael[tm] Smith, Mon, 21 Jan 2013 23:47:40 +0900:
> > > In the simplest implementation, the validator would need to
> > > automatically parse and validate the document twice
> >
> > 1 Could you do that? Just guide the user through two steps:
> >   HTML-validation + XHTML-validation?
>
> Of course doable. But I think it should be clear from my previous messages
> that regardless of how feasible it is, I'm not interested in implementing
> it. I don't want to put time into adding a feature that's intended to help
> users more easily create conforming Polyglot documents, because I don't
> think it's a good idea to encourage authors to create Polyglot documents.
>
> >   The second step could also
> >   produce a comparison of the DOM produced by the two steps.
>
> That would require the validator to construct a DOM from the document.
> Twice. The validator by design currently doesn't do any DOM construction at
> all. It does streaming processing of documents, using SAX events.
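[For readers unfamiliar with the distinction Mike is drawing, a minimal sketch of the streaming model in Python's stdlib SAX module — illustrative only (the validator itself is Java): handlers react to events as the parser reads, and no tree is ever built.]

```python
import io
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Count start tags as they stream past; no DOM is constructed."""
    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

handler = ElementCounter()
xml.sax.parse(io.StringIO("<html><body><p>Hi</p><p>Bye</p></body></html>"), handler)
print(handler.counts)  # {'html': 1, 'body': 1, 'p': 2}
```

[Comparing two DOMs, as Leif proposes, would mean abandoning this event model and holding two full trees in memory.]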
>
> Anyway, with respect, I hope you can understand that I'm not very
> interested in continuing a discussion of hypothetical functional details
> for a feature that I'm not planning to ever implement.
>
> > 2 But if the author uses a good, XHTML5-aware authoring tool that
> >   keeps the code well-formed, then a *single* validation as
> >   text/html should already bring you quite far.
>
> True I guess, if you're actually serving the document as text/html.
>
> But really what would get you even farther if you're using XML tools to
> create your documents is to not try to check them as text/html at all but
> instead serve them with an XML mime type, in which case the validator will
> parse them as XML instead of text/html, and everything will work fine.
>
> Anyway, yeah, if somebody is manually using XML tools to create their
> documents then I would think they'd already know whether they're
> well-formed, and they'd not need to use the validator to tell them whether
> they're well-formed or not. But of course a lot of documents on the Web are
> not created manually that way but instead dynamically generated out of a
> CMS, and many CMSes that are capable of serving up XML don't always get it
> right and can produce non-well-formed XML.
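[That failure mode is easy to demonstrate: markup any text/html parser happily accepts can fail XML well-formedness. A quick illustrative check in Python — assumption: you only want a boolean, not the validator's diagnostics:]

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """True if doc parses as XML; False on any well-formedness error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<p>fine</p>"))      # True
print(is_well_formed("<p>oops<br></p>"))  # False: <br> is never closed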
>
> All that said, I don't know why anybody who's serving a document as
> text/html would normally care much, at the point where it's being served
> (as opposed to the point where it's being created and preprocessed or
> whatever), whether it's XML-well-formed or not.
>
> > 3 Finally, one very simple thing: polyglot dummy code! The NU
> >   validator's Text Field contains a HTML5 dummy that validates,
> >   but only as HTML, since the namespace isn't declared. Bug
> >   20712 proposes to add a dummy for the XHTML5 presets as well.[1]
> > [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=20712
>
> Yeah, I suppose it's worth having the dummy document include the namespace
> declaration if you've selected one of the XHTML presets. I'll get around to
> adding it at some point, if Henri doesn't first.
>
> >   Such a dummy no doubt serves as a teachable moment for many. And
> >   as long as you just add the namespace and otherwise keep the
> >   current dummy document intact, it would also, without banging
> >   it into anyone's head, it would also be a polyglot example.
>
> True, that simple document would be a conforming polyglot instance, but I
> doubt most users would recognize it as such, or care. The value of it would
> just be for the simple user convenience of not needing to manually add the
> namespace declaration in order to avoid the error message you get now.
>
>   --Mike
>
> --
> Michael[tm] Smith http://people.w3.org/mike
>
>
Received on Thursday, 24 January 2013 22:15:39 UTC
