Re: NU’s polyglot possibilities (Was: The non-polyglot elephant in the room) from Michael[tm] Smith on 2013-01-24 (public-html@w3.org from January 2013)

From: Michael[tm] Smith <mike@w3.org>
Date: Thu, 24 Jan 2013 18:57:26 +0900
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Cc: public-html WG <public-html@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <20130124095724.GH46651@sideshowbarker>

Leif Halvard Silli <xn--mlform-iua@målform.no>, 2013-01-24 01:23 +0100:

> Michael[tm] Smith, Mon, 21 Jan 2013 23:47:40 +0900:
> > In the simplest implementation, the validator would need to 
> > automatically parse and validate the document twice
> 
> 1 Could you do that? Just guide the user through two steps:
>   HTML-validation + XHTML-validation?

Of course it's doable. But I think it should be clear from my previous messages
that regardless of how feasible it is, I'm not interested in implementing
it. I don't want to put time into adding a feature that's intended to help
users more easily create conforming Polyglot documents, because I don't
think it's a good idea to encourage authors to create Polyglot documents.

>   The second step could also
>   produce a comparison of the DOM produced by the two steps.

That would require the validator to construct a DOM from the document.
Twice. The validator by design currently doesn't do any DOM construction at
all. It does streaming processing of documents, using SAX events.
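
To make the difference concrete, here is a rough sketch in Python (not the
validator's actual code; the class and function names are made up for
illustration). A SAX-style consumer only reacts to events as they stream
past and never holds a tree that could be compared with another one:

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: tallies elements from SAX events, keeps no tree."""

    def __init__(self):
        super().__init__()
        self.count = 0      # total elements seen so far
        self.depth = 0      # current nesting depth
        self.max_depth = 0  # deepest nesting seen

    def startElement(self, name, attrs):
        self.count += 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        self.depth -= 1


def stream_check(data: bytes) -> ElementCounter:
    """Run the handler over the document in a single streaming pass."""
    handler = ElementCounter()
    xml.sax.parseString(data, handler)
    return handler
```

Once the pass is over, all that remains is the handler's counters. A DOM
comparison would need two full trees kept in memory, which is a different
architecture.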

Anyway, with respect, I hope you can understand that I'm not very
interested in continuing a discussion of hypothetical functional details
for a feature that I'm not planning to ever implement.

> 2 But if the author uses a good, XHTML5-aware authoring tool that
>   keeps the code well-formed, then a *single* validation as
>   text/html should already bring you quite far.

True, I guess, if you're actually serving the document as text/html.

But if you're using XML tools to create your documents, what would get you
even farther is to not check them as text/html at all, but instead to serve
them with an XML MIME type. In that case the validator will parse them as
XML instead of as text/html, and everything will work fine.
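
As an illustration only (this is not the validator's real dispatch logic,
and the function name is hypothetical), choosing the parser from the
Content-Type header could be sketched in Python like this:

```python
def parser_for(content_type: str) -> str:
    """Pick a parser name from a Content-Type header value (sketch only)."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Assumption: treat the generic XML types and any "+xml" type as XML.
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    if media_type == "text/html":
        return "html"
    return "unknown"
```

So a document served as application/xhtml+xml goes down the XML path, and
the same bytes served as text/html go down the HTML path.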

Anyway, yeah, if somebody is manually using XML tools to create their
documents, then I'd think they already know whether the documents are
well-formed, and don't need the validator to tell them. But of course a lot
of documents on the Web are not created manually that way, but are instead
dynamically generated by a CMS, and many CMSes that are capable of serving
up XML don't always get it right and can produce non-well-formed XML.
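
For generated output, a well-formedness check doesn't need the validator at
all. A minimal sketch in Python, assuming the CMS output is available as
bytes (the function name is made up for illustration):

```python
import xml.sax


def is_well_formed(markup: bytes) -> bool:
    """Return True if the markup parses as XML, False on a parse error."""
    try:
        # A do-nothing handler is enough; only parse errors matter here.
        xml.sax.parseString(markup, xml.sax.ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True
```

An unclosed <br> inside a <p>, which is fine in text/html, makes this
return False.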

All that said, I don't know why anybody who's serving a document as
text/html would normally care much, at the point where it's being served
(as opposed to the point where it's being created and preprocessed or
whatever), whether it's XML-well-formed or not.

> 3 Finally, one very simple thing: polyglot dummy code! The NU
>   validator’s Text Field contains a HTML5 dummy that validates,
>   but only as HTML, since the namespace isn't declared. Bug 
>   20712 proposes to add a dummy for the XHTML5 presets as well.[1]
> [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=20712

Yeah, I suppose it's worth having the dummy document include the namespace
declaration if you've selected one of the XHTML presets. I'll get around to
adding it at some point, if Henri doesn't first.

>   Such a dummy no doubt serves as a teachable moment for many. And
>   as long as you just add the namespace and otherwise keep the 
>   current dummy document intact, it would also, without banging
>   it into anyone’s head, be a polyglot example.

True, that simple document would be a conforming polyglot instance, but I
doubt most users would recognize it as such, or care. The value of it would
just be the simple user convenience of not needing to manually add the
namespace declaration in order to avoid the error message you get now.
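
For illustration, a sketch in Python of what such a dummy might look like
and how to confirm the namespace is there. The document text below is an
assumed stand-in, not necessarily the exact dummy the validator ships, and
the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumed stand-in for the dummy document, with the namespace declared.
DUMMY = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Test</title>
</head>
<body>
<p></p>
</body>
</html>
"""


def root_is_xhtml(markup: str) -> bool:
    """Parse as XML and check the root is <html> in the XHTML namespace."""
    root = ET.fromstring(markup)
    return root.tag == "{%s}html" % XHTML_NS
```

Without the xmlns attribute the document is still well-formed XML, but the
root element is in no namespace, which is what triggers the error message.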

  --Mike

-- 
Michael[tm] Smith http://people.w3.org/mike

Received on Thursday, 24 January 2013 09:57:42 UTC