Re: NU’s polyglot possibilities (Was: The non-polyglot elephant in the room) from Leif Halvard Silli on 2013-01-25 (www-tag@w3.org from January 2013)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Fri, 25 Jan 2013 04:24:59 +0100
To: "Michael[tm] Smith" <mike@w3.org>
Cc: public-html WG <public-html@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <20130125042459706758.a6797919@xn--mlform-iua.no>

And for XML tools including an HTML parser, what content-type? ;-)

"Michael[tm] Smith" 24.01.2013, 10:57:
>Leif Halvard Silli, 2013-01-24 01:23 +0100:
>> Michael[tm] Smith, Mon, 21 Jan 2013 23:47:40 +0900:

>> 1 Could you do that? Just guide the user through two steps:
>> HTML-validation + XHTML-validation?
>
> Of course doable. But  [… snip …] I don't think it's a good
> idea to encourage authors to create Polyglot documents.

Maybe the spec can simply advise on how best to use non-polyglot
validators. If so, I hope you would be OK with that.

> Anyway, with respect, [ … ]

Was inspired by your reply + NU’s various abilities to cross-validate…

> But really what would get you even farther if you're
> using XML tools to create your documents is to not try to check
> them as text/html at all but instead serve them with an XML mime
> type, in which case the validator will parse them as XML instead
> of text/html, and everything will work fine.

And what content-type if the XML tool includes an HTML parser? ;-)

Back to polyglot markup validation:

1) Validating polyglot HTML5 as XHTML5 works fine - simply
   activate "Be lax about content-type" and select XML parser
   before running the validator: http://tinyurl.com/a95tvf8

   Due to the "lax" setting, the page will then be validated as
   XML even if the Content-Type is text/html.
   (There is just one small, less important issue:
   <https://www.w3.org/Bugs/Public/show_bug.cgi?id=20765>.)

2) Validating polyglot XHTML5 as HTML5 by selecting XML parser 
   plus HTML5 preset should also have worked, but there is a 
   weird bug 20766 which sees the @lang attribute as invalid
   <https://www.w3.org/Bugs/Public/show_bug.cgi?id=20766>.
   When you fix that bug, then pretty good one-pass polyglot
   checking will be possible for XML documents as well ...
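
To make the two-pass idea concrete, here is a rough local sketch in Python. It is not the NU validator and does no schema validation: it only parses the same source once as XML and once as HTML using the standard library, then compares the element names seen by each parser. The function names (`xml_tags`, `html_tags`, `two_pass_check`) and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "http://www.w3.org/1999/xhtml"


class TagCollector(HTMLParser):
    """Records start-tag names in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta .../>,
        # because the default handle_startendtag delegates here.
        self.tags.append(tag)


def xml_tags(source):
    """Parse as XML; raises ET.ParseError if the source is not well-formed."""
    root = ET.fromstring(source)
    return [el.tag.replace("{%s}" % XHTML_NS, "") for el in root.iter()]


def html_tags(source):
    """Parse as HTML with the (lenient, non-validating) stdlib parser."""
    collector = TagCollector()
    collector.feed(source)
    collector.close()
    return collector.tags


def two_pass_check(source):
    """True if the source is well-formed XML and both passes see the same elements."""
    try:
        as_xml = xml_tags(source)
    except ET.ParseError:
        return False
    return as_xml == html_tags(source)


SAMPLE = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">\n'
    '<head><meta charset="utf-8"/><title>Test</title></head>\n'
    '<body><p>Hello</p></body>\n'
    '</html>'
)

if __name__ == "__main__":
    print(two_pass_check(SAMPLE))
    print(two_pass_check("<p>unclosed<br></p>"))
```

A real check would still need the validator itself for both passes, since `html.parser` does not implement the HTML5 parsing algorithm and neither pass here checks conformance.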

> Anyway, yeah, if somebody is manually using XML tools to create
> their documents then I would think they'd already know whether
> they're well-formed, [ ... ]

Many such authoring tools include an XML parser.

> But of course a lot of documents on the Web are not created
> manually that way but instead dynamically generated out of a
> CMS, and many CMSes that are capable of serving up XML don't
> always get it right and can produce non-well-formed XML.

True. 

> All that said, I don't know why anybody who's serving a
> document as text/html would normally care much, at the point
> where it's being served (as opposed to the point where it's
> being created and preprocessed or whatever), whether it's
> XML-well-formed or not.

Is it likely that many would care? No, many wouldn't.

>> 3 Finally, one very simple thing: polyglot dummy code! The NU
>>   validator’s Text Field contains a HTML5 dummy that validates,
>>   but only as HTML, since the namespace isn't declared. Bug 
>>   20712 proposes to add a dummy for the XHTML5 presets as well.
>>   [1]
>>   [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=20712

>
> Yeah, I suppose it's worth having the dummy document include
> the namespace declaration if you've selected one of the XHTML
> presets. I'll get around to adding it at some point, if Henri
> doesn't first.

Cool!

>>   Such a dummy no doubt serves as a teachable moment [...]

> True that simple document would be a conforming polyglot
> instance, but I doubt most users would realize it as such, or
> care. The value of it would just be for the simple user
> convenience of not needing to manually add the namespace
> declaration in order to avoid the error message you get now.

Clearly I attribute more value to it than you do, then. When something
is fully implemented end to end, it becomes simpler to grasp, I think.
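
For illustration, a dummy of the kind discussed above could look roughly like the string below. This is a hypothetical document, not the validator's actual Text Field content, and the helper names are invented. The small Python check only confirms that the root element is in the XHTML namespace, which is what the XHTML5 presets need.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical polyglot dummy: not the text the validator ships with.
POLYGLOT_DUMMY = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta charset="utf-8"/>
<title>Dummy</title>
</head>
<body>
<p></p>
</body>
</html>"""


def root_namespace(source):
    """Return the namespace URI of the root element, or None if it has none."""
    tag = ET.fromstring(source).tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def declares_xhtml_namespace(source):
    """True if the root element is in the XHTML namespace."""
    return root_namespace(source) == XHTML_NS


if __name__ == "__main__":
    print(declares_xhtml_namespace(POLYGLOT_DUMMY))
```

Without the `xmlns` attribute the same markup is still well-formed XML, but the root element is in no namespace, which is the situation that triggers the error message mentioned above.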
-- 
leif halvard silli

Received on Friday, 25 January 2013 03:25:31 UTC