Re: NU’s polyglot possibilities (Was: The non-polyglot elephant in the room) from Leif Halvard Silli on 2013-01-26 (www-archive@w3.org from January 2013)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Sat, 26 Jan 2013 19:51:26 +0100
To: "Michael[tm] Smith" <mike@w3.org>
Cc: www-archive@w3.org
Message-ID: <20130126195126105665.056f4ed2@xn--mlform-iua.no>
Michael[tm] Smith, Sat, 26 Jan 2013 18:51:50 +0900:

> Moving this to www-archive because I think we've gotten off-topic already.

You deviated into "instead serve them with an XML mime type". I stayed 
on topic.

Context: 
http://lists.w3.org/Archives/Public/public-html/2013Jan/0147.html

>> @2013-01-25 04:24 +0100:
>> Back to polyglot markup validation:
>> 2) Validating polyglot XHTML5 as HTML5 by selecting XML parser 
>>    plus HTML5 preset should also have worked, but there is a 
>>    weird bug 20766 which sees the @lang attribute as invalid
>>    <https://www.w3.org/Bugs/Public/show_bug.cgi?id=20766>.
>>    When you fix that bug, then pretty good one-pass polyglot
>>    checking will be possible for XML documents as well ...
> 
> As far as I can tell from the examples in your comments for that bug, what
> you seem to be wanting to do is to have a document parsed as XML but then
> checked against the HTML5 schema instead of the XHTML5 schema.

I share the "honor" with you: When I read your "instead serve them with 
an XML mime type", I first read it as a comment on the best polyglot 
validation strategy, and started to test validating XHTML5 with a HTML5 
preset.

Meanwhile, the very idea that one can suppress the Content-Type stems 
from Validator NU itself - "Be lax about content-type".

> If so, the solution to that problem is: Don't do that.
>
> I don't think the validator is designed to necessarily do something
> completely sensible in that case.

I think it should definitely issue a warning in such cases: 
https://www.w3.org/Bugs/Public/show_bug.cgi?id=20783

> I guess this may not be clear from the
> validator UI, but I think the schema that's labeled "HTML5" in the
> validator UI is explicitly intended for validation with documents that have
> been parsed as text/html.

That the HTML5 presets are meant for text/html should be obvious from 
the preset names. 

> (Henri can correct me if I'm wrong.) Certainly
> from my understanding of the spec at least, that *should* be the intent.

When applying an HTML5 preset to an XHTML document, the only effect 
(apart from that bug) appears to be that some text/html-only elements 
(like <noscript/>) are allowed.  Otherwise, it's like using the XHTML5 
preset. For example, although the HTML5 preset normally requires the 
DOCTYPE, it doesn't require the DOCTYPE when used with the XML parser, 
see <http://tinyurl.com/achz4xe>. As such, it isn't useful for 
anything.

But if NU instead behaved more like it does when it performs XHTML 1.0 
Strict DTD validation, then it could be a pretty good polyglot 
validation method: Because, even though NU uses the XML parser for 
XHTML1, it issues an error if it, for example, sees a prefixed element, 
such as <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" /> 
(which the XHTML 1.0 DTD does not permit) - see 
http://tinyurl.com/9w93rvs.
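
To illustrate the kind of check meant here, a minimal Python sketch 
follows. It is not how Validator NU is implemented; the helper name 
`prefixed_elements` is hypothetical, and it only covers the single 
"no prefixed elements" rule, using the standard library's expat 
bindings with namespace processing left off so that element names are 
reported exactly as written in the source.

```python
import xml.parsers.expat


def prefixed_elements(xml_text):
    """Return the names of elements written with a namespace prefix.

    Hypothetical helper: parses the input as XML and collects every
    element whose name contains a colon, e.g. "mml:math".
    """
    found = []
    # ParserCreate() without a namespace separator does no namespace
    # processing, so names arrive as they appear in the markup.
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        if ":" in name:
            found.append(name)

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)  # True marks the final chunk of input
    return found


doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head>"
    '<body><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" />'
    "</body></html>"
)

for name in prefixed_elements(doc):
    print("Error: prefixed element not allowed in polyglot markup:", name)
```

A document that passes the XML parser but yields a non-empty list here 
would be flagged, which is roughly the behaviour the XHTML 1.0 Strict 
preset shows for the MathML example.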

> I can say at least that I'm certain about the intent in the schema that's
> labeled "XHTML5" in the validator UI. That certainly is not intended for
> use with text/html documents -- because if you choose the HTML5 parser for
> the parser option, the UI doesn't even allow you to choose any of the
> XHTML5 schema options; the validator grays them out and doesn't allow you
> to select them.

If one activates "Be lax about content-type", then NU allows you to 
validate "text/html" as if it were XML. So it is actually permitted, but 
with a warning. Validator NU has had that feature for as long as I can 
remember.

> So perhaps the fix for the bug you reported is that we need to do the same
> thing if you chose the XML parser; that is, we need to then gray out all
> the HTML5 schema options so that you can't select them and so won't run
> into the unexpected condition you're running into now, which the HTML spec
> doesn't actually define any behavior for.

Now that I have thought about it more and checked more about how NU 
behaves, I think there are two options: EITHER you make it behave like 
XHTML1 validation currently behaves. In other words: You make sure that 
selecting the HTML5 preset for an XHTML document causes polyglot 
validation. OR, if you do not intend to go in that direction, then you 
do as you say - gray it out.  Because, clearly, the current behavior 
isn't useful.

> Anyway, the only reason why I can see why anybody would take the time
> needed to force the validator options into this state rather than just
> going with the sane defaults the validator provides is if you had the
> notion of a Polyglot document in your head and were trying to convince the
> validator to recognize a document as such even though it's not intended to
> and not advertised as providing any means for doing that.

As said above, I had your recommendation to use the XML mime type in my 
head.

> And now I've had to read through that bug and try to figure out what the
> real problem is, it makes me realize the Polyglot spec is already
> introducing a validator-maintenance cost in terms of time I need to spend
> on bug reports due to expectations created by the Polyglot spec.

You know what: That's a cheap and self-centered point.

I suggest that the next time you want to insist on promoting your 
favorite standpoint rather than discussing the subject of the thread 
(XML mime type rather than polyglot validation), you do as you did with 
the EPUB thread today: you change the subject line of the message. That 
avoids confusion and saves everyone's time.
-- 
leif halvard silli
Received on Saturday, 26 January 2013 18:51:56 UTC