
Re: HTML/XML TF Report glosses over Polyglot Markup (Was: Statement why the Polyglot doc should be informative)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 3 Dec 2012 17:16:00 +0100
To: Robin Berjon <robin@w3.org>
Cc: Henri Sivonen <hsivonen@iki.fi>, public-html WG <public-html@w3.org>, www-tag@w3.org
Message-ID: <20121203171600336701.a75fb593@xn--mlform-iua.no>
Robin Berjon, Mon, 03 Dec 2012 11:35:38 +0100:

>> Regarding "2.1 How can an XML toolchain be used to consume HTML?"
>>            http://www.w3.org/TR/html-xml-tf-report/#uc01,


> Saying "polyglot" here just doesn't help: very little real-world 
> content uses it. Note that the section clearly looks at polyglot and 
> gives a clear reason for not using it in this case.

I agree. So why did the report say "polyglot" here? Why did you bring 
it in, and in a negative light? I see no relevance. To quote what you 
say at the bottom of this letter: this was not one of "the uses it was 
designed for".

>> Regarding "2.2 How can an HTML toolchain be used to consume XML?"
>>            http://www.w3.org/TR/html-xml-tf-report/#uc02

>>   TF says: "the most successful approach may be to simply translate
>>            the XML to HTML5 before passing it to the HTML5 tool"
>>   Verdict: How come this section didn't evaluate Polyglot Markup?
> 
> "Processing a real XML document with an HTML5 parser is probably 
> never going to be possible with complete fidelity." In general it's 
> not a problem you can solve. And polyglot (rightfully IMHO) doesn't 
> even try.

What is "a real XML document"? The XML/HTML TF report, under the next 
point, point 2.3, points out that XHTML5 *is* real XML.

If by "real XML" the TF meant something that is not HTML5, then it is 
not HTML5 but also not XHTML5. Thus it is not Polyglot Markup, in the 
strict sense. And also not HTML5 or XHTML5 in the strict sense. But it 
is of course possible to apply the principles of Polyglot Markup to 
extended XHTML5/HTML5.

>> Regarding "2.3 How can islands of HTML be embedded in XML?"
>>            http://www.w3.org/TR/html-xml-tf-report/#uc03

>>   TF says: EITHER, create HTML as "well-formed XML" = "requirements
>>            on the author" OR absolve the author by (having the tool)
>>            escaping markup.
>>   Verdict: How come you didn't mention having the tool output
>>            Polyglot Markup?
> 
> It pretty much says either use XHTML (in which case you don't need 
> polyglot)

Those are your words. The report doesn't discuss it. And if the purpose 
of the document - this XML/HTML TF report - was *not* to discuss the 
use of polyglot markup, then what place does that report have in the 
discussion of Polyglot Markup?

> or embed the HTML as text (in which case you don't need 
> polyglot). Recommending polyglot here would depend too much on the 
> specifics of the usage, and in general wouldn't help.

First you say "use XHTML … then you don't need polyglot". Then in the 
next sentence you say "as text … in which case you don't need XHTML".

To which I say: Fallacy. http://en.wikipedia.org/wiki/Fallacy


Or how does it follow from this that "you don't need polyglot"? The 
only thing that follows from this is that the author has TWO options if 
he has a polyglot in his hands (namely a choice between "as text" 
and "use XHTML"), and ONE option if he has non-XHTML-compatible HTML in 
his hands (namely "as text").

Is it a goal for you that the author must be capable of making the 
right choice? To me it is a goal that the author can't err regardless 
of what choice he makes. Which is why polyglot would have been relevant 
to mention for this use case.

>> Regarding "2.4 How can islands of XML be embedded in HTML?"
>>            http://www.w3.org/TR/html-xml-tf-report/#uc04

>>   TF says: Use <script> as XML container and use JavaScript to make it
>>            render in the DOM.
>>   Verdict: It seems like Polyglot Markup does not discuss that approach.
>>            If the TF document had purported to be an evaluation of
>>            Polyglot Markup, you would have discussed it.
>>      Also: I don't understand the last sentence: "Note also that
>>            polyglot markup is not an aid here as it forbids arbitrary
>>            XML content from the document." Does it? It doesn't any
>>            more than HTML5 proper does: If you add something that
>>            HTML5 doesn't permit, then it isn't HTML5 any more but
>>            "extended HTML5". But clearly, it is possible to create
>>            "extended polyglot markup" - just apply its principles.
> 
> That section's advice is mostly missing a mention of the pitfalls of 
> </script> IMHO. Including XML in <script> is definitely *not* 
> something that polyglot should recommend since you'd get very 
> different DOMs on either side. It's a useful technique when you know 
> you'll be parsed as HTML — and therefore clearly outside polyglot.

OK. I agree on this one.
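
To make the "very different DOMs" point concrete, here is a minimal 
sketch in Python. It rests on assumptions: the snippet and the names 
TagCollector, html_elements and xml_elements are made up for this 
illustration, and Python's html.parser is not a conforming HTML5 parser. 
It does, however, treat the content of <script> as raw text, which is 
the behaviour that matters here.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical snippet: an XML island placed inside <script>.
SNIPPET = '<div><script type="application/xml"><item>1</item></script></div>'


class TagCollector(HTMLParser):
    """Records the name of every start tag the HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_elements(markup):
    # HTML side: <script> content is raw text, so <item> is never
    # reported as an element.
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_elements(markup):
    # XML side: <script> is an ordinary element, so <item> is its child.
    return [el.tag for el in ET.fromstring(markup).iter()]


print(html_elements(SNIPPET))  # ['div', 'script']
print(xml_elements(SNIPPET))   # ['div', 'script', 'item']
```

The same bytes give two element trees, which is why this technique only 
makes sense when the consumer is known to be an HTML parser.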

>> Regarding "2.5 How can XML be made more forgiving of errors?"
>>            http://www.w3.org/TR/html-xml-tf-report/#uc05

>>   TF says: XML5, error handling in XML etc.
>>   Verdict: Provided that the goal of the task force (improved
>>            "interoperability between HTML and XML") could be
>>            helped by making XML fail in the exact way that
>>            HTML fails, then why did you not discuss Polyglot
>>            Markup as an option here?
> 
> Because looking at potential future changes to XML is completely 
> outside the scope of polyglot. It's also completely different from 
> polyglot's goals.

Sure. But consider the goal of that task force, "interoperability 
between HTML and XML": if someone produces an imperfect polyglot, 
then it would fail like HTML (if served as text/html, that is).

Also, the TF doesn't tell us why - or how - introducing HTML-like error 
handling in XML improves the "interoperability between HTML and XML". 
For instance - just to bring in a question that you seemingly find it 
OK to ask about Polyglot: Could we just skip Henri's parser if we 
introduce such error handling in XML? Care to tell?

My motivation for bringing polyglot markup into this subject is very 
much related to the final paragraph of the preceding point 2.4, "How 
can islands of XML be embedded in HTML?". In that paragraph, the TF 
deviates from the subject by pointing out that instead of embedding 
XML in HTML (text/html), one might instead embed XML in XHTML 
(application/xhtml+xml). (Voila!) Against that background, I find it 
quite relevant to point out that, hey, with regard to the problem of 
letting XML err as HTML, why not instead serve the XML as polyglot 
text/html?

>>   Verdict: The idea that this HTML parser could produce polyglot markup
>> (and no: not in order to pee in the tag soup ocean, but in order to be
>> a more useful parser in that tool chain!), is never discussed.
> 
> I'm not even sure what it would mean for an HTML parser to produce 
> polyglot markup.

That question might have been answered in my other message 
(http://lists.w3.org/Archives/Public/public-html/2012Dec/0013). But to 
clarify: I am/was not certain about the role of Henri's parser. I 
thought that his parser would preprocess the tag soup HTML so that the 
XML tools then can work with XML rather than with HTML. At any rate, 
his parser - or the tool-chain as a whole - could produce a polyglot - 
for the benefit of whoever/whichever is going to subsequently work on 
that document.
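
As a rough illustration of that preprocessing step, here is a sketch in 
Python. It is not Henri's parser and it does not implement the HTML5 
tree-building algorithm: Python's html.parser stands in for it, and the 
names VOID, SoupToTree, soup_to_xml and the wrapper element "root" are 
made up for this example. The only point is that tag soup goes in and 
something an XML tool can read comes out.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements: they never take an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class SoupToTree(HTMLParser):
    """Builds an ElementTree from loosely written HTML (toy recovery rules)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")  # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") get their name as value.
        attrib = {k: (k if v is None else v) for k, v in attrs}
        el = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element goes into its tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def soup_to_xml(markup):
    builder = SoupToTree()
    builder.feed(markup)
    builder.close()  # flushes buffered text; unclosed elements stay in the tree
    return ET.tostring(builder.root, encoding="unicode")


xml_text = soup_to_xml("<p>Hello<br>world <b>bold")
print(xml_text)
# An XML parser can now consume the result.
print([el.tag for el in ET.fromstring(xml_text).iter()])
```

Emitting a true polyglot would need more than this, for instance the 
polyglot rules for void elements, namespaces and script content.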

But true: Given that Polyglot Markup has very strict rules, it would be 
demanding to e.g. convert a document containing an embedded script to 
something that conforms 100% to the rules of Polyglot Markup, since 
Polyglot Markup doesn't permit embedding a script directly in the page 
unless the script follows strict rules: 
http://www.w3.org/TR/html-polyglot/#script-and-style


>> Over all, the report is trapped in some well known dichotomies. And
>> Polyglot Markup is not considered in a serious way. The Task Force's
>> report is a very thin basis for rescinding the request for robust,
>> polyglot markup.
> 
> Actually, we considered polyglot seriously. We found polyglot to be 
> useful for the uses it was designed for, but not applicable to all 
> cases in which XML/HTML interoperability is desirable.

I think "the uses it was designed for" is a crucial statement. I 
continue not to understand why the Report wastes ink on telling us that 
Polyglot is not useful for the things it is not designed for - and does 
so even in the concluding statement. Also, to the extent that the goal 
of the Report was to say something about Polyglot, I think that 
the report could have looked different if the TF had had members from, 
shall we say, the polyglot community.
-- 
leif halvard silli
Received on Monday, 3 December 2012 16:16:36 GMT
