Re: HTML/XML TF Report glosses over Polyglot Markup (Was: Statement why the Polyglot doc should be informative) from Leif Halvard Silli on 2012-12-03 (public-html@w3.org from December 2012)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 3 Dec 2012 17:43:19 +0100
To: Robin Berjon <robin@w3.org>
Cc: public-html WG <public-html@w3.org>, www-tag@w3.org
Message-ID: <20121203174319689438.4e4f250c@xn--mlform-iua.no>
Robin Berjon, Mon, 03 Dec 2012 15:59:42 +0100:
> On 03/12/2012 15:04 , Leif Halvard Silli wrote:

(Subject: http://www.w3.org/TR/html-xml-tf-report/#uc01 )

>> But what has that to do with Polyglot Markup? The task force, by saying
>> that polyglot cannot solve this problem, are sending the signal that
>> some think polyglot markup can - or was meant to - solve that problem.
>> But of course it can't. Who said it could?
> 
> There are quite a few people who whenever they hear "HTML" and "XML" 
> in the same sentence just blurt out "polyglot!" I believe that the TF 
> was mostly reacting to that.
> 
> This isn't to say that polyglot itself makes that claim, just that 
> it's rather common.

I don't think the report itself distinguishes strictly between the
polyglot format that was XHTML 1.0 + Appendix C and the "real" Polyglot
Markup format of HTML5. E.g. although Polyglot Markup does not allow
it, it is, I believe, no problem for an XML tool chain, regardless of
which parser is used, to handle a <script type="application/FOO+xml">
element that contains markup.
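A minimal Python sketch of how the two parsers see such an element
differently, using only the standard library (xml.etree.ElementTree and
html.parser). The "application/foo+xml" type, the "urn:example"
namespace and the TagCollector class are hypothetical, chosen only for
illustration. The XML parser finds a child element inside the script,
while the HTML parser hands the same bytes over as raw script text.
That divergence is presumably why Polyglot Markup rules the construct
out, even though an XML tool chain can process it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical script element carrying XML markup as its content.
SOURCE = ('<script type="application/foo+xml">'
          '<x:a xmlns:x="urn:example">hi</x:a></script>')


class TagCollector(HTMLParser):
    """Records start tags and text as an HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


# XML view: <x:a> is a real child element of <script>.
xml_root = ET.fromstring(SOURCE)
xml_children = [child.tag for child in xml_root]

# HTML view: script content is raw text, so <x:a> never becomes a tag.
html_view = TagCollector()
html_view.feed(SOURCE)
html_view.close()

print(xml_children)            # ['{urn:example}a']
print(html_view.tags)          # ['script']
print("".join(html_view.text))  # <x:a xmlns:x="urn:example">hi</x:a>
```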

However, I don't perceive that the TF report, under the first point,
tries to distinguish between these two things. Effectively, the report
says that "not all text/html will ever become valid XHTML, and
therefore we dismiss the *subset* of XHTML known as Polyglot Markup".
That looks like a classic straw man to me.

>> If you are dealing with polyglot markup, then you don't need Henri's
>> parser. Except that you can still use Henri's parser to process
>> polyglot markup, if you so wish. And if you are *not* dealing with
>> polyglot markup, then you can also use Henri's parser. And my point, my
>> comment to the TAG, was that Henri's parser and/or the rest of the tool
>> chain can take well-formed, or non-well-formed HTML and spit out
>> *polyglot* HTML.
> 
> In some cases yes, but not in general. PIs, xml:space, xml:base, and 
> noscript would have to be stripped, elements or attributes with weird 
> names or IDs would have to be changed, etc.

True. But for some use cases, I think "stripping" is perfectly all
right. Just as it is also all right to *add* "/" to void elements.
After all, we are discussing authoring here. But I am not saying that
an authoring tool should simply strip things without warning the
author, etc.

> So if your source format is, say, "feasibly polyglot" then yes, you 
> could polyglotify it. But that's still not a general-purpose option 
> (though it's likely a more common one).

Well, if the "attitude" of the tool chain was to polyglotify up to a
certain point, then I guess it could be a general-purpose option. But,
of course, the author should be allowed to have a say in how far the
polyglotification should reach.
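As a rough illustration of such a configurable "polyglotify" pass, here
is a sketch built on Python's standard html.parser. The Polyglotifier
class, the polyglotify() function and their options are hypothetical,
not taken from any existing tool; the options stand in for the author's
say in how far the pass reaches. It covers only a few of the points
raised above (self-closing void elements, stripping PIs, noscript,
xml:space and xml:base) and ignores many other polyglot constraints,
such as the namespace declaration, explicit tbody, character encoding,
and name/ID rules.

```python
from html import escape
from html.parser import HTMLParser

# HTML void elements: polyglot output writes them self-closed ("<br />").
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}
# Elements whose content the parser passes through as raw text.
RAW_TEXT = {"script", "style"}


class Polyglotifier(HTMLParser):
    """Hypothetical re-serializer that pushes tag soup toward polyglot form."""

    def __init__(self, strip_pis=True, strip_noscript=True,
                 strip_attrs=("xml:space", "xml:base")):
        super().__init__(convert_charrefs=True)
        self.strip_pis = strip_pis
        self.strip_noscript = strip_noscript
        self.strip_attrs = set(strip_attrs)
        self.out = []
        self.skip = 0      # > 0 while inside a stripped <noscript>
        self.raw = False   # True while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if self.strip_noscript and tag == "noscript":
            self.skip += 1
            return
        if self.skip:
            return
        parts = [tag]
        for name, value in attrs:
            if name in self.strip_attrs:
                continue
            # XML requires a value on every attribute; "" is valid HTML too.
            parts.append('%s="%s"' % (name, escape(value or "", quote=True)))
        if tag in VOID:
            self.out.append("<%s />" % " ".join(parts))
        else:
            self.out.append("<%s>" % " ".join(parts))
            self.raw = tag in RAW_TEXT

    def handle_startendtag(self, tag, attrs):
        # In HTML, "<div/>" is just a start tag, so treat "/>" like ">".
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if self.strip_noscript and tag == "noscript":
            self.skip = max(0, self.skip - 1)
            return
        if self.skip or tag in VOID:
            return
        self.raw = False
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip:
            # Re-escape decoded text, except raw script/style content.
            self.out.append(data if self.raw else escape(data, quote=False))

    def handle_comment(self, data):
        if not self.skip:
            self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        if not self.strip_pis and not self.skip:
            self.out.append("<?%s>" % data)


def polyglotify(source, **options):
    """Run the hypothetical pass over a string and return the result."""
    parser = Polyglotifier(**options)
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

For example, polyglotify('<p>a<br>b</p>') returns '<p>a<br />b</p>'.
Passing strip_pis=False keeps processing instructions, so each option
corresponds to one decision the author could make about how far to go.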

>> Because polyglot is an output format.
> 
> Sorry but I'm not sure what that means. By that measure every format 
> is an output format... Polyglot is meant essentially as a chameleon 
> format such that it can be input to two different processors with the 
> same result. A bit like those shell scripts that masquerade as DOS 
> .BAT to be portable.

I meant that polyglot is a format that you feed to the parser, but
unless the parser itself knows the polyglot format, there is no
guarantee that you get polyglot format out.
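A small sketch of that point, assuming Python's standard
xml.etree.ElementTree stands in for a tool chain that knows nothing
about polyglot: the same parsed tree comes out of its XML serializer
with a self-closed script element, and out of its HTML serializer with
a bare <br>, so neither output is polyglot.

```python
import xml.etree.ElementTree as ET

# A polyglot-style fragment: well-formed XML and also conforming HTML.
SOURCE = '<div><br /><script src="a.js"></script></div>'

tree = ET.fromstring(SOURCE)

# The same tree, written by two serializers that are unaware of polyglot.
as_xml = ET.tostring(tree, encoding="unicode", method="xml")
as_html = ET.tostring(tree, encoding="unicode", method="html")

print(as_xml)   # <div><br /><script src="a.js" /></div>
print(as_html)  # <div><br><script src="a.js"></script></div>
```

An HTML parser reads <script src="a.js" /> as an unclosed start tag, and
an XML parser rejects the unclosed <br>, so getting polyglot back out
requires a serializer that applies both sets of rules.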

>> How to parse polyglot, by
>> contrast, is defined by XML, by HTML5 etc - and not by the polyglot
>> markup spec. It would have been relevant if the XML/HTML TF discussed
>> whether it would be useful to spit out polyglot markup via that
>> toolchain.
> 
> But... that would not answer the question.

You mean: It would not answer the question under the first point? I did 
not say you should place it under the first point. Also, I am, after 
all, primarily arguing that you should remove polyglot markup from the 
first point because it doesn't answer the question.

>> Had they done so, then they would have demonstrated that
>> they understood the purpose of polyglot markup. But as I said in my
>> reply, they don't discuss that option and instead prefer to mention
>> that polyglot markup cannot replace Henri's parser.
> 
> I'm pretty sure that the TF understood the purpose of polyglot; I'm 
> unclear as to why you're reading it from that angle though.

See what I said about straw man above.
-- 
leif halvard silli
Received on Monday, 3 December 2012 16:43:53 UTC