Re: HTML/XML TF Report glosses over Polyglot Markup (Was: Statement why the Polyglot doc should be informative) from Robin Berjon on 2012-12-03 (www-tag@w3.org from December 2012)

From: Robin Berjon <robin@w3.org>
Date: Mon, 03 Dec 2012 15:59:42 +0100
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
CC: public-html WG <public-html@w3.org>, www-tag@w3.org
Message-ID: <50BCBE5E.9020404@w3.org>

On 03/12/2012 15:04 , Leif Halvard Silli wrote:
> My comment: Perhaps my point wasn't clear enough and could be
> misunderstood. That, in turn, might be coloured by me not understanding
> perfectly how these tool chains work. But I stand by - firmly - that
> the XML/HTML task force mixed things up in their first conclusion. They
> serve a false dichotomy. Why? Let me first say that I agree that their
> problem statement was all right. Nothing wrong with it. There is a need
> to process HTML. And Henri's parser is one way to solve that problem.
> Fine.
>
> But what has that to do with Polyglot Markup? The task force, by saying
> that polyglot cannot solve this problem, are sending the signal that
> some think polyglot markup can - or was meant to - solve that problem.
> But of course it can't. Who said it could?

There are quite a few people who, whenever they hear "HTML" and "XML" in
the same sentence, just blurt out "polyglot!" I believe that the TF was
mostly reacting to that.

This isn't to say that polyglot itself makes that claim, just that the
claim is rather common.

> If you are dealing with polyglot markup, then you don't need Henri's
> parser. Except that you can still use Henri's parser to process
> polyglot markup, if you so wish. And if you are *not* dealing with
> polyglot markup, then you can also use Henri's parser. And my point, my
> comment to the TAG, was that Henri's parser and/or the rest of the tool
> chain can take well-formed, or non-well-formed HTML and spit out
> *polyglot* HTML.

In some cases yes, but not in general. PIs, xml:space, xml:base, and
noscript would have to be stripped, elements or attributes with weird
names or IDs would have to be changed, etc.

So if your source format is, say, "feasibly polyglot", then yes, you
could polyglotify it. But that's still not a general-purpose option
(though it's likely the more common case).
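
As a rough illustration of what a "feasibly polyglot" check might look
like, here is a minimal Python sketch. It assumes the input is already
well-formed XML, and it only looks for the constructs listed above (PIs,
xml:space, xml:base, noscript); it does not attempt the "weird names or
IDs" cases. The function name polyglot_blockers is hypothetical and not
part of any spec or tool mentioned in this thread.

```python
from xml.dom import minidom, Node

XML_NS = "http://www.w3.org/XML/1998/namespace"

def polyglot_blockers(xml_text):
    """Return a sorted list of constructs that would have to be stripped.

    Hypothetical helper: covers only PIs, xml:space, xml:base and noscript.
    """
    found = set()

    def walk(node):
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            found.add("processing instruction")
        elif node.nodeType == Node.ELEMENT_NODE:
            if node.localName == "noscript":
                found.add("noscript")
            attrs = node.attributes
            for i in range(attrs.length):
                attr = attrs.item(i)
                # Match xml:space / xml:base by namespace, not by prefix text.
                if attr.namespaceURI == XML_NS and attr.localName in ("space", "base"):
                    found.add("xml:" + attr.localName)
        for child in node.childNodes:
            walk(child)

    walk(minidom.parseString(xml_text))
    return sorted(found)
```

An empty list only means that none of these particular constructs were
found, not that the document is polyglot.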

> Because polyglot is an output format.

Sorry, but I'm not sure what that means. By that measure every format is
an output format... Polyglot is meant essentially as a chameleon format
such that it can be input to two different processors with the same
result. A bit like those shell scripts that masquerade as DOS .BAT files
to be portable.
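
To make the "two processors, same result" idea concrete, here is a small
Python sketch. It is only an approximation: html.parser from the standard
library is a tokenizer rather than a full HTML5 tree builder, so it stands
in for a real HTML parser here, and the helper names html_view and
xml_view are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class StartTags(HTMLParser):
    """Collect element names in the order an HTML tokenizer sees them."""
    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # Also called for <br/>, via the default handle_startendtag.
        self.names.append(tag)

def html_view(markup):
    parser = StartTags()
    parser.feed(markup)
    parser.close()
    return parser.names

def xml_view(markup):
    # Drop the "{namespace}" part that ElementTree puts in front of tag names.
    return [el.tag.rsplit("}", 1)[-1] for el in ET.fromstring(markup).iter()]

SAMPLE = ('<html xmlns="http://www.w3.org/1999/xhtml">'
          '<body><p>Hello<br/>world</p></body></html>')

same = html_view(SAMPLE) == xml_view(SAMPLE)
```

For this sample both views list the same elements in the same order. Markup
such as `<p>one<br>two</p>` is fine for the HTML side but makes the XML side
raise a parse error, which is the kind of input polyglot rules out.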

> How to parse polyglot, by
> contrast, is defined by XML, by HTML5 etc - and not by the polyglot
> markup spec. It would have been relevant if the XML/HTML TF discussed
> whether it would be useful to spit out polyglot markup via that
> toolchain.

But... that would not answer the question.

> Had they done so, then they would have demonstrated that
> they understood the purpose of polyglot markup. But as I said in my
> reply, they don't discuss that option and prefer instead to mention
> that polyglot markup cannot replace Henri's parser.

I'm pretty sure that the TF understood the purpose of polyglot; I'm
unclear as to why you're reading their report from that angle, though.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon

Received on Monday, 3 December 2012 14:59:59 UTC