Re: HTML/XML TF Report glosses over Polyglot Markup from Sam Ruby on 2012-12-03 (public-html@w3.org from December 2012)

From: Sam Ruby <rubys@intertwingly.net>
Date: Mon, 03 Dec 2012 11:35:53 -0500
To: Robin Berjon <robin@w3.org>
CC: public-html@w3.org
Message-ID: <50BCD4E9.10709@intertwingly.net>

On 12/03/2012 10:28 AM, Robin Berjon wrote:
> On 03/12/2012 14:26 , Sam Ruby wrote:
>> On 12/03/2012 07:48 AM, Robin Berjon wrote:
>>> On 03/12/2012 12:02 , Henry S. Thompson wrote:
>>>> Robin Berjon writes:
>>>>> Saying "polyglot" here just doesn't help: very little real-world
>>>>> content uses it. Note that the section clearly looks at polyglot and
>>>>> gives a clear reason for not using it in this case.
>>>>
>>>> That depends on where you look.  I know of a number of companies whose
>>>> products produced, by design, HTML-compatible XHTML, which we would
>>>> now call polyglot, precisely because it gave them the ability to
>>>> post-process with XML tools while at the same time serving to IE6
>>>> clients confidently.  The parallel requirements aren't going away, and
>>>> polyglot HTML5 will serve them very well.
>>>
>>> I know there is polyglot in the wild, I've used it in the past. But
>>> there's a big difference between "some people use it" and "it's used
>>> enough that one can build a useful strategy relying on it for arbitrary
>>> content".
>>
>> Who sets the bar for "enough"?
>
> You seem to be responding without appropriate context. If you read the
> beginning of the thread[0], this was about "2.1 How can an XML toolchain
> be used to consume HTML?" from the HTML/XML Task Force Report[1] that a
> few of us here were on a couple years ago.
>
> So, faced with the task of processing HTML at large with an XML tool
> chain, I'm very much confident that "polyglot" is not the answer without
> needing anyone to set the bar for me. There certainly are tasks for
> which a 6+% success rate is "enough" (neutrino detection, say) but
> document processing generally isn't one of them.
>
> This isn't to say that polyglot isn't useful in the right context — as I
> say above in the bit you quote I've used it myself. But it's not a
> solution to generally processing HTML with XML tools, which is good
> because it's also not something it set out to be. Hence my disagreement
> with Leif and Henry.

I'll note that the immediate context was producing markup that could
simultaneously be processed by multiple disparate clients with
confidence.  Two such clients were mentioned.

I believe that the root problem is that we are looking at this problem
from different perspectives: document production vs. consumption.
The best chance for interop comes from producing documents
conservatively and consuming them liberally.

Saying "6+% success rate in parsing conservatively" is a valid argument 
for liberal parsers.  It is not an argument against conservative production.
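
As a rough illustration of that distinction, here is a minimal Python
sketch using only the standard library.  The two sample documents and
the helper names (parses_as_xml, html_start_tags) are hypothetical and
not taken from the thread or the report.  A strict XML parser stands in
for the conservative consumer and a tolerant HTML tokenizer for the
liberal one.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample documents, invented for this illustration.
# CONSERVATIVE is well-formed XML that is also acceptable as HTML.
CONSERVATIVE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head>'
    '<body><p>one<br/>two</p></body></html>'
)
# TAG_SOUP is ordinary HTML with unclosed elements; it is not well-formed XML.
TAG_SOUP = '<html><body><p>one<br>two</body></html>'


class TagCollector(HTMLParser):
    """Liberal consumer: records start tags and does not reject input."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(text):
    """Conservative consumer: True only if the text is well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def html_start_tags(text):
    """Return the start tags a tolerant HTML tokenizer sees in the text."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


for name, doc in (("conservative", CONSERVATIVE), ("tag soup", TAG_SOUP)):
    print(name, "| XML ok:", parses_as_xml(doc),
          "| HTML tags:", html_start_tags(doc))
```

The conservatively produced document is accepted by both consumers,
while the tag soup is accepted only by the liberal one.  That is the
sense in which a low strict-parse success rate argues for liberal
consumers without saying anything against conservative producers.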

>> If three people want to get together and collaborate, should the fact
>> that some (and indeed many) may not want to participate be ground for
>> stopping them?
>
> No, but then again I never said otherwise.

Cool.

> [0] http://lists.w3.org/Archives/Public/public-html/2012Dec/0008.html
> [1] http://www.w3.org/2010/html-xml/snapshot/report.html

- Sam Ruby

Received on Monday, 3 December 2012 16:36:28 UTC