Re: Polyglot: the final thread?

From: Sam Ruby <rubys@intertwingly.net>
Date: Wed, 13 Mar 2013 10:54:05 -0400
Message-ID: <5140930D.7040509@intertwingly.net>
To: Alex Russell <slightlyoff@google.com>
CC: Robin Berjon <robin@w3.org>, Maciej Stachowiak <mjs@apple.com>, Henri Sivonen <hsivonen@iki.fi>, "www-tag@w3.org List" <www-tag@w3.org>, Jeni Tennison <jeni@jenitennison.com>
On 03/13/2013 10:20 AM, Alex Russell wrote:
>
> On Mar 13, 2013 3:57 AM, "Robin Berjon" <robin@w3.org> wrote:
>  >
>  > On 13/03/2013 00:29 , Alex Russell wrote:
>  >>
>  >>   * Nobody knows how popular it is. The lack of signage, coupled
>  >>     with default in-browser parsing as HTML, means that few on any side
>  >>     of the debate understand to what extent producers are creating this
>  >>     sort of content. As a result, it's difficult to draw any conclusions
>  >>     about importance from a lack of information either way.
>  >
>  >
>  > Just FYI, I ran a quick and dirty XML parse on the Paciello dataset
>  > (a few thousand home pages taken from the top most-visited sites; this
>  > is therefore very heavily skewed towards the actively and professionally
>  > maintained Web, but often useful nevertheless).
>  >
>  > The proportion of polyglot documents was 569/8881 (non-polyglot:
>  > 8311/8881), or roughly 6.4% vs 93.6%.
>
> Are they all served as text/html?

Almost certainly.

> Also, do you have any interest in cron-ing this? The trend data seems
> invaluable to the discussion.

I thought the point of the proposal was to bring the TAG discussion on
this item to a close, a proposal that, by the way, I completely support.

As to the trend data, as with most data, it is subject to interpretation.

I'll assert that it is unlikely that random HTML would happen to be
well-formed XML.  As such, most, if not all, of this 6.4% is likely
intentional.
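A minimal sketch of the kind of "quick and dirty XML parse" mentioned above, assuming it amounts to a plain well-formedness check (the actual script run on the Paciello dataset isn't given). It uses Python's standard `xml.etree.ElementTree`; the helper names `is_well_formed_xml` and `polyglot_share` and the sample pages are hypothetical. Well-formedness is necessary but not sufficient for polyglot markup, which also requires the XHTML namespace among other rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML.

    This is only a necessary condition for polyglot markup; it does not
    check the XHTML namespace or any other polyglot-specific rule.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def polyglot_share(pages):
    """Return (well_formed, total, percent) for an iterable of page sources."""
    pages = list(pages)
    well_formed = sum(1 for page in pages if is_well_formed_xml(page))
    total = len(pages)
    percent = 100.0 * well_formed / total if total else 0.0
    return well_formed, total, percent


if __name__ == "__main__":
    # Two hypothetical home pages: the first is well-formed; the second
    # leaves <br> unclosed, so </body> does not match and parsing fails.
    sample = [
        "<!DOCTYPE html><html xmlns='http://www.w3.org/1999/xhtml'>"
        "<head><title>a</title></head><body><p>ok</p></body></html>",
        "<!DOCTYPE html><html><body><p>unclosed<br></body></html>",
    ]
    print(polyglot_share(sample))  # (1, 2, 50.0)
```

Under this reading, the reported figure is simply 569 well-formed pages out of 8881, i.e. 100 * 569 / 8881, which rounds to 6.4%.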

As to the rest, we can't infer the intent.  Perhaps there are some who
intended their pages to be well-formed XML but failed for some reason.
Given how hard it is to consistently produce well-formed XML (something I
can personally attest to), my intuition is that this number is much
greater than the 6.4% that succeeded (for the moment).  But what is truly
unknowable is whether these authors would consider this a bug, or would
change their minds as to whether they wished to continue to produce HTML
that also happens to be well-formed XML.

I also happen to believe that there is a point of diminishing returns
involved.  Avoiding inline scripts, or adding CDATA talismans at the
beginning and end of those scripts, isn't necessary for most.
Explicitly closing all tags, however, does tend to expose more common
markup errors AND makes your markup correctly consumable by a lot of
not-quite-but-almost HTML parsers out there.
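To make the trade-off concrete, here is a small sketch that uses Python's standard-library XML parser as a stand-in for a strict consumer (an assumption; real "almost HTML" parsers vary). An inline script containing `<` breaks XML well-formedness unless it is wrapped in a commented-out CDATA section, and unclosed elements such as `<li>` or `<br>` fail outright. The helper `parses_as_xml` and the snippets are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# A bare "<" inside a script is fine for an HTML parser, but an XML
# parser reads it as the start of a tag and rejects the document.
bare_script = "<div><script>if (a < b) { go(); }</script></div>"

# The CDATA "talisman": the JavaScript line comments hide the markers
# from the script engine, while an XML parser treats the body as
# character data, so the "<" no longer needs escaping.
cdata_script = (
    "<div><script>//<![CDATA[\n"
    "if (a < b) { go(); }\n"
    "//]]></script></div>"
)

# Implicitly vs explicitly closed tags (void elements self-closed).
implicit_close = "<ul><li>one<li>two<br></ul>"
explicit_close = "<ul><li>one</li><li>two<br/></li></ul>"

for name, snippet in [
    ("bare script", bare_script),
    ("CDATA script", cdata_script),
    ("implicit closes", implicit_close),
    ("explicit closes", explicit_close),
]:
    print(f"{name}: {parses_as_xml(snippet)}")
```

Run as-is, this prints False, True, False, True for the four snippets, in that order. An HTML parser accepts all four, which is why the script wrapping matters only to authors who also want XML consumers to read their pages.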

In any case, I think this discussion should continue... but on 
public-html.  Everyone here is welcome to participate.

>  >> As a result of all of the above, having (I hope) fairly weighed the
>  >> arguments, I would like to recommend that we find a way to extricate
>  >> ourselves from the request. It doesn't matter to the future of Polyglot,
>  >> and it does not, in my view, serve the TAG to be in the middle of this.
>  >> Polyglot can have whatever future it will in the W3C without our group's
>  >> involvement.
>  >
>  > +1

+1

>  > Just because the polyglot discussion awakens some of the old XML/HTML
>  > politics doesn't mean it's architectural. At any rate, there certainly
>  > are more pressing topics for the TAG to apply its energies to.

Politics is certainly a loaded word, but beyond that, +1 there too.

>  > --
>  > Robin Berjon - http://berjon.com/ - @robinberjon

- Sam Ruby
Received on Wednesday, 13 March 2013 14:54:39 GMT
