- From: Sam Ruby <rubys@intertwingly.net>
- Date: Wed, 13 Mar 2013 10:54:05 -0400
- To: Alex Russell <slightlyoff@google.com>
- CC: Robin Berjon <robin@w3.org>, Maciej Stachowiak <mjs@apple.com>, Henri Sivonen <hsivonen@iki.fi>, "www-tag@w3.org List" <www-tag@w3.org>, Jeni Tennison <jeni@jenitennison.com>
On 03/13/2013 10:20 AM, Alex Russell wrote:
>
> On Mar 13, 2013 3:57 AM, "Robin Berjon" <robin@w3.org> wrote:
> >
> > On 13/03/2013 00:29 , Alex Russell wrote:
> >>
> >>   * */Nobody knows how popular it is./* The lack of signage, coupled
> >>     with default in-browser parsing as HTML means that few on any side
> >>     of the debate understand to what extent producers are creating this
> >>     sort of content. It's difficult to draw any conclusions about
> >>     importance based on a lack of information either way as a result.
> >
> > Just FYI, I ran a quick and dirty XML parse on the Paciello dataset
> > (a few thousand home pages taken from the top most visited sites — this
> > is therefore very heavily skewed towards the actively and professionally
> > maintained Web, but often useful nevertheless).
> >
> > The proportion of polyglot documents was 569/8881 (non-polyglot:
> > 8311/8881), or roughly 6.4% vs 93.6%.
>
> Are they all served as text/html?

Almost certainly.

> Also, do you have any interest in cron-ing this? The trend data seems
> invaluable to the discussion.

I thought the point of the proposal was to bring the TAG discussion on
this item to a close. A proposal that, by the way, I completely support.

As to the trend data, as with most data, it is subject to
interpretation. I'll assert that it is unlikely that random HTML would
be considered well-formed XML. As such, most if not all of this 6.4% is
likely intentional.

As to the rest, we can't infer the intent. Perhaps there are some who
intended to be well-formed XML, but failed for some reason. Given how
hard it is to consistently produce well-formed XML (something I can
personally attest to), my intuition is that this number is much greater
than the 6.4% that succeeded (for the moment). But the truly unknowable
is whether these authors would consider this a bug or would change their
mind as to whether they wished to continue to produce HTML that also
happens to be well-formed XML.
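For illustration, the kind of "quick and dirty XML parse" quoted above can be sketched roughly as follows. This is a minimal Python sketch, not the script that was actually run against the Paciello dataset. The function names and the sample pages are hypothetical. Passing an XML parse is only a necessary condition for polyglot markup, not a sufficient one, so a count produced this way is an upper bound.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the markup parses as XML without a well-formedness error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def well_formed_share(pages) -> float:
    """Fraction of pages that survive an XML parse (hypothetical helper)."""
    pages = list(pages)
    if not pages:
        return 0.0
    passing = sum(1 for page in pages if is_well_formed_xml(page))
    return passing / len(pages)


# Hypothetical sample pages, standing in for fetched home pages.
SAMPLES = [
    # Every element explicitly closed: parses as XML.
    '<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body><p>ok<br/></p></body></html>",
    # Typical HTML: unclosed <br> and <p>, so the XML parse fails.
    "<!DOCTYPE html><html><head><title>t</title></head>"
    "<body><p>unclosed<br></body></html>",
]

if __name__ == "__main__":
    print(f"{well_formed_share(SAMPLES):.1%} of sample pages are well-formed XML")
```

Running a check like this on a schedule over the same set of URLs would give the trend data asked about, under the same caveat that the result says nothing about author intent.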

I also happen to believe that there is a point of diminishing returns
involved. Avoiding inline scripts or adding CDATA talismans at the
beginning and end of those scripts isn't necessary for most. Explicitly
closing all tags, however, does tend to expose more common markup errors
AND makes your markup correctly consumable by a lot of not-quite but
almost HTML parsers out there.

In any case, I think this discussion should continue... but on
public-html. Everyone here is welcome to participate.

> >> As a result of all of the above, having (I hope) fairly weighed the
> >> arguments, I would like to recommend that we find a way to extricate
> >> ourself from the request. It doesn't matter to the future of Polyglot,
> >> and it does not, in my view, serve the TAG to be in the middle of this.
> >> Polyglot can have whatever future it will in the W3C without our group
> >> involvement.
> >
> > +1

+1

> > Just because the polyglot discussion awakens some of the old XML/HTML
> > politics doesn't mean it's architectural. At any rate there certainly
> > are more pressing topics for the TAG to apply its energies to.

Politics is certainly a loaded word, but beyond that, +1 there too.

> > --
> > Robin Berjon - http://berjon.com/ - @robinberjon

- Sam Ruby

Received on Wednesday, 13 March 2013 14:54:39 UTC