Re: ISSUE-4 - versioning/DOCTYPEs from Sam Ruby on 2010-05-17 (public-html@w3.org from May 2010)

From: Sam Ruby <rubys@intertwingly.net>
Date: Mon, 17 May 2010 11:54:23 -0400
To: Henri Sivonen <hsivonen@iki.fi>
CC: public-html@w3.org, Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, Boris Zbarsky <bzbarsky@MIT.EDU>, Daniel Glazman <daniel.glazman@disruptive-innovations.com>
Message-ID: <4BF166AF.4050805@intertwingly.net>

On 05/17/2010 10:55 AM, Henri Sivonen wrote:
> "Sam Ruby"<rubys@intertwingly.net>  wrote:
>
>>> 1) Serving XHTML+SVG or XHTML+MathML or XHTML+SVG+MathML content
>>> as application/xhtml+xml to Gecko, WebKit, Presto and
>>> Trident+MathPlayer but serving the same bytes as text/html to
>>> Trident (sans MathPlayer) in order to be able to use SVG and/or
>>> MathML inline where supported but allowing the users of
>>> unextended IE to still read the (X)HTML content of the document.
>>>
>>> 2) Serving application/xhtml+xml that doesn't use any non-HTML
>>> features to Gecko, WebKit and Presto as a matter of pro-XML
>>> principle but serving the same bytes to Trident as text/html
>>> because the author's pro-XML principle doesn't go far enough to
>>> exclude IE users from his/her audience.
>>>
>>> 3) Serving content as text/html but using an XML parser to
>>> process the content in a non-browser scenario where the party
>>> operating the XML parser has the power to make the publisher
>>> supply the content in a form that is safe for XML parsers.
>>>
>>> Leif, are there additional use cases that I'm missing?
>>
>> As someone who serves content as application/xhtml+xml to browsers
>> that support it, and the same content as text/html to browsers that
>> don't, none of the descriptions above resonate with me.  Perhaps it
>> is because of the manner in which you chose to express these cases.
>
> In my thinking, your blog and planet were instances of case #1. What
> part of the description of #1 doesn't resonate with you?

Sometime before the end of 2004, I started serving my weblog as
application/xhtml+xml.  At first, I wasn't very careful about it:

http://intertwingly.net/blog/2004/11/15/Vigilance

Sometime in 2006, I started experimenting with inline SVG:

http://intertwingly.net/blog/2006/06/17/Inline-SVG

As an aside, I was one of the first to convert my site over to HTML5.
At the time I was using a version of Firefox 2, and my usage of XHTML
made things possible:

http://intertwingly.net/blog/2007/12/04/HTML5-Deployment-Considerations#c1196807365
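
The serving arrangement described above (application/xhtml+xml to
browsers that accept it, the same bytes as text/html to the rest) can be
sketched roughly as follows.  This is only an illustration and not the
code that runs on intertwingly.net: the function name pick_media_type is
hypothetical, and the Accept parsing is deliberately simplified (it does
not handle wildcards or compare q-values between types).

```python
def pick_media_type(accept_header: str) -> str:
    """Return the media type to label the (identical) response bytes with.

    Assumption: a client gets application/xhtml+xml only if it lists that
    type explicitly with a non-zero q-value; everyone else gets text/html.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"
```

A server that varies the Content-Type this way would normally also send
"Vary: Accept" so that caches keep the two labelings apart.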

>> As for me, I simply want to be conservative in what I send.  This
>> is the first half of the robustness principle.  This enables people
>> who have off-the-shelf XML parsers to process my pages.  Not
>> because they hold any special power over me, but simply because I
>> enabled it.
>
> Interesting. I hadn't thought of your site as being an instance of
> case #3 (without the power part).
>
> Do you know if people actually process your pages (as opposed to your
> feeds) using off-the-shelf XML parsers without any prior arrangement
> with you?

While I know that I do, I don't know specifically of anybody else who 
does.  But if you don't mind, permit me to generalize your statement as 
I believe that it will provide insight.

I do know that plenty of people process my markup using parsers that are
not HTML5-compliant, including but not limited to modern browsers.

And as I include the same markup inside my feeds, that statement 
includes feed parsers.  Venus uses the Universal Feed Parser.  While it 
will do its first pass using an XML parser, it will process the HTML 
content multiple times, using things like regular expressions.

Despite what you suggested in #2, we are not yet in the nirvana where
HTML5-compliant parsers are bug-free and ubiquitous.  No matter how much
we preach otherwise, people will continue to utilize various tag soup
processors and even regular expressions.  I'll even admit that I use
regular expressions from time to time, and will probably continue to do
so for the foreseeable future.

As an example, the pages linked from my Depot Dashboard are produced by 
programs and consumed by programs, many of which (to this day) use 
regular expressions:

http://intertwingly.net/projects/dashboard.html

If you drill down a few of those links, you will find that the pages are 
also extremely consistent in details such as the indentation I use -- 
something that no self-respecting XML or HTML5 processor would care 
about.  By being conservative in what I send (and by that, I don't
simply mean well-formed XML, but avoiding constructs which are likely to
be confused when parsed using a different parser) I can enable a wide
range of potential future uses, even ones that I have not thought of yet.

To sum things up: my markup is not just XML-compliant, but also
HTML5-compliant, and fares well with a large variety of tag soup parsers
out there.  I have found that conforming to a polyglot syntax is a good
first-order approximation of what it takes to be universally consumable.
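
As a rough illustration of that last point, the sketch below feeds one
small document to both an XML parser and a tag soup style HTML parser
from the Python standard library and compares the text each recovers.
The sample document and the helper names are made up for the example,
and agreeing on extracted text is only a crude proxy for "polyglot"; it
is not a conformance check.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: well-formed XML that is also written to read as HTML5.
DOC = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">\n'
    '<head><meta charset="utf-8"/><title>Polyglot</title></head>\n'
    '<body><p>Hello, <em>world</em>.</p><br/></body>\n'
    '</html>\n'
)


class _TextCollector(HTMLParser):
    """Collects character data, ignoring tags entirely."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def _normalize(text: str) -> str:
    # Collapse runs of whitespace so indentation differences do not matter.
    return " ".join(text.split())


def text_via_xml(doc: str) -> str:
    # Raises xml.etree.ElementTree.ParseError if doc is not well-formed.
    root = ET.fromstring(doc)
    return _normalize("".join(root.itertext()))


def text_via_html(doc: str) -> str:
    parser = _TextCollector()
    parser.feed(doc)
    parser.close()
    return _normalize("".join(parser.chunks))


if __name__ == "__main__":
    print(text_via_xml(DOC) == text_via_html(DOC))
```

Markup that is fine as tag soup but not well-formed, such as
"<p>one<br>two</p>", makes text_via_xml raise a ParseError while
text_via_html still returns text; that asymmetry is what being
conservative in what you send avoids.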

- Sam Ruby
Received on Monday, 17 May 2010 15:54:58 UTC