Re: HTML 4 Profile for RDFa from Shelley Powers on 2009-05-23 (public-rdfa@w3.org from May 2009)

From: Shelley Powers <shelleyp@burningbird.net>
Date: Sat, 23 May 2009 16:10:39 -0500
To: Philip Taylor <pjt47@cam.ac.uk>
CC: Shane McCarron <shane@aptest.com>, Julian Reschke <julian.reschke@gmx.de>, Sam Ruby <rubys@intertwingly.net>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-ID: <4A18664F.9010307@burningbird.net>

Philip Taylor wrote:
> Shelley Powers wrote:
>> Philip Taylor wrote:
>>> [...]
>>> A survey of random pages from dmoz.org about a year ago found that 
>>> ~18% used an XHTML doctype, and ~0.03% were served as 
>>> application/xhtml+xml. On the Alexa top 200 a bit earlier 
>>> (http://lists.w3.org/Archives/Public/public-html/2007Aug/1248.html), 
>>> a third used an XHTML doctype and three quarters of those were not 
>>> well-formed XML. So: Any new markup will be overwhelmingly served as 
>>> text/html, and most of it that claims to be XHTML won't be usable 
>>> with an XML parser.
>>>
>>> Thus, the XHTML syntax will mostly be processed using the 
>>> RDFa-in-text/html processing rules. If those rules don't do what 
>>> people expect (after they've read the XHTML-focused specs and guides 
>>> and tutorials and examples), then they will be surprised and unhappy 
>>> and it will be a bad situation.
>>> [...]
>>
>> Can I take a leap of faith and guess that the pages among the 18% 
>> served with an XHTML doctype that are not well-formed XML are probably 
>> also not using RDFa?
>
> They aren't, because approximately no pages (regardless of doctype or 
> well-formedness) are using RDFa. Looking at some more recent data 
> (~425000 pages from http://www.dotnetdotcom.org/ collected in the past 
> few months), about 0.04% of pages in the sample appear to contain RDFa 
> attributes (specifically 'property' containing a colon).
>
> But I presume the idea is for RDFa to become much more widely used, 
> and I have no reason to doubt that it would end up with roughly the 
> same spread of text/html vs application/xhtml+xml and well-formed vs 
> ill-formed, so the numbers are still relevant.
>
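The detection heuristic described above (a `property` attribute whose value contains a colon) can be sketched as follows. This is a hypothetical illustration using Python's standard `html.parser`, not the script actually used for the dotnetdotcom.org survey; the function names `looks_like_rdfa` and `rdfa_share` are invented for this sketch.

```python
from html.parser import HTMLParser


class RDFaPropertySniffer(HTMLParser):
    """Hypothetical helper: flags a page if any element has a 'property'
    attribute whose value contains a colon (a rough, CURIE-like RDFa signal)."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names, so PROPERTY= also matches.
        # Self-closing tags (<meta ... />) reach this method via the default
        # handle_startendtag implementation.
        for name, value in attrs:
            if name == "property" and value and ":" in value:
                self.found = True


def looks_like_rdfa(html: str) -> bool:
    """True if the page matches the 'property contains a colon' heuristic."""
    parser = RDFaPropertySniffer()
    parser.feed(html)
    parser.close()
    return parser.found


def rdfa_share(pages) -> float:
    """Fraction of pages flagged by the heuristic (0.0 for an empty sample)."""
    pages = list(pages)
    if not pages:
        return 0.0
    return sum(looks_like_rdfa(p) for p in pages) / len(pages)
```

For example, `rdfa_share(['<span property="dc:title">x</span>', '<p>plain</p>'])` flags one page out of two. Being a heuristic, it misses RDFa that uses only `rel`/`rev`/`about`, and it counts any colon-bearing `property` value as RDFa.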
But we're addressing two things here: what do we do with what we have 
now, and how will we move into the future?

If none of these pages are using RDFa (or so few that they're irrelevant), 
then we're not "breaking" the web by insisting on following HTML 
processing rules for RDFa in HTML, while still preserving the 
existing XHTML rules for RDFa in XHTML. And we wouldn't be breaking the 
web anyway, because RDFa was released for XHTML; anyone using it in HTML 
pages does so somewhat at their own risk.

That's not being mean to the web designer/developer/Uncle Joe and his 
page on bowling balls. Nor is it holding the web back for the sake of edge 
cases involving undocumented or unsupported uses.

And thankfully, Google used all-lowercase prefixes, so they still match 
after HTML parsers lowercase xmlns:prefix attribute names.

Shelley

Received on Saturday, 23 May 2009 21:11:29 UTC