Re: HTML 4 Profile for RDFa from Shelley Powers on 2009-05-23 (public-rdfa@w3.org from May 2009)

From: Shelley Powers <shelleyp@burningbird.net>
Date: Sat, 23 May 2009 16:02:19 -0500
To: Philip Taylor <pjt47@cam.ac.uk>
CC: Shane McCarron <shane@aptest.com>, Julian Reschke <julian.reschke@gmx.de>, Sam Ruby <rubys@intertwingly.net>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-ID: <4A18645B.6050109@burningbird.net>
Shelley Powers wrote:
> Philip Taylor wrote:
>> Shane McCarron wrote:
>>> Julian Reschke wrote:
>>>>
>>>> It's clear that if RDFa is to be used with prefix declarations done 
>>>> with xmlns, then mixing uppercase and lowercase declarations is not 
>>>> going to work.
>>>>
>>>> I think restricting prefixes to be lower-case (insert proper 
>>>> Unicode terminology here) would be acceptable; it's easy to live 
>>>> with, and avoids introducing yet another prefix declaration mechanism.
>>>
>>> I would not be opposed to adding text in the RDFa in HTML definition 
>>> like "prefix names SHOULD be defined in lower-case to help ensure 
>>> maximum portability among parsers, since it is common for DOM-based 
>>> parsers to not preserve the case of attribute names."
>>
>> If portability isn't guaranteed in a very simple case like this, then 
>> it sounds like the specification would have failed at the fundamental 
>> task of specifying behaviour that will be interoperably implemented.
>>
>> (Once portability is guaranteed, it might be good to recommend 
>> against using non-lowercase prefixes because they might have 
>> surprising (but portable) behaviour, but that's a very different 
>> reason.)
>>
>>> I don't see there being any need to change the definition of 
>>> XML-based languages like RDFa for XHTML.  After all, in XML case is 
>>> preserved.  Or is it someone's goal that documents be able to be 
>>> parsed as EITHER XML or HTML?  It's not my goal.  If I define a 
>>> document using an HTML family language, I expect it to be parsed 
>>> using an HTML family parser.  If I define it using an XHTML family 
>>> language then I expect it to be parsed using an XML-conforming 
>>> parser.  Such a parser would preserve the case of elements and 
>>> attributes.
>>
>> People will read the RDFa-in-XHTML specs and guides and tutorials and 
>> examples, and use the same syntax in their own pages. Then they'll 
>> serve their pages as text/html and expect it to work the same.
>>
>> A survey of random pages from dmoz.org about a year ago found that 
>> ~18% used an XHTML doctype, and ~0.03% were served as 
>> application/xhtml+xml. On the Alexa top 200 a bit earlier 
>> (http://lists.w3.org/Archives/Public/public-html/2007Aug/1248.html), 
>> a third used an XHTML doctype and three quarters of those were not 
>> well-formed XML. So: Any new markup will be overwhelmingly served as 
>> text/html, and most of it that claims to be XHTML won't be usable 
>> with an XML parser.
>>
>> Thus, the XHTML syntax will mostly be processed using the 
>> RDFa-in-text/html processing rules. If those rules don't do what 
>> people expect (after they've read the XHTML-focused specs and guides 
>> and tutorials and examples), then they will be surprised and unhappy 
>> and it will be a bad situation.
>>
>> To make the situation better, either (a) the RDFa-in-XHTML 
>> documentation should all be removed and replaced with 
>> RDFa-in-text/html documentation so that people won't be encouraged to 
>> use the wrong syntax in their pages; or (b) the RDFa-in-XHTML syntax 
>> should give the same results (as far as possible, given the 
>> backward-compatibility constraints) when processed with the 
>> RDFa-in-text/html processing rules.
>>
>> I presume (a) isn't going to happen. That leaves (b), which would 
>> require coordination between RDFa-in-XHTML and RDFa-in-text/html, and 
>> seems likely to require changes to the RDFa-in-XHTML spec to smooth 
>> out the differences.
>>
> Wow, Philip, you're using an 8-gauge shotgun to hunt baby bunnies here.
>
> Can I take a leap of faith and guess that, of the 18% of web pages 
> served up with the XHTML doctype, the ones not using well-formed XML 
> are probably also not using RDFa?
>
> The RDFa in XHTML spec doesn't need to change if a new document 
> covering RDFa in HTML is created. Does it? Maybe a cross-reference 
> between the documents, with a general warning about differences 
> between the two documents would be good.
>
> As it is, there's probably going to be confusion about XHTML versus 
> HTML with the HTML5 spec. I'm rather waiting for someone to use <br> 
> in XHTML5.
>
> Shelley
Well, OK, there probably are sites that are using the XHTML doctype and 
are served up as HTML. Most of the Drupal sites are this way. But the 
RDFa being embedded in Drupal 7 is using known, and lowercase, prefixes.

So we add another filter: how many are using uppercase RDFa prefixes? I 
just find it unlikely folks are using uppercase prefixes with xmlns. And 
I'm not sure the edge case is worth a general upheaval.
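For illustration only (this is not from the thread), here is a minimal Python sketch of the case-folding problem Julian and Shane describe. It assumes Python's html.parser stands in for "an HTML parser" (it lowercases attribute names, as HTML5 DOM parsers do) and expat, run without namespace processing, stands in for "an XML parser". The SNIPPET markup and the helper names html_attr_names, xml_attr_names and declared_prefixes are hypothetical; real RDFa processors may behave differently.

```python
from html.parser import HTMLParser
import xml.parsers.expat

# Hypothetical RDFa-style markup: an uppercase prefix declared with xmlns
# and then used in a CURIE.
SNIPPET = '<div xmlns:FOAF="http://xmlns.com/foaf/0.1/" property="FOAF:name">Shelley</div>'


class AttrCollector(HTMLParser):
    """Records the attribute names of the first start tag, as html.parser reports them."""

    def __init__(self):
        super().__init__()
        self.attr_names = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us attribute names already converted to lowercase.
        if not self.attr_names:
            self.attr_names = [name for name, _ in attrs]


def html_attr_names(markup):
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.attr_names


def xml_attr_names(markup):
    names = []
    # No namespace_separator, so expat reports xmlns:* as ordinary attributes
    # and keeps their case exactly as written.
    parser = xml.parsers.expat.ParserCreate()

    def start(tag, attrs):
        if not names:
            names.extend(attrs.keys())

    parser.StartElementHandler = start
    parser.Parse(markup, True)
    return names


def declared_prefixes(attr_names):
    """Return the set of prefixes declared by xmlns:<prefix> attributes."""
    return {n.split(":", 1)[1] for n in attr_names if n.startswith("xmlns:")}


if __name__ == "__main__":
    html_prefixes = declared_prefixes(html_attr_names(SNIPPET))
    xml_prefixes = declared_prefixes(xml_attr_names(SNIPPET))
    print(html_prefixes)           # {'foaf'}
    print(xml_prefixes)            # {'FOAF'}
    # The CURIE "FOAF:name" only resolves where case was preserved.
    print("FOAF" in html_prefixes)  # False
```

The sketch only shows the mechanism: the same markup yields a lowercase prefix through the HTML path and an uppercase one through the XML path, so a case-sensitive lookup of the CURIE prefix fails in the HTML case. Restricting prefixes to lowercase, as Julian suggests, sidesteps the mismatch.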

Regardless, I don't think an exception based on RDFa should be allowed 
to override default handling for XHTML and HTML. I think that's a bad 
precedent. Warnings in documentation should be good. Warnings with 
applications that process RDFa should also be good. Maybe something 
along the lines of Opera's XHTML processing ("Would you like to 
ignore that whole case thing?")

And this is an issue that the HTML5 spec should address, apart from RDFa.

Shelley
Received on Saturday, 23 May 2009 21:03:12 UTC