Re: HTML 4 Profile for RDFa from Philip Taylor on 2009-05-23 (public-rdf-in-xhtml-tf@w3.org from May 2009)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Sat, 23 May 2009 21:21:44 +0100
To: Shane McCarron <shane@aptest.com>
CC: Julian Reschke <julian.reschke@gmx.de>, Sam Ruby <rubys@intertwingly.net>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-ID: <4A185AD8.9090402@cam.ac.uk>

Shane McCarron wrote:
> Julian Reschke wrote:
>>
>> It's clear that if RDFa is to be used with prefix declarations done 
>> with xmlns, then mixing uppercase and lowercase declarations is not 
>> going to work.
>>
>> I think restricting prefixes to be lower-case (insert proper Unicode 
>> terminology here) would be acceptable; it's easy to live with, and 
>> avoids introducing yet another prefix declaration mechanism.
> 
> I would not be opposed to adding text in the RDFa in HTML definition 
> like "prefix names SHOULD be defined in lower-case to help ensure 
> maximum portability among parsers, since it is common for DOM-based 
> parsers to not preserve the case of attribute names."

If portability isn't guaranteed in a very simple case like this, then it 
sounds like the specification would have failed at the fundamental task 
of specifying behaviour that will be interoperably implemented.

(Once portability is guaranteed, it might be good to recommend against 
using non-lowercase prefixes because they might have surprising (but 
portable) behaviour, but that's a very different reason.)

> I don't see there being any need to change the definition of XML-based 
> languages like RDFa for XHTML.  After all, in XML case is preserved.  Or 
> is it someone's goal that documents be able to be parsed as EITHER XML 
> or HTML?  It's not my goal.  If I define a document using an HTML family 
> language, I expect it to be parsed using an HTML family parser.  If I 
> define it using an XHTML family language then I expect it to be parsed 
> using an XML-conforming parser.  Such a parser would preserve the case 
> of element and attributes.

People will read the RDFa-in-XHTML specs and guides and tutorials and 
examples, and use the same syntax in their own pages. Then they'll serve 
their pages as text/html and expect it to work the same.
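
As a rough illustration of how the same markup can diverge, here is a minimal sketch that feeds one snippet to an HTML parser and to an XML parser and compares the attribute names each reports. It uses Python's standard html.parser and expat only as stand-ins for "an HTML family parser" and "an XML-conforming parser"; the snippet, the prefix Foo, the example.org namespace URI and the helper names are all hypothetical, and real RDFa processors may well behave differently.

```python
from html.parser import HTMLParser
import xml.parsers.expat

# Hypothetical snippet: a mixed-case prefix declared with xmlns and used in @property.
MARKUP = '<div xmlns:Foo="http://example.org/ns#" property="Foo:bar">x</div>'


class AttrCollector(HTMLParser):
    """Records attribute names as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands over attribute names already lower-cased.
        self.names.extend(name for name, _ in attrs)


def html_attr_names(markup):
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


def xml_attr_names(markup):
    names = []
    # No namespace processing is requested, so xmlns:* shows up as a plain attribute.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda tag, attrs: names.extend(attrs)
    parser.Parse(markup, True)
    return names


print(html_attr_names(MARKUP))  # declaration name is lower-cased
print(xml_attr_names(MARKUP))   # declaration name keeps its case
```

With the HTML parser the declaration arrives as xmlns:foo while the value of property still says Foo:bar, so a case-sensitive prefix lookup would fail; with the XML parser the two still match.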

A survey of random pages from dmoz.org about a year ago found that ~18% 
used an XHTML doctype, and ~0.03% were served as application/xhtml+xml. 
On the Alexa top 200 a bit earlier 
(http://lists.w3.org/Archives/Public/public-html/2007Aug/1248.html), a 
third used an XHTML doctype and three quarters of those were not 
well-formed XML. So: Any new markup will be overwhelmingly served as 
text/html, and most of it that claims to be XHTML won't be usable with 
an XML parser.
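
For anyone wanting to check individual pages, a well-formedness test along these lines can be sketched with Python's bundled expat parser. This is not the tooling used for the surveys above; the function name and the two sample strings are hypothetical, and a real survey would also have to deal with encodings, doctypes and named entities.

```python
import xml.parsers.expat


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(markup, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Hypothetical samples: typical text/html habits versus the XML-clean equivalent.
TAG_SOUP = '<html><body><p>one<br>two</body></html>'
CLEAN = '<html><body><p>one<br/>two</p></body></html>'

print(is_well_formed(TAG_SOUP))
print(is_well_formed(CLEAN))
```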

Thus, the XHTML syntax will mostly be processed using the 
RDFa-in-text/html processing rules. If those rules don't do what people 
expect (after they've read the XHTML-focused specs and guides and 
tutorials and examples), then they will be surprised and unhappy and it 
will be a bad situation.

To make the situation better, either (a) the RDFa-in-XHTML documentation 
should all be removed and replaced with RDFa-in-text/html documentation 
so that people won't be encouraged to use the wrong syntax in their 
pages; or (b) the RDFa-in-XHTML syntax should give the same results (as 
far as possible, given the backward-compatibility constraints) when 
processed with the RDFa-in-text/html processing rules.

I presume (a) isn't going to happen. That leaves (b), which would 
require coordination between RDFa-in-XHTML and RDFa-in-text/html, and 
seems likely to require changes to the RDFa-in-XHTML spec to smooth out 
the differences.

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Saturday, 23 May 2009 20:22:23 UTC