Re: HTML 4 Profile for RDFa from Philip Taylor on 2009-05-24 (public-html@w3.org from May 2009)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Sun, 24 May 2009 18:04:30 +0100
To: Shane McCarron <shane@aptest.com>
CC: Julian Reschke <julian.reschke@gmx.de>, Sam Ruby <rubys@intertwingly.net>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-ID: <4A197E1E.6050309@cam.ac.uk>

Shane McCarron wrote:
> Philip Taylor wrote:
>> Shane McCarron wrote:
>>> I would not be opposed to adding text in the RDFa in HTML definition
>>> like "prefix names SHOULD be defined in lower-case to help ensure
>>> maximum portability among parsers, since it is common for DOM-based
>>> parsers to not preserve the case of attribute names."
>>
>> If portability isn't guaranteed in a very simple case like this, then 
>> it sounds like the specification would have failed at the fundamental 
>> task of specifying behaviour that will be interoperably implemented.
> 
> That's really not the same issue at all.... but let's go there.  
> Portability and interoperability in this context are specifically 
> related to the triples that are extracted from identical input by 
> different conforming processors.  The specification REQUIRES case 
> sensitive processing of prefix names.  Right now.  There is no question 
> about that.  And a conforming processor will adhere to this 
> requirement.  I would not be open to loosening that requirement, since 
> it seems silly to do so.
> 
> However, I could envision a client-side processor running on a legacy 
> user agent that would have trouble adhering to this requirement.  Such a 
> processor would NOT be a conforming processor and portability and 
> interoperability would NOT be guaranteed.  However, with some simple 
> guidance to authors we can help to increase the portability among even 
> these non-conforming processors.  That's goodness, and costs us nothing.

Hmm, I think I'm not clear on the context of your statements, so I may 
be misunderstanding... As I see things now:

In the context of RDFa-in-XHTML, any XML parser will preserve the case 
of attributes, and as far as I'm aware (though I haven't tested it 
extensively) all current RDFa-in-XHTML implementations do case-sensitive 
comparisons of prefixes, and the spec requires that, so it's all 
self-consistent and fine.
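
As a rough illustration of the XML side (a sketch only, not taken from 
any of the implementations discussed here), Python's xml.dom.minidom 
can stand in for "any XML parser". The namespace URI and the 
xml_prefixes helper are made up for the example.

```python
from xml.dom import minidom

# Hypothetical markup; the namespace URI is a placeholder, not a real vocabulary.
XHTML = '<div xmlns:vCard="http://example.org/ns#" property="vCard:fn">x</div>'


def xml_prefixes(markup):
    """Return {prefix: namespace URI} for the root element, as an XML parser sees it."""
    root = minidom.parseString(markup).documentElement
    prefixes = {}
    for i in range(root.attributes.length):
        attr = root.attributes.item(i)
        # minidom keeps xmlns:* declarations as ordinary attribute nodes,
        # with the attribute name exactly as written in the source.
        if attr.name.startswith("xmlns:"):
            prefixes[attr.name[len("xmlns:"):]] = attr.value
    return prefixes


prefixes = xml_prefixes(XHTML)
print(prefixes)
```

The prefix comes back as "vCard", so a case-sensitive lookup of the 
prefix in property="vCard:fn" succeeds.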

In the context of RDFa-in-text/html, all current implementations 
lowercase attribute names and then do case-sensitive prefix 
comparisons. So, for example, <div xmlns:vCard="..." 
property="vCard:..."> will fail to extract any triples (because the 
only defined prefix is "vcard", not "vCard").
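
A minimal sketch of that failure, assuming Python's html.parser as a 
stand-in for a text/html parser (it lowercases attribute names but 
leaves attribute values alone). PrefixCollector, resolve and the 
namespace URI are hypothetical names for the example, not part of any 
RDFa implementation.

```python
from html.parser import HTMLParser


class PrefixCollector(HTMLParser):
    """Collects xmlns:* prefixes and tries to resolve property CURIEs."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}   # prefix -> namespace URI, as the parser reports it
        self.resolved = []   # (property value, expanded URI or None)

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive already lowercased by the HTML parser.
        for name, value in attrs:
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        for name, value in attrs:
            if name == "property" and value and ":" in value:
                prefix, _, local = value.partition(":")
                # Case-sensitive lookup, as the RDFa spec requires.
                base = self.prefixes.get(prefix)
                self.resolved.append((value, base + local if base else None))


def resolve(markup):
    collector = PrefixCollector()
    collector.feed(markup)
    collector.close()
    return collector.prefixes, collector.resolved


# Placeholder namespace URI, chosen only for illustration.
mixed = '<div xmlns:vCard="http://example.org/ns#" property="vCard:fn">x</div>'
lower = '<div xmlns:vcard="http://example.org/ns#" property="vcard:fn">x</div>'

mixed_prefixes, mixed_resolved = resolve(mixed)
lower_prefixes, lower_resolved = resolve(lower)
print(mixed_prefixes, mixed_resolved)
print(lower_prefixes, lower_resolved)
```

With the mixed-case prefix the only declared prefix is "vcard", so the 
lookup for "vCard" finds nothing; with an all-lowercase prefix the 
same code resolves the property.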

The lowercasing of attribute names is not an issue restricted to legacy 
UAs - it's a part of the way HTML works, and (very likely) the way HTML 
will always work, and any current or future RDFa-in-text/html processor 
that uses an HTML parser will work this way.

In particular, I tested 
http://philip.html5.org/demos/rdfa/case-sensitivity-nonwf.html with 
recent versions of http://www.w3.org/2006/07/SWD/RDFa/impl/js/ and 
rdfQuery in Firefox 3.0 (which die with exceptions but otherwise handle 
attribute names in lowercase), and with pyRdfa, Swignition, and 
http://developer.search.yahoo.com/help/objectfinder?url=..., and all 
appear to work in the same way. (Are there any others that support 
text/html input that I'm missing?)

Given that all these implementations work the same way, and that it 
would be very difficult to change them to preserve attribute name case 
(because they could no longer use a standard text/html parser), it 
seems to me that the specification must specify this behaviour, so that 
all RDFa-in-text/html processors can extract the same triples from the 
same documents and can all conform to the spec.

Am I missing something here?

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Sunday, 24 May 2009 17:05:10 UTC