Re: Updated RDFa-in-text/html tests from Philip Taylor on 2009-06-05 (public-rdf-in-xhtml-tf@w3.org from June 2009)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Fri, 05 Jun 2009 16:57:11 +0100
To: Shane McCarron <shane@aptest.com>
CC: Mark Birbeck <mark.birbeck@webbackplane.com>, Jeni Tennison <jeni@jenitennison.com>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <4A294057.8080204@cam.ac.uk>

Shane McCarron wrote:
> Mark Birbeck wrote:
>> Anyway, as the spec stands at the moment, I think it is legitimate for
>> processors to parse a prefix mapping that's empty, and then to use
>> that in a CURIE.
> 
> [...] I consider this a pathological edge 
> case.   While I agree that processors should handle this case in a 
> consistent manner, this is not the sort of thing I would expect to 
> encounter in the wild - at least not in any page where the author 
> expected to get triples out.

I agree it's mostly just a pathological edge case - if someone declares 
a prefix as the empty string, and then uses that prefix for RDFa data, 
they can't really expect it to work sensibly. But it's quite common in 
text/html documents to have xmlns:* attributes with no value, e.g.:

http://livecom.spaces.live.com/ says:

   <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
   <html xmlns:spaces xmlns:Web web:culture="en-GB">

http://www.zhsm.net/www/Syjh/business.asp?info_id=71288 says:

   <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
   <HTML XMLNS:CC>

http://www.scapino.nl/ says:

   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="nl" lang="nl" xmlns:IE>

http://sciencecareers.sciencemag.org/career_magazine/previous_issues/articles/2003_08_01/noDOI.9006923641284003811 says:

   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   ...
   <b xmlns:x="">"Y</b>

and so on. It would be nice if these people could use RDFa (e.g. by 
copying-and-pasting CC licensing data), and still have their page 
handled robustly (i.e. still being able to extract the CC data) despite 
having this kind of bogus markup elsewhere in their pages. Fatal errors 
would be bad (they'd make it hard for someone to incrementally adopt 
RDFa because they'd have to fix all these other issues in their markup 
first), but anything else (ignoring the attribute, undeclaring the 
prefix, treating it as a relative URI, etc.) seems reasonable to me.
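
To make the non-fatal options concrete, here is a rough sketch (not taken 
from any existing RDFa processor) of the "ignore the attribute" and 
"undeclare the prefix" behaviours. It uses Python's standard html.parser 
module, which reports a valueless attribute with the value None and 
lowercases attribute names. The names PrefixCollector and expand_curie are 
hypothetical, the CC namespace declaration in the usage note is only an 
illustrative input, and the collector deliberately ignores per-element 
scoping of prefix mappings, which a real processor would need to track.

```python
from html.parser import HTMLParser


class PrefixCollector(HTMLParser):
    """Collect xmlns:* prefix mappings from text/html without failing.

    Hypothetical helper: mappings are kept in one flat dict, so the
    per-element scoping a real RDFa processor needs is not modelled.
    """

    def __init__(self, policy="ignore"):
        super().__init__()
        if policy not in ("ignore", "undeclare"):
            raise ValueError("policy must be 'ignore' or 'undeclare'")
        self.policy = policy
        self.prefixes = {}   # prefix -> namespace URI
        self.skipped = []    # prefixes declared with no usable value

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # html.parser has already lowercased the attribute name.
            if not name.startswith("xmlns:"):
                continue
            prefix = name[len("xmlns:"):]
            if not value:
                # value is None for <html xmlns:cc>, '' for xmlns:x="".
                self.skipped.append(prefix)
                if self.policy == "undeclare":
                    # Drop any earlier mapping for this prefix.
                    self.prefixes.pop(prefix, None)
                # Under "ignore", an earlier mapping is left untouched.
                continue
            self.prefixes[prefix] = value


def expand_curie(curie, prefixes):
    """Return the expanded URI, or None if the prefix is not mapped."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + reference
```

Feeding this something like

   <html xmlns:spaces xmlns:Web xmlns:cc="http://creativecommons.org/ns#">

leaves only the cc mapping in place, so a CURIE such as cc:license still 
expands, while a CURIE using one of the bogus prefixes simply produces no 
URI instead of aborting the whole page.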

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Friday, 5 June 2009 15:57:50 UTC