Re: HTML 4 Profile for RDFa from Shane McCarron on 2009-05-24 (public-html@w3.org from May 2009)

From: Shane McCarron <shane@aptest.com>
Date: Sun, 24 May 2009 12:48:51 -0500
To: Philip Taylor <pjt47@cam.ac.uk>
CC: Julian Reschke <julian.reschke@gmx.de>, Sam Ruby <rubys@intertwingly.net>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-ID: <4A198883.80401@aptest.com>

Philip Taylor wrote:
>
> Hmm, I think I'm not clear on the context of your statements, so I may 
> be misunderstanding... As I see things now:
>
> In the context of RDFa-in-XHTML, any XML parser will preserve the case 
> of attributes, and as far as I'm aware (though I haven't tested it 
> extensively) all current RDFa-in-XHTML implementations do 
> case-sensitive comparisons of prefixes, and the spec requires that, so 
> it's all self-consistent and fine.
I don't think I can speak for all current implementations, but yes - 
that's how it is supposed to work.
>
> In the context of RDFa-in-text/html, all current implementations treat 
> attribute names as lowercase and then do case-sensitive prefix 
> comparisons. So e.g. <div xmlns:vCard="..." property="vCard:..."> will 
> fail to extract any triples (because the only defined prefix is 
> "vcard", not "vCard").
Again, I can't speak for all current implementations.  However, 
certainly *MY* implementation is broken in this way.
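
To make the failure concrete, here is a minimal sketch.  It is not taken 
from any of the implementations discussed; it assumes Python's standard 
html.parser module (which reports attribute names in lowercase), and the 
PrefixCollector class, the resolve() helper and the example.org namespace 
URI are all hypothetical stand-ins.

```python
from html.parser import HTMLParser


class PrefixCollector(HTMLParser):
    """Hypothetical helper: records xmlns:* declarations and @property values."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}    # prefix -> namespace URI, as reported by the parser
        self.properties = []  # raw @property values (attribute values keep their case)

    def handle_starttag(self, tag, attrs):
        # html.parser passes attribute *names* already converted to lowercase.
        for name, value in attrs:
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
            elif name == "property":
                self.properties.append(value)


def resolve(curie, prefixes):
    """Case-sensitive prefix lookup; returns None when the prefix is undeclared."""
    prefix, _, reference = curie.partition(":")
    base = prefixes.get(prefix)
    return None if base is None else base + reference


collector = PrefixCollector()
# The namespace URI is a placeholder for illustration only.
collector.feed('<div xmlns:vCard="http://example.org/vcard#" property="vCard:fn">x</div>')
collector.close()

# The declared prefix arrives as "vcard", so looking up "vCard" finds nothing.
resolved = [resolve(value, collector.prefixes) for value in collector.properties]
```

In this sketch the only prefix recorded is "vcard", so the case-sensitive 
lookup of "vCard" fails and no triple would be generated.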
>
> The lowercasing of attribute names is not an issue restricted to 
> legacy UAs - it's a part of the way HTML works, and (very likely) the 
> way HTML will always work, and any current or future RDFa-in-text/html 
> processor that uses an HTML parser will work this way.
Well... it's not the way HTML works.  HTML just says they are 
case-insensitive.  Or rather, the SGML declaration for the HTML 4 DTD says 
this.  In the HTML DOM, my reading is that element and attribute names 
are returned in uppercase [1].  So no, I don't think they are supposed 
to be lowercased.  I think they are supposed to be uppercased, and I 
think that some implementations do it wrong.
>
> In particular, I tested 
> http://philip.html5.org/demos/rdfa/case-sensitivity-nonwf.html with 
> recent versions of http://www.w3.org/2006/07/SWD/RDFa/impl/js/ and 
> rdfQuery in Firefox 3.0 (which die with exceptions but otherwise do 
> things in lowercase); and pyRdfa, and Swignition, and 
> http://developer.search.yahoo.com/help/objectfinder?url=..., and all 
> appear to work in the same way. (Are there any others that support 
> text/html input that I'm missing?)
SPREAD does, but it is not super easy to find or use.  
Try http://htmlwg.mn.aptest.com/rdfa/extract_rdfa.pl?format=n3&type=html&uri=

>
> Given that all these implementations work the same, and it would be 
> very difficult to change them to preserve attribute name case (because 
> they could no longer use a standard text/html parser), it seems to me 
> that the specification must specify this behaviour, so that all the 
> RDFa-in-text/html processors can extract the same triples from the 
> same documents and so that they can all conform to the spec.
>
> Am I missing something here?
No, I don't think you are missing something.  After looking at this a 
bit, I think it might make sense to specify that, in RDFa in HTML, prefix 
names are case-insensitive.  My implementation currently does not work 
this way, but it would be a relatively easy change.  Obviously this is 
something we would need consensus on in the community.  As Julian 
rightly points out in another mail, it is possible that someone writing 
a tag-soup-based parser would not have this problem and would be opposed 
to "dumbing down" the HTML profile to accommodate the HTML DOM.  But I am 
inclined to agree that you (and others) are correct - that HTML element 
and attribute names are inherently case-insensitive, and so a profile 
for HTML needs to take this into account.
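
As a rough illustration of how small the change could be, here is a 
minimal sketch of case-insensitive prefix handling.  It assumes prefixes 
are folded to lowercase both when they are declared and when they are 
looked up; the function names and the example.org URI are hypothetical 
and do not come from any existing implementation or from the spec.

```python
def declare(prefixes, prefix, uri):
    """Record a prefix declaration, folding the prefix to lowercase."""
    prefixes[prefix.lower()] = uri


def resolve_ci(curie, prefixes):
    """Resolve a prefix:reference pair, ignoring the case of the prefix.

    Returns None if there is no colon or the prefix was never declared.
    Only the prefix is folded; the reference part keeps its case.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep:
        return None
    base = prefixes.get(prefix.lower())
    return None if base is None else base + reference


prefixes = {}
# Placeholder namespace URI, for illustration only.
declare(prefixes, "vCard", "http://example.org/vcard#")

same_case = resolve_ci("vCard:fn", prefixes)
other_case = resolve_ci("VCARD:fn", prefixes)
undeclared = resolve_ci("foaf:name", prefixes)
```

With this folding, it no longer matters whether the parser hands back 
"vCard", "vcard" or "VCARD" as the declared prefix.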

[1] http://www.w3.org/TR/DOM-Level-2-HTML/html.html#ID-5353782642

-- 
Shane P. McCarron                          Phone: +1 763 786-8160 x120
Managing Director                            Fax: +1 763 786-8180
ApTest Minnesota                            Inet: shane@aptest.com

Received on Sunday, 24 May 2009 17:58:57 UTC