Re: RDFa in HTML 5 from Philip Taylor on 2009-05-22 (public-html@w3.org from May 2009)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Fri, 22 May 2009 16:21:01 +0100
To: Shane McCarron <shane@aptest.com>
CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-ID: <4A16C2DD.5000107@cam.ac.uk>

As an attempt to clarify my current views:

I'm working from the basis that for any arbitrary stream of bytes served 
as text/html, it should be possible to determine the set of triples that 
are extracted when you apply the standard RDFa triple-extraction 
algorithm to the document, and determine whether the document is valid. 
That behaviour should be well-defined (including for invalid inputs), 
well tested, and should (eventually) match implementations. Ideally it 
should also be easy to implement, match authors' expectations, be 
similar to the XHTML syntax, and so on.

As an example of some arbitrary inputs, I've made 
http://philip.html5.org/demos/rdfa/tests.html [currently very 
experimental; only tested in Firefox 3.0 and Opera 9.6, has too little 
documentation and too many bugs, etc] to illustrate various cases that 
might be interesting. That also shows that current implementations are 
quite varied in how they handle the inputs, and so presumably the 
implicit mapping from text/html to RDFa-in-XHTML is not obvious enough 
by itself to ensure interoperable implementations.

Given that basis, I can't see a sensible way to solve the problem 
without relying on HTML 5 to define the mapping from an arbitrary stream 
of bytes to a DOM (because practical text/html parsing isn't defined 
anywhere else), and then defining the RDFa processing on top of that 
(perhaps via an explicit mapping onto another RDFa spec if that's possible).
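A minimal sketch of that layering, under loudly stated assumptions: 
Python's standard html.parser is used here only as a stand-in for the 
HTML 5 parsing algorithm (it is not an HTML 5 parser and does not build 
a DOM), and the names RdfaAttrCollector, parse_stage and extract_stage 
are hypothetical. The extraction stage handles only the trivial case 
where about, property and content all sit on one element; it is not the 
RDFa processing rules. The point is just the shape: bytes go through one 
well-defined parsing stage, and triple extraction is defined purely on 
that stage's output.

```python
from html.parser import HTMLParser

# Attribute names used by RDFa; everything else is ignored in this sketch.
RDFA_ATTRS = {"about", "property", "content", "rel", "rev",
              "resource", "typeof", "datatype"}


class RdfaAttrCollector(HTMLParser):
    """Stand-in for the parsing stage: records RDFa attributes per element."""

    def __init__(self):
        super().__init__()
        self.elements = []  # list of (tag, {attribute: value}) in document order

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        found = {k: v for k, v in attrs if k in RDFA_ATTRS and v is not None}
        if found:
            self.elements.append((tag, found))


def parse_stage(data: bytes):
    """Stage 1: arbitrary bytes -> parsed elements (never raises on bad input)."""
    # Assumption: UTF-8 with replacement; real encoding sniffing is more involved.
    text = data.decode("utf-8", errors="replace")
    collector = RdfaAttrCollector()
    collector.feed(text)
    collector.close()
    return collector.elements


def extract_stage(elements):
    """Stage 2: parsed elements -> triples (trivial single-element case only)."""
    triples = []
    for _tag, attrs in elements:
        if {"about", "property", "content"} <= attrs.keys():
            triples.append((attrs["about"], attrs["property"], attrs["content"]))
    return triples


if __name__ == "__main__":
    # Malformed input: stray byte, uppercase attribute name, unclosed element.
    doc = b'\xff<p><span ABOUT="#me" property="foaf:name" content="P">x<b>oops'
    print(extract_stage(parse_stage(doc)))
```

Because the second stage never sees bytes, questions such as "what 
happens with an unclosed tag?" are answered once, by whatever defines 
the first stage, rather than separately by each RDFa implementation.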

My document was a rough attempt to show how I imagined it could be 
defined in a way that would give clear answers to all the test cases 
above, by building on top of HTML 5, as an alternative approach to what 
I saw in http://www3.aptest.com/standards/rdfa-html/ and 
http://www.w3.org/TR/rdfa-syntax/

I don't intend this to be a competing specification - fragmentation 
would certainly be bad, and (in the long term) everything should be 
consistent and integrated and clear and it should all be defined in 
official RDFa specifications. (I don't think I have the time or 
motivation or skill to write a proper specification for this anyway, so 
I'm more than happy to let other people do the work!)

It may have been a bad idea to make the document look like a spec, but 
I'm not sure of a better way to express what I imagine a solution could 
look like.

Responding to some specific points:

Shane McCarron wrote:
> I'm sorry that my draft "profile" document doesn't answer your 
> questions. Of course my intent is to evolve that profile so that, in 
> conjunction with the other RFCs, Candidate Recommendations, and 
> Recommendations it normatively references, it represents a thorough 
> description of the model for embedding RDFa in HTML documents.

That sounds like the best approach to the problem. My criticism of your 
published document is coming from an understanding that it's an early 
draft and doesn't claim to be perfect and there's plenty of opportunity 
for any problems to be solved in the future. (My intent is for the 
criticism to be constructive, not rude - apologies if it's too much of 
the latter!)

http://lists.w3.org/Archives/Public/public-html/2009May/0127.html 
highlighted some specific issues, but I didn't see how they could be 
resolved by localised changes to your existing document, which is why I 
wanted to look at a more radical way of trying to resolve those issues. 
My way certainly isn't the best way, but I hope it can be used as a 
piece of feedback that will lead to a better solution in the end.

> If there are 
> things in the CURIE spec that need clarification, then that is the place 
> to fix those.

Sure - perhaps my document should have said "I think the CURIE spec 
should be clarified by changing it to say something more like: ...". 
That would still have omitted my reasons for wanting the change: for 
some of the examples in 
http://philip.html5.org/demos/rdfa/tests.html I can't tell what the 
RDFa/CURIE specs say the output should be (mainly in terms of handling 
errors). I don't have an exhaustive list of such cases, though. (Would 
such a list be useful?)
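
To illustrate the kind of error-handling question meant here, the 
following is a hypothetical sketch of CURIE expansion. The function name 
expand_curie and the choice to return None for every unresolvable input 
are assumptions made for illustration; they are not taken from the CURIE 
or RDFa specs, and a spec could just as reasonably pick different 
answers for each branch.

```python
def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' using a prefix -> IRI mapping.

    Returns None when the value cannot be expanded. That is one possible
    policy chosen for this sketch, not behaviour quoted from any spec.
    """
    prefix, colon, reference = curie.partition(":")
    if not colon:
        # Case 1: no colon at all, e.g. "title".
        return None
    if prefix not in prefixes:
        # Case 2: prefix never declared, e.g. "foo:bar" (also covers ":next"
        # unless the caller supplies a mapping for the empty prefix).
        return None
    # Case 3: declared prefix; plain concatenation, no further validation.
    return prefixes[prefix] + reference


if __name__ == "__main__":
    mapping = {"dc": "http://purl.org/dc/elements/1.1/"}
    for value in ["dc:title", "foo:bar", "title", ":next", "dc:"]:
        print(repr(value), "->", expand_curie(value, mapping))
```

Each None branch marks a place where implementations can diverge (drop 
the triple, treat the value as a relative URI, and so on) unless the 
expected result is written down and tested.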

> Personally, I would rather have a quality test suite that exercises the 
> specification and ensure that suite gets extended to clarify any edge 
> cases that implementors are curious about.

I would agree that's the best way to ensure the quality of 
implementations - it'd be great if the tests I linked above could 
perhaps become useful as part of that.

-- 
Philip Taylor
pjt47@cam.ac.uk
Received on Friday, 22 May 2009 15:21:39 UTC