Re: HTML5 RDFa Test Suite (was Re: Request to publish HTML+RDFa (draft 3) as FPWD) from Philip Taylor on 2009-09-22 (public-rdf-in-xhtml-tf@w3.org from September 2009)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Tue, 22 Sep 2009 18:05:11 +0100
To: Manu Sporny <msporny@digitalbazaar.com>
CC: HTMLWG WG <public-html@w3.org>
Message-ID: <4AB903C7.5060207@cam.ac.uk>

Manu Sporny wrote:
> (bcc: RDFa Task Force Mailing List)
> 
> Sam Ruby wrote:
>> My conclusion is that defining RDFa in HTML in terms of a DOM or an
>> Infoset are but two of the possible ways of achieving the desired
>> result, namely being precise as to what triples MUST be produced from a
>> given input.
> 
> Those reading this thread should also keep in mind that we not only have
> a spec to describe the processing model, but we also have a large,
> modular test suite that exercises every feature, as well as many
> potential error conditions, for RDFa processors:
> 
> http://rdfa.digitalbazaar.com/test-suite/
> 
> This test suite was upgraded this past weekend to cover HTML5 and
> contains 127 unit tests specifically for HTML5. There are also at least
> three implementations that are close to 100% conformant with HTML+RDFa
> (the PyRDFa processor, the MarkLogic processor, and the librdfa processor).

There's another test suite at 
<http://philip.html5.org/demos/rdfa/results.html>. It is incomplete, 
buggy and slightly outdated (particularly the rdfQuery results), but it 
demonstrates the relevant issues.

It only tests a few features (mainly namespace and lang processing), but 
I assume it is testing many more potential error conditions for those 
features, since no implementation has close to 100% agreement with the 
expected output.

For example, the latest version of pyRdfa (2.3.5) generates triples in 
all of the following cases. Given recent discussions on the RDFa list, I 
believe these are considered illegal, because they violate the grammar 
of CURIEs or the grammar or constraints of Namespaces in XML, and 
therefore must not generate triples:

     <p xmlns:0="http://example.org/" property="0:test">Test</p>
     <p xmlns:ex="" property="ex:http://example.com/test">Test</p>
     <p xmlns:xml="http://example.org/" property="xml:test">Test</p>
     <p xmlns:ex="http://www.w3.org/XML/1998/namespace" property="ex:test">Test</p>
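To make those constraints concrete, here is a rough Python sketch of the 
kind of check a processor could apply before turning an xmlns: attribute 
into a URI mapping. It is only my approximation: the function name 
prefix_declaration_is_legal is made up, the NCName test is a simplified 
ASCII-only regex (real NCNames allow many non-ASCII characters), and it 
covers only the Namespaces in XML constraints behind the four examples 
above, not the full CURIE grammar or RDFa's handling of the "_" prefix.

```python
import re

# URIs reserved by Namespaces in XML for the "xml" and "xmlns" prefixes.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# Simplified, ASCII-only approximation of the NCName production (assumption:
# real NCNames also permit many non-ASCII name characters).
NCNAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def prefix_declaration_is_legal(prefix: str, uri: str) -> bool:
    """Hypothetical check: may xmlns:<prefix>="<uri>" create a URI mapping?"""
    if not NCNAME_RE.fullmatch(prefix):
        return False  # e.g. xmlns:0, since a prefix must be an NCName
    if uri == "":
        return False  # e.g. xmlns:ex="", prefixed undeclaring is not allowed
    if prefix == "xml":
        return uri == XML_NS  # "xml" may only be bound to its own URI
    if prefix == "xmlns":
        return False  # the "xmlns" prefix must never be declared
    if uri in (XML_NS, XMLNS_NS):
        return False  # no other prefix may be bound to the reserved URIs
    return True


# The four cases pyRdfa accepts, plus one ordinary declaration for contrast.
cases = [
    ("0", "http://example.org/"),
    ("ex", ""),
    ("xml", "http://example.org/"),
    ("ex", XML_NS),
    ("ex", "http://example.org/"),
]
for prefix, uri in cases:
    print(f"xmlns:{prefix}={uri!r} -> {prefix_declaration_is_legal(prefix, uri)}")
```

Only the last declaration passes; under this reading, the p elements in 
the four cases have no usable prefix mapping and so yield no triples.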

It also throws a runtime error in the following case, due to an 
undefined variable in (presumably untested) error-handling code, even 
though RDFa does not say this is an error at all:

     <p xmlns:_="http://example.org/" property="_:test">Test</p>

It also generates triples in the following cases that use undeclared 
prefixes, violating RDFa's statement that the processing model has "An 
initially empty list of [URI mapping]s":

     <p property="rdf:test">Test</p>
     <p property="rdfs:test">Test</p>
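A minimal Python sketch of what "initially empty" implies for CURIE 
expansion, assuming a processor that keeps a per-element dictionary of 
prefix mappings. The names resolve_curie and in_scope_mappings are 
hypothetical, not taken from pyRdfa or the spec, and the sketch ignores 
other attribute value forms and the "_" blank-node prefix. An undeclared 
prefix such as rdf: or rdfs: simply fails to expand, so no triple is 
produced.

```python
from typing import Dict, Optional


def resolve_curie(curie: str, mappings: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: expand prefix:reference using only in-scope mappings.

    Returns None when the value cannot be expanded, meaning "generate no triple".
    """
    prefix, sep, reference = curie.partition(":")
    if not sep:
        return None  # not a prefixed CURIE; other value forms are out of scope
    base = mappings.get(prefix)
    if base is None:
        return None  # undeclared prefix: no built-in rdf:/rdfs: defaults
    return base + reference


def in_scope_mappings(parent: Dict[str, str], declarations: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: a child inherits its parent's mappings,
    overridden by its own xmlns: declarations."""
    merged = dict(parent)
    merged.update(declarations)
    return merged


# "An initially empty list of [URI mapping]s" at the root of the document.
root: Dict[str, str] = {}

print(resolve_curie("rdf:test", root))   # None
print(resolve_curie("rdfs:test", root))  # None

# Only an explicit declaration makes the prefix usable.
p = in_scope_mappings(root, {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"})
print(resolve_curie("rdf:test", p))  # http://www.w3.org/1999/02/22-rdf-syntax-ns#test
```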

The first four cases are relevant only to HTML, because they're not 
namespace-well-formed XML, but the other cases are relevant to XHTML 
too. The last two cases seem to be pretty clear in the spec and are just 
a common implementation bug (presumably not tested by the official test 
suite, otherwise the bug would have been fixed already).

These kinds of error handling are a large part of the concern over the 
precision and clarity of RDFa's / RDFa+HTML's processing model. The 
common, straightforward examples of RDFa usage are quite obvious and not 
an interoperability problem. But I think these error cases are important 
for the spec-precision discussion, and in these cases implementations 
appear to disagree widely with each other and with the apparent 
intentions of the spec.

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Tuesday, 22 September 2009 17:50:20 UTC