- From: Dan Brickley <danbri@danbri.org>
- Date: Tue, 18 Nov 2008 10:08:17 +0100
- To: Semantic Web <semantic-web@w3.org>
- Cc: RDFa <public-rdf-in-xhtml-tf@w3.org>
Hi all (but especially students and academic staff),

Yesterday I found a bug in Redland's librdfa-based RDFa parsing facilities. A fairly obscure markup pattern caused the librdfa library to fail to generate an RDF triple. Redland/raptor deals with this by throwing a fatal error, bringing my RDFa-parsing ambitions to a grinding halt. This was on input data I'd generated myself (the curious can see details at http://bugs.librdf.org/mantis/view.php?id=289 ).

If RDF (and especially RDFa) parsers are going to robustly handle all the scary, messy markup that's out there, then I don't think we can wait for humans like me to stumble upon the awkward corner cases that trip them up.

So I've a proposal (based on some old work by Janne Saarela): I'd like to see an auto-generated repository of RDFa samples, most (but not all) of which are decent well-formed XHTML with RDFa, but also with a good number of poorly marked-up files. Note that poor, confusing or downright weird markup may or may not trip up XML's well-formedness rules.

Here is an old set of RDF/XML test files autogenerated with Prolog: http://www.w3.org/RDF/Test/Janne/

Related tools include the Dada Engine, http://dev.null.org/dadaengine/ (the tool behind http://www.elsewhere.org/pomo/ ) and Rmutt, http://www.schneertz.com/rmutt/ ... either of which could be used to make the output more entertaining.

Generating such a test set and then wiring it up to a set of RDFa parsers (via http://rdfa.digitalbazaar.com/rdfa-test-harness/ or something like it) shouldn't be a huge job, but it would be a very useful one. I'd like to see perhaps 1000 'nonsense' RDFa documents that experiment with every conceivable or inconceivable syntactic variant that parsers might encounter in the wild, and then find out (a) whether any parsers completely fail with that input, (b) how many triples are generated and what they contain, and (c) whether the spec gurus agree on what ought to be generated.

Does this sound worthwhile?
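
To make the generation step a little more concrete, here is a rough sketch in Python of what such a generator might look like. It is only an illustration under stated assumptions, not the Prolog approach behind Janne's files and not part of any existing tool: the function names (random_node, random_document, is_well_formed), the tiny vocabulary lists, and the break_prob parameter are all hypothetical. It builds random nested XHTML elements carrying RDFa attributes, occasionally drops a closing tag so that some documents are not well-formed, and uses the standard library XML parser only to sort the output into well-formed and broken piles.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical vocabulary, chosen only for illustration.
ELEMENTS = ["div", "span", "p", "a"]
PREDICATES = ["foaf:name", "foaf:knows", "dc:title", "dc:creator"]
TYPES = ["foaf:Person", "foaf:Document"]
WORDS = ["alpha", "beta", "gamma", "delta"]


def random_node(rng, depth, break_prob):
    """Return one random element, possibly with RDFa attributes and children."""
    tag = rng.choice(ELEMENTS)
    attrs = []
    if rng.random() < 0.5:
        attrs.append(("about", "#n%d" % rng.randrange(10)))
    if rng.random() < 0.3:
        attrs.append(("typeof", rng.choice(TYPES)))
    if rng.random() < 0.6:
        attrs.append(("property", rng.choice(PREDICATES)))
    if rng.random() < 0.3:
        attrs.append(("rel", rng.choice(PREDICATES)))
        attrs.append(("resource", "#n%d" % rng.randrange(10)))
    attr_text = "".join(' %s="%s"' % (name, value) for name, value in attrs)

    if depth > 0 and rng.random() < 0.6:
        children = rng.randint(1, 3)
        body = "".join(random_node(rng, depth - 1, break_prob)
                       for _ in range(children))
    else:
        body = rng.choice(WORDS)

    # Deliberately omit the closing tag now and then to break well-formedness.
    close = "" if rng.random() < break_prob else "</%s>" % tag
    return "<%s%s>%s%s" % (tag, attr_text, body, close)


def random_document(seed, break_prob=0.02):
    """Return a reproducible 'nonsense' XHTML+RDFa document for a given seed."""
    rng = random.Random(seed)
    body = "".join(random_node(rng, 3, break_prob)
                   for _ in range(rng.randint(1, 4)))
    return (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" '
        '"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml"\n'
        '      xmlns:foaf="http://xmlns.com/foaf/0.1/"\n'
        '      xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
        '<head><title>nonsense %d</title></head>\n'
        '<body>%s</body>\n'
        '</html>\n' % (seed, body)
    )


def is_well_formed(doc):
    """Check XML well-formedness only; this says nothing about RDFa validity."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    docs = [random_document(seed) for seed in range(1000)]
    good = sum(is_well_formed(doc) for doc in docs)
    print("%d well-formed, %d broken" % (good, len(docs) - good))
```

Because each document is derived from its seed, a parser failure can be reported by seed number and reproduced exactly. The sketch covers only generation; running the documents through actual RDFa parsers and comparing the triples they emit is the part that would need the test harness.
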
Anyone willing to work on it, or to help explore it as a student project? Students would gain an understanding of XML, RDFa grammars, and the state of the art (and lack thereof ;) in automatic tool support for assuring compliance with the standards.

cheers,

Dan

--
http://danbri.org/
Received on Tuesday, 18 November 2008 09:09:01 UTC