- From: Benjamin Nowack <bnowack@semsol.com>
- Date: Thu, 25 Oct 2007 21:38:53 +0200
- To: public-rdf-in-xhtml-tf@w3.org
Hi RDFa(ddicts),

I had a long train ride yesterday and used the time to finally write the RDFa extractor for ARC I mentioned two weeks ago[1]. Stupid me forgot to copy any test cases (a zip with all files might be handy, btw), but I had a version of the latest syntax doc and just followed the processing instructions in section 5 w/o really thinking too much about why and what WRT the individual steps.

Here is my feedback (don't know if you are collecting stuff like this already, but maybe it's helpful):

- the instructions were easy to follow, I had some struggles with the object literal step, i.e. whether the three options (plain/xml/typed) should be processed as a sequence or more as "elseif". And the wording between "stripped" content and xml could perhaps be made a little more clear.
  [[
  a string created by concatenating the inner content of each of the child elements in turn.
  ]]
  vs.
  [[
  a string created from the inner content of the [current element]
  ]]
  is of course exact and correct, but maybe you could add 2 or 3 words that make it a little more obvious that for any typed literal (unless typed as rdf:XMLLiteral) the markup should be removed.

- "converted to an absolute URI using CURIE processing rules" and "The result MUST be a syntactically valid IRI" would mean that I'd have to generate an IRI from [_:foo]. That's rather unintuitive for people used to turtle or n-triples. I'm creating bnodes from them at the moment.

- what does the "E" in CURIE stand for? ;)

Today I ran the resulting code against the (very nice) test suite and noticed that a couple of tests failed, all due to chaining issues caused by @instanceof. One thing is that the spec is a little unintuitive, as @instanceof sometimes refers to the subject and sometimes to the object, depending on the existence of other attributes. However, that behaviour is properly encoded in the processing instructions and shouldn't cause tests to fail.

The reason why some tests failed is that the current spec sets [chaining] to true when @instanceof generates triples. I think that is a bug: only @rel and @rev should trigger chaining, e.g. in Test 1001:
[[
<p about="#event1" instanceof="cal:Vevent">
  <b property="cal:summary">Weekend off in Iona</b>:
]]
With chaining, we get
[[
<#event1> a cal:Vevent .
_:b1 cal:summary "Weekend off in Iona" .
]]
as the [current object resource] is not set via some attribute.

After dropping the chaining trigger, ARC passes all tests except test 0046, but I think that test doesn't follow the spec:
[[
<div rel="foaf:maker" instanceof="foaf:Person">
  <p property="foaf:name">John Doe</p>
</div>
]]
The div's [current element identifier] is a bnode (_:b1), and so is the [current object resource] (_:b2). The spec does not say (I think) that these two bnodes should be the same one. According to the processing instructions, I then extract
[[
<> foaf:maker _:b2 .
_:b1 a foaf:Person .
_:b2 foaf:name "John Doe" .
]]
I'd say this needs clarification, either in the spec or in the test case.

Bottom line: implementing an RDFa parser was straightforward, given the detailed processing instructions. WRT writing correct RDFa, I expect the @instanceof shortcut to cause some confusion.

I used a modified test processor that I hacked together for the DAWG tests. It's online[2], feel free to play with it, if you like. It reads the manifest, grabs the tests from w3.org, and runs them against an ARC2 SPARQL store. (This means that there *may* be failed tests which are not detected due to a missing feature in the SPARQL engine, but I think it's fairly complete regarding the queries used by the test suite.) If you click "generate report", the script will create a downloadable EARL report.

Best,
Benji

[1] http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2007Oct/0097.html
[2] http://arc.web-semantics.org/demos/rdfa_tests/

--
Benjamin Nowack
bnowack[at]semsol.com

semsol web semantics
Bielefelder Str. 5
40468 Duesseldorf, Germany
fon: +49.211.7316824
fax: +49.211.1587107
http://semsol.com/
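
The chaining issue reported for Test 1001 can be illustrated with a toy model. The following is only a sketch in Python, not ARC's code and not the spec's full processing rules: it models just @about, @instanceof and @property on nested dicts (an assumed, hypothetical input format), omits @rel/@rev entirely, and uses a made-up function name `extract`. The `chain_on_instanceof` flag switches between the behaviour the message attributes to the current draft (True) and the proposed fix (False).

```python
from itertools import count


def extract(element, inherited_subject, chain_on_instanceof, ids=None, triples=None):
    """Toy walk over a nested dict; collects (subject, predicate, object) tuples.

    Hypothetical simplification: only 'about', 'instanceof', 'property',
    'content' and 'children' keys are understood.
    """
    ids = count(1) if ids is None else ids
    triples = [] if triples is None else triples

    subject = element.get("about", inherited_subject)
    chaining = False

    if "instanceof" in element:
        triples.append((subject, "a", element["instanceof"]))
        # Behaviour under discussion: does typing alone switch chaining on?
        chaining = chain_on_instanceof

    if "property" in element:
        triples.append((subject, element["property"], element["content"]))

    if chaining:
        # No attribute in this toy sets an object resource, so children
        # end up hanging off a fresh blank node.
        child_subject = f"_:b{next(ids)}"
    else:
        child_subject = subject

    for child in element.get("children", []):
        extract(child, child_subject, chain_on_instanceof, ids, triples)
    return triples


# Test 1001 from the message, transcribed into the toy input format.
test_1001 = {
    "about": "<#event1>",
    "instanceof": "cal:Vevent",
    "children": [{"property": "cal:summary", "content": "Weekend off in Iona"}],
}

with_chaining = extract(test_1001, "<>", chain_on_instanceof=True)
without_chaining = extract(test_1001, "<>", chain_on_instanceof=False)
```

With the flag on, the summary is attached to `_:b1`, as in the output quoted in the message; with it off, the summary stays on `<#event1>`. Test 0046 is deliberately not modelled, since it depends on how @rel and @instanceof interact.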
Received on Thursday, 25 October 2007 19:39:15 UTC