Implementation Feedback

Hi RDFa(ddicts),

I had a long train ride yesterday and used the time to finally
write the RDFa extractor for ARC I mentioned two weeks ago[1].

Stupid me forgot to copy any test cases (a zip with all files
might be handy, btw), but I had a copy of the latest syntax
doc and just followed the processing instructions in section 5
w/o thinking too much about the what and why of the
individual steps.

Here is my feedback (don't know if you are collecting stuff 
like this already, but maybe it's helpful):

- the instructions were easy to follow, but I struggled a bit
  with the object literal step, i.e. whether the three options
  (plain/xml/typed) should be processed as a sequence or rather
  as "elseif" branches. Also, the wording that distinguishes
  "stripped" content from xml content could perhaps be made a
  little clearer.
   [[
      a string created by concatenating the inner content of 
      each of the child elements in turn.
   ]]
   vs. 
   [[
      a string created from the inner content of the [current element]
   ]]
   is of course exact and correct, but maybe you could add 2 or 3
   words that make it a little more obvious that for any typed literal
   (unless typed as rdf:XMLLiteral) the markup should be removed.

- "converted to an absolute URI using CURIE processing rules" and
  "The result MUST be a syntactically valid IRI" would mean that 
  I'd have to generate an IRI from [_:foo]. That's rather unintuitive
  for people used to Turtle or N-Triples. I'm creating bnodes from
  them at the moment.
  
- what does the "E" in CURIE stand for? ;)
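
To illustrate the first point (the object literal step), here is a
minimal sketch of the "elseif" reading. It is Python, not the PHP
code in ARC, and all function names are made up for this mail. The
untyped branch is an assumption on my part: it simply produces a
plain literal and does not reproduce the draft's exact conditions.

```python
from xml.etree import ElementTree as ET

RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def stripped_content(element):
    """Text of the element with all markup removed."""
    return "".join(element.itertext())


def inner_xml(element):
    """Inner content of the element with child markup kept."""
    # ET.tostring() on a child also emits the text that follows it (its tail).
    return (element.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in element
    )


def object_literal(element, datatype=None):
    """Pick exactly one literal form: an if/elif chain, not a sequence."""
    if datatype == RDF_XMLLITERAL:
        # Only an explicit rdf:XMLLiteral keeps the markup.
        return inner_xml(element), RDF_XMLLITERAL
    elif datatype:
        # Any other datatype: the markup is removed.
        return stripped_content(element), datatype
    else:
        # Assumption: untyped content is treated as a plain literal here.
        return stripped_content(element), None


el = ET.fromstring("<span>Weekend <b>off</b> in Iona</span>")
print(object_literal(el, "http://www.w3.org/2001/XMLSchema#string"))
print(object_literal(el, RDF_XMLLITERAL))
```

The two helpers correspond to the two quoted wordings: one keeps the
child markup, the other drops it.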


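For the [_:foo] point above, this is roughly the behaviour I mean:
the "_" prefix yields a bnode, any other prefix is expanded to an
IRI by concatenation. Again a hedged Python sketch, not ARC's code;
the function name, the tuple representation and the example
namespace are all hypothetical.

```python
def resolve_safe_curie(value, prefixes):
    """Resolve a safe CURIE such as [cal:Vevent] or [_:foo]."""
    if not (value.startswith("[") and value.endswith("]")):
        raise ValueError(f"not a safe CURIE: {value!r}")
    prefix, colon, reference = value[1:-1].partition(":")
    if not colon:
        raise ValueError(f"missing ':' in {value!r}")
    if prefix == "_":
        # What I do at the moment: a bnode instead of a generated IRI.
        return ("bnode", reference)
    # Regular CURIE: prefix mapping and reference are concatenated.
    return ("iri", prefixes[prefix] + reference)


# The namespace below is a placeholder, not the real iCal namespace.
prefixes = {"cal": "http://example.org/cal#"}
print(resolve_safe_curie("[_:foo]", prefixes))
print(resolve_safe_curie("[cal:Vevent]", prefixes))
```
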
Today I ran the resulting code against the (very nice) test suite
and noticed that a couple of tests failed, all due to chaining
issues caused by @instanceof. One thing is that the spec is a little
unintuitive, as @instanceof sometimes refers to the subject and 
sometimes to the object, depending on the existence of other attributes.
However, that behaviour is properly encoded in the processing instructions
and shouldn't cause tests to fail. The reason why some tests failed is
that the current spec sets [chaining] to true when @instanceof generates
triples. I think that is a bug: only @rel and @rev should trigger
chaining. Take Test 1001, for example:

[[
<p about="#event1" instanceof="cal:Vevent">
      <b property="cal:summary">Weekend off in Iona</b>: 
]]

With chaining, we get

[[
<#event1> a cal:Vevent .
_:b1 cal:summary "Weekend off in Iona" . 
]]

because no attribute sets the [current object resource], so the
summary ends up on a fresh bnode instead of on <#event1>.
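
The difference can be shown with a toy model of this one example.
It is a deliberately simplified Python sketch, not the spec's
algorithm and not ARC's code; the function and its flag are
hypothetical and only cover the <p>/<b> snippet above.

```python
from itertools import count


def extract_test_1001(instanceof_triggers_chaining):
    """Toy walk through the <p>/<b> snippet of Test 1001."""
    fresh_bnode = (f"_:b{n}" for n in count(1))

    subject = "<#event1>"           # from @about
    current_object_resource = None  # no attribute on the <p> sets it
    triples = [(subject, "a", "cal:Vevent")]  # from @instanceof

    # There is no @rel/@rev here, so @instanceof is the only possible trigger.
    chaining = instanceof_triggers_chaining

    if chaining:
        # Children hang off the object resource; unset means a new bnode.
        child_subject = current_object_resource or next(fresh_bnode)
    else:
        # Children keep the subject established by @about.
        child_subject = subject

    triples.append((child_subject, "cal:summary", '"Weekend off in Iona"'))
    return triples


for flag in (True, False):
    print(flag, extract_test_1001(flag))
```

With the flag off, cal:summary stays on <#event1>, which is what I
would expect from the markup.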

After dropping the chaining trigger, ARC passes all tests except
test 0046, but I think that test doesn't follow the spec:

[[
<div rel="foaf:maker" instanceof="foaf:Person">
  <p property="foaf:name">John Doe</p>
</div>
]]

The div's [current element identifier] is a bnode (_:b1),
and so is the [current object resource] (_:b2). The spec 
does not say (I think) that these two bnodes should be 
the same one. According to the processing instructions, 
I then extract
[[
<> foaf:maker _:b2 .
_:b1 a foaf:Person .
_:b2 foaf:name "John Doe" .
]]

I'd say this needs clarification, either in the spec or
in the test case.
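
To make the two readings concrete, here is a small Python sketch
(hypothetical helper names, hard-coded to the div above). With
same_bnode=False it returns the triples I extract; with
same_bnode=True it returns what the test seems to expect, as far as
I can tell.

```python
def extract_test_0046(same_bnode):
    """Triples for the div of test 0046 under either reading."""
    element_id = "_:b1"  # [current element identifier] of the div
    # The open question: is the [current object resource] the same bnode?
    object_resource = element_id if same_bnode else "_:b2"
    return [
        ("<>", "foaf:maker", object_resource),
        (element_id, "a", "foaf:Person"),
        (object_resource, "foaf:name", '"John Doe"'),
    ]


def maker_is_person(triples):
    """True if some foaf:maker object is also typed as foaf:Person."""
    makers = {o for s, p, o in triples if p == "foaf:maker"}
    return any(
        s in makers and p == "a" and o == "foaf:Person" for s, p, o in triples
    )


print(maker_is_person(extract_test_0046(same_bnode=False)))
print(maker_is_person(extract_test_0046(same_bnode=True)))
```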

Bottom line: implementing an RDFa parser was straightforward,
given the detailed processing instructions. WRT writing
correct RDFa, I expect the @instanceof shortcut to cause
some confusion.

I used a modified test processor that I hacked together for 
the DAWG tests. It's online[2], feel free to play with it,
if you like. It reads the manifest, grabs the tests from w3.org,
and runs them against an ARC2 SPARQL store. (This means that
there *may* be failed tests which are not detected due to 
a missing feature in the SPARQL engine, but I think it's
fairly complete regarding the queries used by the test suite.)
If you click "generate report", the script will create a
downloadable EARL report.


Best,
Benji

[1] http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2007Oct/0097.html
[2] http://arc.web-semantics.org/demos/rdfa_tests/


-- 
Benjamin Nowack
bnowack[at]semsol.com

semsol web semantics
Bielefelder Str. 5
40468 Duesseldorf, Germany

fon: +49.211.7316824
fax: +49.211.1587107

http://semsol.com/

Received on Thursday, 25 October 2007 19:39:15 UTC