Implementing an RDFa parser in Python

Hi everyone,

[The following comments are my personal comments on the RDFa Primer.
They do not represent my employer's views or the views of any groups
with which I am associated, including the RDF Data Access Working Group.]

I decided to implement an RDFa parser [1] to improve my own understanding
of the spec and, hopefully, to have useful comments to give on the
current syntax document [2]. As part of my implementation, I have
created a small test suite [3] from the examples found in your document.
Hopefully this can serve as the basis of the task force's test suite in
the near future.

Regards,

Elias

Comments/Questions:

Section 2.3

"All [RDF URI references] are subject to xml:base  [XML-BASE]. Note that
this means that in the absence of an xml:base attribute, the document
containing the RDF statements is itself the base."

comes with an example:

<span xml:base="http://internet-apps.blogspot.com/">
    <link about="" rel="dc:creator"
href="http://www.blogger.com/profile/1109404" />
    <meta property="dc:title" content="Internet Applications" />
</span>

that yields the following RDF statements:

<http://internet-apps.blogspot.com/>
   dc:creator
   <http://www.blogger.com/profile/1109404> .
<http://internet-apps.blogspot.com/>
   dc:title
   "Internet Applications" .

I'm a bit confused here, because my implementation yields:

<http://internet-apps.blogspot.com/>
   dc:creator
   <http://www.blogger.com/profile/1109404> .
_:span0
   dc:title
   "Internet Applications" .

Notice the blank node in place of the URI reference given by the
xml:base attribute. The spec says that the document is itself the base;
however, the subject resolution rules for meta and link dictate that, in
the absence of about and xml:id, one creates a new blank node. It doesn't
say that I need to treat a missing about as an empty string and resolve
it against xml:base. What are your thoughts?
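
To make the two readings concrete, here is a minimal sketch in Python. It
is not taken from my parser [1]; the function name, the attrs dictionary
and the strict flag are assumptions made up for illustration, and xml:id
handling is left out for brevity.

```python
from itertools import count
from urllib.parse import urljoin

# Counter used to mint blank node labels such as _:span0, _:span1, ...
_bnode_ids = count()


def resolve_subject(attrs, base, strict=True):
    """Return the subject term for a <link> or <meta> element.

    attrs  -- dict of the element's attributes (hypothetical shape)
    base   -- the xml:base in scope, or the document URI if none is set
    strict -- True:  a missing 'about' yields a new blank node
              False: a missing 'about' is treated like about=""
    """
    about = attrs.get("about")
    if about is None:
        if strict:
            return "_:span%d" % next(_bnode_ids)
        about = ""
    # An empty relative reference resolves to the base itself.
    return "<%s>" % urljoin(base, about)
```

With strict=True the meta element in the example gets a blank node
subject, which is what my implementation does; with strict=False it gets
<http://internet-apps.blogspot.com/>, which is what the document shows.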

Section 3.3

In the example you use two prefixes not previously mentioned: biblio and
taxo.

I used for biblio: http://example.org/biblio/0.1
and for taxo: http://purl.org/rss/1.0/modules/taxonomy/
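
For reference, this is roughly how such a prefix mapping is applied. It
is a sketch only: the PREFIXES table and expand_curie are hypothetical
names, expansion is assumed to be plain concatenation of namespace and
local name, and the local name "topic" is used purely for illustration.

```python
# Prefix table using the namespace values I picked above.
PREFIXES = {
    "biblio": "http://example.org/biblio/0.1",
    "taxo": "http://purl.org/rss/1.0/modules/taxonomy/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI by concatenation."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("undeclared prefix in %r" % curie)
    return prefixes[prefix] + local


print(expand_curie("taxo:topic"))
```

An undeclared prefix raises an error here, which is how the missing
declarations showed up for me.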

Also, the triples output does not give the dc:title literals the
XMLLiteral datatype.

Section 4.2.4

The XHTML example contains a typo: s/foaf:knowns/foaf:knows/

Section 5.1.1.1

I understand how the most relevant triple is the one containing XML
mark-up, but just wanted to let you know that maybe you should add the
dc:creator triple as well to avoid confusion.

Also, could you explain in more detail what is meant by exclusive
canonicalization of the RDFa element's value?

Section 5.1.2

I could not find the statement that dictates which value types are
allowed for the datatype attribute (i.e. CURIEs, URI references, or
both). Also, it took a lot of reading to find that "plaintext" is a
special value of datatype.

Section 5.2

_:a foaf:mbox mailto:daniel.brickley@bristol.ac.uk .
_:b foaf:mbox mailto:libby.miller@bristol.ac.uk .
_:a foaf:knows _:b .

The mailto: URIs need to be enclosed in <>.

Section 6.1

In the yielded triples section, a couple of the statements are missing
their terminating ".".

Section 6.2

The examples are missing the geo declaration. I used
http://www.w3.org/2003/01/geo/

Also, some of the triples containing literals are missing
^^rdf:XMLLiteral. I'm not sure what you meant for geo:lat, geo:long,
dc:title, foaf:name.

In general, I'd much rather use Turtle (which I did for my test suite)
than N-Triples. The variations you made to N-Triples pretty much make it
Turtle. You might have picked N-Triples because you have a normative
reference to it, but that didn't stop the SPARQL Query Language from
using Turtle. Just a thought.

Anyway, my parser supports almost everything from the spec, with the
exception of partial reification and the flattening of XMLLiterals by
concatenating the child text nodes. I'll be waiting for Mark's XHTML1x
spec so I can update the parser for it. Looking forward to that and to
implementations in other languages.
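
For anyone wondering what the missing flattening step would look like,
here is a minimal sketch using the standard library's ElementTree. It is
not part of my parser; the flatten name and the sample markup are
assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def flatten(element):
    """Concatenate all descendant text nodes in document order."""
    return "".join(element.itertext())


# Sample markup: a title whose value contains inline markup.
span = ET.fromstring("<span>Internet <em>Applications</em></span>")
print(flatten(span))  # Internet Applications
```

The markup is dropped and only the character data survives, so the
result can be emitted as a plain literal instead of an XMLLiteral.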

That's all for now.

-Elias Torres
http://torrez.us/

[1] http://dev.torrez.us/public/2006/rdfa/python/rdfa.py
[2] http://www.w3.org/2001/sw/BestPractices/HTML/2005-rdfa-syntax
[3] http://dev.torrez.us/public/2006/rdfa/tests/

Received on Saturday, 3 June 2006 18:43:45 UTC