Thoughts on automatic RDF generation for DSpace

[These are some comments on an automatic RDF generation mechanism that 
I've been working on for DSpace. Mick and I thought these ideas might be 
of wider interest.

The generation demo is here:

http://www.dspace.org/dspace-demo/test/testrdf.jsp

To see the RDF without HTML quoting, use:

http://www.dspace.org/dspace-demo/test/testrdf.jsp?noquote=true

]

Now that I've done this first bit (which is built on Jena), I have all 
sorts of questions/problems/issues:

* The automatic generation really doesn't address the idea of object 
identity. The identifiers you see in the demo, like

http://www.dspace.org/org.dspace.db.generated.Publication/1094533159

just use the object's hash code, which is transient: it only identifies 
one in-memory instance and will generally differ from run to run. I 
think a Publication should be mapped to its persistent id (perhaps its 
handle), but that is outside the JavaBeans specification, which only 
addresses in-memory objects.

* The generated RDF only captures the field values at a particular 
moment. What happens when we capture an RDF dump of a publication at 
Submit time, and another one when it is modified?
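The two questions are related: a dump taken at submit time and one taken after a modification can only be compared if both describe the same subject URI. The Java sketch below illustrates this under some assumptions. A hypothetical `PersistentlyIdentified` interface exposes something like a handle, the `hdl/` path segment is made up, and property values are read with `java.beans.Introspector` into plain maps rather than a Jena model. It targets a modern JDK (the `URLEncoder.encode(String, Charset)` overload needs Java 10 or later), not JDK 1.1/1.2. Objects without a persistent id fall back to the hash code, as in the demo.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical interface for objects that carry a persistent id (e.g. a handle).
interface PersistentlyIdentified {
    String getPersistentId();
}

// One capture of a bean's readable properties at a particular moment.
final class Snapshot {
    // Base URI taken from the demo output; the "hdl/" segment is an assumption.
    static final String BASE = "http://www.dspace.org/";

    final String subject;              // URI of the described resource
    final long takenAt;                // capture time in milliseconds
    final Map<String, Object> values;  // property name -> value at capture time

    private Snapshot(String subject, long takenAt, Map<String, Object> values) {
        this.subject = subject;
        this.takenAt = takenAt;
        this.values = values;
    }

    // Uses the persistent id when there is one; otherwise falls back to the
    // transient hash code, which is what the demo does today.
    static String subjectFor(Object bean) {
        String type = bean.getClass().getName();
        if (bean instanceof PersistentlyIdentified) {
            String id = ((PersistentlyIdentified) bean).getPersistentId();
            if (id != null) {
                // Encode so that the '/' inside a handle stays within one path segment.
                return BASE + type + "/hdl/" + URLEncoder.encode(id, StandardCharsets.UTF_8);
            }
        }
        return BASE + type + "/" + bean.hashCode();
    }

    // Reads every readable bean property except getClass().
    static Snapshot of(Object bean) {
        Map<String, Object> values = new TreeMap<>();
        try {
            for (PropertyDescriptor pd :
                    Introspector.getBeanInfo(bean.getClass(), Object.class).getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                if (getter != null) {
                    values.put(pd.getName(), getter.invoke(bean));
                }
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read " + bean.getClass().getName(), e);
        }
        return new Snapshot(subjectFor(bean), System.currentTimeMillis(), values);
    }

    // Names of the properties whose values differ between two captures of one resource.
    static Set<String> changedProperties(Snapshot before, Snapshot after) {
        if (!before.subject.equals(after.subject)) {
            throw new IllegalArgumentException("snapshots describe different resources");
        }
        Set<String> names = new TreeSet<>(before.values.keySet());
        names.addAll(after.values.keySet());
        Set<String> changed = new TreeSet<>();
        for (String name : names) {
            if (!Objects.equals(before.values.get(name), after.values.get(name))) {
                changed.add(name);
            }
        }
        return changed;
    }
}
```

Because the subject comes from the persistent id, a capture taken at submit time and one taken after an edit can be compared property by property even across JVM restarts; with hash-code subjects, `changedProperties` would typically reject captures taken in different runs.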

There are also some more mundane features that would be nice:

* There should be a custom field mapping that specifies which fields are 
ignored and provides alternate labels for arcs.

* The various Java collection objects (Object[], Hashtable, Vector, 
Enumeration in JDK 1.1; Collections and Iterators in JDK 1.2 and up) 
should be processed automatically and recursively.

* When the value of a property is itself an Object, that object should 
(probably) be recursively processed.
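A sketch of how the field mapping and the recursive processing might fit together, again using `java.beans.Introspector` on a modern JDK instead of Jena so that it stands alone. The `ignored` set and `labels` map are a hypothetical shape for the custom mapping; treating a `Hashtable` (or any `Map`) as the collection of its values, and the choice of which types count as literals, are assumptions too. An `IdentityHashMap` records beans already described so that cyclic object graphs terminate. Subjects here are just class name plus identity hash, so they share the transience problem of the demo's identifiers.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Array;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Date;
import java.util.Enumeration;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// One generated statement; resources appear as their id, literals in quotes.
final class Triple {
    final String subject, predicate, object;
    Triple(String s, String p, String o) { subject = s; predicate = p; object = o; }
    @Override public String toString() { return subject + " " + predicate + " " + object + " ."; }
}

// Walks a bean graph and emits triples, applying a hypothetical field mapping.
final class BeanToTriples {
    private final Set<String> ignored;          // property names to skip
    private final Map<String, String> labels;   // property name -> alternate arc label
    private final Map<Object, String> ids = new IdentityHashMap<>(); // visited beans, guards cycles
    private final List<Triple> out = new ArrayList<>();

    BeanToTriples(Set<String> ignored, Map<String, String> labels) {
        this.ignored = ignored;
        this.labels = labels;
    }

    List<Triple> convert(Object root) {
        describe(root);
        return out;
    }

    // Returns the bean's id, emitting its properties the first time it is seen.
    private String describe(Object bean) {
        String id = ids.get(bean);
        if (id != null) return id;
        id = bean.getClass().getSimpleName() + "/" + System.identityHashCode(bean);
        ids.put(bean, id);
        try {
            for (PropertyDescriptor pd :
                    Introspector.getBeanInfo(bean.getClass(), Object.class).getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                if (getter == null || ignored.contains(pd.getName())) continue;
                emit(id, labels.getOrDefault(pd.getName(), pd.getName()), getter.invoke(bean));
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read " + bean.getClass().getName(), e);
        }
        return id;
    }

    // Collection-like values become one arc per element; other non-literals recurse.
    private void emit(String subject, String arc, Object value) {
        if (value == null) return;
        Iterator<?> items = elements(value);
        if (items != null) {
            while (items.hasNext()) emit(subject, arc, items.next());
        } else if (isLiteral(value)) {
            out.add(new Triple(subject, arc, "\"" + value + "\""));
        } else {
            out.add(new Triple(subject, arc, describe(value)));
        }
    }

    // Arrays, Collections (incl. Vector), Map values (incl. Hashtable), Enumerations, Iterators.
    private static Iterator<?> elements(Object v) {
        if (v.getClass().isArray()) {
            List<Object> list = new ArrayList<>();
            for (int i = 0; i < Array.getLength(v); i++) list.add(Array.get(v, i));
            return list.iterator();
        }
        if (v instanceof Collection) return ((Collection<?>) v).iterator();
        if (v instanceof Map) return ((Map<?, ?>) v).values().iterator();
        if (v instanceof Enumeration) return Collections.list((Enumeration<?>) v).iterator();
        if (v instanceof Iterator) return (Iterator<?>) v;
        return null;
    }

    // Types written as literals rather than described as nested resources (an assumption).
    private static boolean isLiteral(Object v) {
        return v instanceof String || v instanceof Number || v instanceof Boolean
                || v instanceof Character || v instanceof Date || v instanceof Enum;
    }
}
```

Flattening a collection into repeated arcs loses element order; an rdf:Seq container would keep it, at the cost of an extra node per collection.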

And for future directions, thoughts about the implications of SQL-based 
persistence and RDF querying are welcome. One possible direction I see 
for RDF querying (I'm not sure whether it has been considered already) 
is a query capability based on XML Schema datatypes. If a Literal value 
had type xsd:string, for example, then string operators such as 
'starts_with' and 'contains' could be applied to it. These operations 
could be mapped back to the native facilities of the storage component 
(in this case, SQL LIKE patterns such as 'foo%' and '%foo%').
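The operator-to-SQL mapping might look something like the following Java sketch. The `StringOperator` names, the choice of `!` as the LIKE escape character, and checking the literal's datatype URI before offering the operators are assumptions for illustration, not an existing Jena or DSpace API.

```java
// Hypothetical string operators for literals typed xsd:string.
enum StringOperator { EQUALS, STARTS_WITH, CONTAINS }

final class LikeTranslator {
    static final String XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

    private LikeTranslator() {}

    // Only xsd:string literals get the string operators in this sketch.
    static boolean supports(String datatypeUri) {
        return XSD_STRING.equals(datatypeUri);
    }

    // SQL predicate with one bind parameter; '!' is the LIKE escape character.
    static String predicate(String column, StringOperator op) {
        return op == StringOperator.EQUALS ? column + " = ?" : column + " LIKE ? ESCAPE '!'";
    }

    // Value to bind: LIKE wildcards in the user's text are escaped so they match literally.
    static String parameter(StringOperator op, String text) {
        String escaped = text.replace("!", "!!").replace("%", "!%").replace("_", "!_");
        switch (op) {
            case EQUALS:      return text;
            case STARTS_WITH: return escaped + "%";
            case CONTAINS:    return "%" + escaped + "%";
            default:          throw new IllegalArgumentException("unknown operator " + op);
        }
    }
}
```

Passing the pattern as a bind parameter (for example through `PreparedStatement.setString`) keeps quotes in user text from breaking the SQL, and escaping `%` and `_` keeps a 'contains' search for `50%` from also matching `500`.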

Peter

Received on Tuesday, 12 June 2001 09:23:54 UTC