Fetching RDF Data Objects

A passing comment elsewhere led to a request for me to write up my
experiments with a slightly higher level abstraction than the RDF triple.

To be clear: This is out of scope for the current SOW of the History Store;
it is just a discussion note.

Comments etc. most welcome.  It's ongoing work.

	Andy

---------------------------------------------------------

    RDF Data Objects
    ----------------

    An ongoing experiment
    Andy Seaborne
    June 2003


Context
=======

In a networked environment, granularity of operation is important for
practical systems.  Too small a grain size and network overhead is too high;
too large a grain size and the impact of server performance is noticeable.

Background
==========

Previous work "RDF Objects" [1] has looked at describing a local part of the
RDF graph so that a selection of RDF statements is the primary abstraction,
not a single triple.  This is a client-side definition of the RDF object.

The TAP system [2] has a single access function GetData that can return any
RDF - in particular the exact form (graph shape, namespaces etc) is not
prescribed by the client access operation but is determined by the server.

In EJB systems, an efficient design avoids very small entity beans because
the database overhead reduces performance, analogous to the network design
issues here.

RDF Data Objects
================

The idea is that the appropriate RDF to return on a request isn't a matter
known only to the client.  An example would be getting a vCard [3]: a query
might find the resource for one, but the vCard as an abstraction is more
than a single statement or one level of the RDF graph.

An example RDF/vcard (N3)

<andy>
    vcard:FN   "Andy Seaborne" ;    # Formatted name
    vcard:N                         # Structured name - a bNode with
                                    # further properties
       [ vcard:Family "Seaborne" ;
         vcard:Given  "Andy" 
       ] ;
    .

so the definition of a complete vCard is part of the design of the vcard
schema and is not something the client should have to know.
RDF allows optional triples, so the rigid triple patterns of most query
languages, e.g. RDQL, used for locating a graph node, aren't good at
extracting the optional RDF and structured values associated with the node
found.

Observation:
In this case, the extraction algorithm could also be a transitive closure
over bNodes but some RDF schemas (e.g. FOAF) have top level abstractions
which are usually bNodes.  Indeed, typically, all FOAF resources are bNodes
so the closure is everything.
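
As a rough illustration of the bNode closure mentioned above, here is a
minimal Python sketch.  It is not the Joseki code: triples are modelled as
plain (subject, predicate, object) tuples, bNodes as strings starting with
"_:", and the names is_bnode, bnode_closure and GRAPH are made up for this
note.

```python
# Sketch only: triples are (subject, predicate, object) tuples and blank
# nodes are strings starting with "_:" (an assumption of this example).

def is_bnode(term):
    return isinstance(term, str) and term.startswith("_:")

def bnode_closure(triples, start):
    """Return the statements about `start` plus, recursively, the
    statements about any bNode reached as an object."""
    result = set()
    seen = set()
    todo = [start]
    while todo:
        node = todo.pop()
        if node in seen:          # guards against bNode cycles
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                result.add((s, p, o))
                if is_bnode(o):
                    todo.append(o)
    return result

# The vCard example from this note, plus one unrelated resource.
GRAPH = {
    ("andy", "vcard:FN", "Andy Seaborne"),
    ("andy", "vcard:N", "_:n"),
    ("_:n", "vcard:Family", "Seaborne"),
    ("_:n", "vcard:Given", "Andy"),
    ("other", "vcard:FN", "Someone Else"),
}

vcard = bnode_closure(GRAPH, "andy")
```

Here the closure picks up the vcard:N structure but stops at named
resources.  With data where every resource is a bNode, the same loop would
keep following links until it had collected the whole graph.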

"Fetch"
=======

Joseki has the notion of a repository of RDF.  It can only answer questions
about resources based on metadata in the repository - there may be other
places to go to find out things.

A building block operation for the Joseki approach is to have a "fetch"
operation which is a "get me everything you know about <X>" and it is a
server-side decision as to the RDF statements to return.  This is a sort of
simple query and fits with HTTP GET:

    GET http://host/repository?op=fetch&uri=%encodedURI

The choice of algorithm to apply depends on the URI specified.  Doing a
fetch on a vCard gets the properties and the compound structure of the
vcard, specifically, the vcard:N structure as well as the plain vcard:FN
statement.  The client is expected to navigate the subgraph returned and
work out what it wants to do with the information.
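
From the client side, building the request is just a matter of encoding
the resource URI into the query string.  A small Python sketch using only
the standard library; the helper name fetch_url and the example resource
URI are invented for illustration, while the op and uri parameters are the
ones shown in the GET line above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def fetch_url(repository, resource_uri):
    """Build a fetch request URL; urlencode percent-encodes the URI."""
    return repository + "?" + urlencode({"op": "fetch", "uri": resource_uri})

# "http://host/repository" is the placeholder from the GET line above;
# the resource URI is a made-up example.
resource = "http://example.org/people#andy"
url = fetch_url("http://host/repository", resource)

# Decoding the query string gives back the original URI.
decoded = parse_qs(urlsplit(url).query)["uri"][0]
```

The client then issues an ordinary HTTP GET on the resulting URL and parses
whatever RDF comes back.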

Choosing the RDF data object in the server can be based on, say, RDF type
(and hence with OWL, characteristic properties).  Further arguments could
also be useful if a thing is of several types or if the RDF gets too big
for practical use.
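
One way to picture the server-side choice is a table from RDF type to
extraction function, with a plain default.  The sketch below is
hypothetical throughout: it is not how Joseki is configured, the type name
"vcard:VCard" and the EXTRACTORS table are assumptions, and triples are
again simple tuples with "_:" marking bNodes.

```python
RDF_TYPE = "rdf:type"

def direct_properties(triples, node):
    """Default: just the statements with `node` as subject."""
    return {t for t in triples if t[0] == node}

def with_structured_values(triples, node):
    """Direct statements plus one extra level for bNode values."""
    out = direct_properties(triples, node)
    for _, _, o in list(out):
        if isinstance(o, str) and o.startswith("_:"):
            out |= direct_properties(triples, o)
    return out

# Hypothetical registry: which algorithm to run for which RDF type.
EXTRACTORS = {"vcard:VCard": with_structured_values}

def fetch(triples, node):
    """Pick an extractor from the node's types; fall back to the default."""
    types = sorted(o for s, p, o in triples if s == node and p == RDF_TYPE)
    for t in types:
        if t in EXTRACTORS:
            return EXTRACTORS[t](triples, node)
    return direct_properties(triples, node)

GRAPH = {
    ("andy", "rdf:type", "vcard:VCard"),
    ("andy", "vcard:FN", "Andy Seaborne"),
    ("andy", "vcard:N", "_:n"),
    ("_:n", "vcard:Family", "Seaborne"),
    ("_:n", "vcard:Given", "Andy"),
    ("doc", "rdf:type", "ex:Doc"),
    ("doc", "ex:author", "_:a"),
    ("_:a", "ex:name", "X"),
}

andy = fetch(GRAPH, "andy")   # includes the vcard:N structure
doc = fetch(GRAPH, "doc")     # no registered type: direct statements only
```

A resource with several registered types would need a rule for choosing
between them, which is where further arguments on the request could come
in.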

Reference and Containment
=========================

What is really going on is that there are data objects and two kinds of
link: reference links where one object links to another object and
containment links where one object contains subsidiary portions of the
graph.

If properties were marked as containment or reference links, then a single
algorithm could be used that traversed containment links (cycles need to be
handled).  Making properties either a subPropertyOf :reference or :contains
(or subClass of :ReferenceLink or :ContainmentLink) works but I want also to
handle schemas where this is not designed in.  Hence the association of
different algorithms in the server.
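
For schemas that do mark their properties, the single algorithm might look
something like the Python sketch below.  It is an illustration under
stated assumptions: only direct rdfs:subPropertyOf :contains declarations
are recognised (no inference over property hierarchies), the ex: names are
invented, and triples are plain tuples.

```python
SUBPROP = "rdfs:subPropertyOf"
CONTAINS = ":contains"

def containment_properties(triples):
    """Properties declared directly as sub-properties of :contains."""
    return {s for s, p, o in triples if p == SUBPROP and o == CONTAINS}

def containment_closure(triples, start):
    """Statements about `start` and about everything it contains.
    Reference links are returned but not followed."""
    contains = containment_properties(triples)
    seen = {start}                # visited set handles cycles
    todo = [start]
    result = set()
    while todo:
        node = todo.pop()
        for s, p, o in triples:
            if s != node:
                continue
            result.add((s, p, o))
            if p in contains and o not in seen:
                seen.add(o)
                todo.append(o)
    return result

GRAPH = {
    ("ex:part", SUBPROP, CONTAINS),
    ("ex:seeAlso", SUBPROP, ":reference"),
    ("doc", "ex:part", "sec1"),
    ("sec1", "ex:title", "Intro"),
    ("sec1", "ex:part", "doc"),          # deliberate cycle
    ("doc", "ex:seeAlso", "other"),
    ("other", "ex:title", "Elsewhere"),
}

obj = containment_closure(GRAPH, "doc")
```

The reference to "other" appears in the result as a link, but the
statements about "other" itself do not, which is the distinction between
the two kinds of link.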

Experimental Status
===================

My prototyping version of Joseki does bNode closure of RDQL queries and also
provides a fetch operation.  Tying into the Joseki configuration system has
not yet been done.

As of June 2003, these features have been tested with a demo app currently
under development.


[1] http://www.hpl.hp.com/techreports/2002/HPL-2002-315.html
[2] http://tap.stanford.edu/
[3] http://www.w3.org/TR/vcard-rdf

Received on Friday, 13 June 2003 10:53:09 UTC