RE: Fetching RDF Data Objects from Seaborne, Andy on 2003-07-14 (www-rdf-dspace@w3.org from July 2003)

From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
Date: Mon, 14 Jul 2003 14:10:47 +0100
To: "'www-rdf-dspace@w3.org'" <www-rdf-dspace@w3.org>
Message-ID: <5E13A1874524D411A876006008CD059F0760566E@0-mail-1.hpl.hp.com>

I have put up a public demo Joseki server, which runs the latest codebase
from SourceForge.  It supports the notion of getting RDF Data Objects.  The
demo server has two databases, one small one of books and one of airport
codes.

There is a set of sample scripts available at:
http://jena.hpl.hp.com/~afs/2003-07-14-joseki-examples.zip

These include some for cwm, which use log:semantics to get the RDF data and
then extract further information locally.

The full server release is:
http://prdownloads.sourceforge.net/joseki/joseki-2.0.0-beta1.zip?download

and the web site http://www.joseki.org has been updated.




Example
=======
Fetching a data object requires the URL for a model, and the resource of
interest.  The server will return the RDF subgraph relating to that
resource.  e.g.

Model:    http://jena.hpl.hp.com:2020/books (you can GET this as well)
Resource: http://example.org/book/book3

and the request is (this works in a browser):

http://jena.hpl.hp.com:2020/books?lang=fetch&r=http://example.org/book/book3

Packaging that into a script that uses wget:
--------------------
#!/bin/bash

# Joseki example: get a published RDF model
# Fetch book3 and related information

# Model where the RDF resides ....
MODEL="http://jena.hpl.hp.com:2020/books"

# Resource of interest ....
# Need to %-encode if it uses, say, frag ids
OBJ="http://example.org/book/book3"

# Construct the request
URI="${MODEL}?lang=fetch&r=${OBJ}"

# Choose transfer syntax
FORMAT="application/n3"

# Make request:
wget -q -O - --header "Accept: $FORMAT" "$URI"
--------------------
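
The script above passes the resource URI through unencoded, which is fine
for book3 but not for a URI containing "#" or "&".  A minimal sketch of the
%-encoding step follows, assuming ASCII-only URIs.  urlencode is a
hypothetical helper (not part of Joseki or wget), and the resource URI with
a frag id is made up for illustration.

```shell
#!/bin/sh
# urlencode STRING
# Hypothetical helper: %-encode every character except the RFC 3986
# unreserved set.  Assumes ASCII input.  The body runs in a subshell
# (parentheses) so its variables do not leak into the caller.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # everything after the first character
        c=${s%"$rest"}     # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

MODEL="http://jena.hpl.hp.com:2020/books"
# Made-up resource URI with a frag id, for illustration only
OBJ="http://example.org/book#book3"

# Construct the request with the resource safely encoded
URI="${MODEL}?lang=fetch&r=$(urlencode "$OBJ")"
printf '%s\n' "$URI"
```

The resulting URI can be handed to the wget line unchanged.  Encoding only
the value of r, rather than the whole request, leaves the "?" and "&" that
separate the query parameters intact.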

	Andy


-----Original Message-----
From: Seaborne, Andy [mailto:Andy_Seaborne@hplb.hpl.hp.com] 
Sent: 13 June 2003 15:53
To: 'www-rdf-dspace@w3.org'
Subject: Fetching RDF Data Objects



A passing comment elsewhere led to a request for me to write up my experiments
with a slightly higher level abstraction than the RDF triple.

To be clear: This is out of scope for the current SOW of the History Store;
it is just a discussion note.

Comments etc most welcome.  It's ongoing work.

	Andy

---------------------------------------------------------

    RDF Data Objects
    ----------------

    An ongoing experiment
    Andy Seaborne
    June 2003


Context
=======

In a networked environment, granularity of operation is important for
practical systems.  Too small a grain size and network overhead is too high;
too large a grain size and the impact on server performance is noticeable.

Background
==========

Previous work "RDF Objects" [1] has looked at describing a local part of the
RDF graph so that a selection of RDF statements is the primary abstraction, not
a single triple.  This is a client-side definition of the RDF object.

The TAP system [2] has a single access function GetData that can return any
RDF - in particular the exact form (graph shape, namespaces etc) is not
prescribed by the client access operation but is determined by the server.

In EJB systems, an efficient design avoids very small entity beans because
the database overhead reduces performance, analogous to the network design
issues here.

RDF Data Objects
================

The idea is that the notion of the appropriate RDF to return on a request
isn't just a matter known only to the client.  An example would be getting a
vCard [3]: a query might find the resource for one, but the abstraction of
the vCard isn't just a single statement or one level of the RDF graph.

An example RDF/vcard (N3):

<andy>
    vcard:FN   "Andy Seaborne" ;    # Formatted name
    vcard:N                         # Structured name - a bNode with
                                    # further properties
       [ vcard:Family "Seaborne" ;
         vcard:Given  "Andy" 
       ] ;
    .

so the definition of a complete vCard is part of the design of the vcard
schema, and is not something the client should have to know.
RDF allows optional triples, so the rigid triple patterns of most query
languages, e.g. RDQL, used for locating a graph node, aren't good at
extracting the optional RDF and structured values associated with the node
found.

Observation:
In this case, the extraction algorithm could also be a transitive closure
over bNodes, but some RDF schemas (e.g. FOAF) have top-level abstractions
that are usually bNodes.  Indeed, typically, all FOAF resources are bNodes
so the closure is everything.

"Fetch"
=======

Joseki has the notion of a repository of RDF.  It can only answer questions
about resources based on metadata in the repository - there may be other
places to go to find out things.

A building block operation for the Joseki approach is to have a "fetch"
operation which is a "get me everything you know about <X>" and it is a
server-side decision as to the RDF statements to return.  This is a sort of
simple query and fits with HTTP GET:

    GET http://host/repository?op=fetch&uri=%encodedURI

The choice of algorithm to apply depends on the URI specified.  Doing a
fetch on a vCard gets the properties and the compound structure of the
vcard, specifically, the vcard:N structure as well as the plain vcard:FN
statement.  The client is expected to navigate the subgraph returned and
work out what it wants to do with the information.

Choosing the RDF data object in the server can be based on, say, RDF type
(and hence with OWL, characteristic properties).  Further arguments could
also be useful if a thing is of several types, or if the RDF gets too big
for practical use.

Reference and Containment
=========================

What is really going on is that there are data objects and two kinds of
link: reference links where one object links to another object and
containment links where one object contains subsidiary portions of the
graph.

If properties were marked as containment or reference links, then a single
algorithm could be used that traversed containment links (cycles need to be
handled).  Making properties either a subPropertyOf :reference or :contains
(or subClass of :ReferenceLink or :ContainmentLink) works, but I also want to
handle schemas where this is not designed in.  Hence the association of
different algorithms in the server.
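
The single-algorithm idea can be sketched as a walk that follows only
containment links and remembers which nodes it has visited, so cycles
terminate.  This is an illustrative sketch, not Joseki's implementation.
It assumes a simplified one-triple-per-line "subject predicate object" text
format with no spaces inside terms (not real N-Triples), and it takes the
containment properties as a comma-separated argument standing in for the
:contains marking.  fetch_object is a hypothetical name.

```shell
#!/bin/sh
# fetch_object START CONTAINMENT_PROPS FILE
# Hypothetical helper: print every triple reachable from START by following
# only the listed containment properties.  Reference links are printed but
# never followed.
fetch_object() {
    awk -v start="$1" -v props="$2" '
        BEGIN {
            # Record which predicates count as containment links.
            n = split(props, p, ",")
            for (i = 1; i <= n; i++) is_contain[p[i]] = 1
        }
        # Index every triple by its subject.
        NF >= 3 {
            k = ++count[$1]
            pred[$1, k] = $2
            obj[$1, k] = $3
        }
        END {
            # Breadth-first walk from the start node.
            queue[1] = start; head = 1; tail = 1
            seen[start] = 1
            while (head <= tail) {
                s = queue[head++]
                for (i = 1; i <= count[s]; i++) {
                    print s, pred[s, i], obj[s, i]
                    o = obj[s, i]
                    # Follow containment only; "seen" breaks cycles.
                    if ((pred[s, i] in is_contain) && !(o in seen)) {
                        seen[o] = 1
                        queue[++tail] = o
                    }
                }
            }
        }
    ' "$3"
}
```

For example, with the triples in a file (triples.txt is a made-up name),
"fetch_object andy vcard:N triples.txt" would print the statements about
andy plus the contents of the vcard:N structure, while a link to another
person would appear in the output without that person's own statements
being pulled in.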

Experimental Status
===================

My prototyping version of Joseki does bNode closure of RDQL queries and also
provides a fetch operation.  Tying into the Joseki configuration system has
not yet been done.

As of June 2003, these features have been tested with a demo app currently
under development.


[1] http://www.hpl.hp.com/techreports/2002/HPL-2002-315.html
[2] http://tap.stanford.edu/
[3] http://www.w3.org/TR/vcard-rdf

Received on Monday, 14 July 2003 09:11:27 UTC