RE: ANN: Joseki, an RDF server for Jena from Seaborne, Andy on 2002-04-02 (www-rdf-interest@w3.org from April 2002)

From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
Date: Tue, 2 Apr 2002 14:00:55 +0100
To: "'Mark Baker'" <distobj@acm.org>
Cc: www-rdf-interest@w3.org
Message-ID: <5E13A1874524D411A876006008CD059F038D370E@0-mail-1.hpl.hp.com>
Mark,

> 
> Hi Andy,
> 
> Yup, I'm an HTTP cop. 8-)

More gamekeeper than cop, methinks :-)

I'll start with the last thing:
> The protocol is the NetAPI, no? 8-)

The idea is to scope out a set of operations on an RDF model (as in a
collection of statements), with models being first class objects: they have
a URI and a location where they reside and where they can be accessed.
They have a separate existence from the web objects they describe.

The NetAPI is the abstract operations: the actions that can be performed at
some end-point so it is not the protocol.  The protocol would be the
concrete realisation in some technology.  I hope there would be an HTTP/REST
protocol, a SOAP protocol as well as others like JMS.  I'm agnostic as to
the technology used - each will have its place and advantages.  What I am
borrowing from REST is the idea of a few fixed operations and the cacheable
operations.

The NetAPI operations could be realised in HTTP in different ways.  But
something like:

  GetModel(modelURI) -> model statements    =>  GET modelURI
  PutModel(modelURI, statements) ->         =>  PUT modelURI
  Query(modelURI, query) -> results         =>  GET modelURI?query
  Update(modelURI, query) ->                =>  POST modelURI (body has the updates)

would be natural (to me at least).
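
A minimal Python sketch of the mapping above.  The function name `to_http`,
the operation names as strings, and the example.org model URI are assumptions
made for illustration; this is not Joseki's actual API.

```python
from urllib.parse import quote


def to_http(operation, model_uri, query=None, body=None):
    """Map an abstract NetAPI operation to (HTTP method, request target, body)."""
    if operation == "GetModel":
        return ("GET", model_uri, None)
    if operation == "PutModel":
        # The body carries the complete set of statements for the model.
        return ("PUT", model_uri, body)
    if operation == "Query":
        # The query travels in the URI, so the response stays cacheable.
        return ("GET", model_uri + "?" + quote(query, safe=""), None)
    if operation == "Update":
        # The body carries the statements to add and delete.
        return ("POST", model_uri, body)
    raise ValueError("unknown operation: %s" % operation)


MODEL = "http://example.org/models/people"  # hypothetical model URI
print(to_http("GetModel", MODEL))
print(to_http("Query", MODEL, query="a b"))
```

Only Update maps to a non-cacheable method here; the other three keep the
plain GET/PUT semantics on the model URI.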

PUT for update does not seem right - we are not replacing one model
completely by new data but making a change to part of it.  PUT is "delete
old model; create new model; insert new statements" and acting on the model
as a whole.
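
The difference can be sketched in Python with a model held as a set of
(subject, predicate, object) tuples.  The names `put_model` and `post_update`
and the in-memory `store` dictionary are hypothetical, chosen only to show the
two semantics side by side.

```python
def put_model(store, model_uri, statements):
    """PUT: discard whatever the model held and install the new statements."""
    store[model_uri] = set(statements)


def post_update(store, model_uri, adds=(), deletes=()):
    """POST: change part of an existing model, leaving the rest untouched."""
    model = store[model_uri]
    model.difference_update(deletes)
    model.update(adds)


store = {"m": {("a", "p", "1"), ("b", "p", "2")}}

# A partial change: one statement out, one in, ("b", "p", "2") survives.
post_update(store, "m", adds=[("c", "p", "3")], deletes=[("a", "p", "1")])
after_post = set(store["m"])

# A whole-model replacement: nothing from the old model survives.
put_model(store, "m", [("z", "p", "9")])
after_put = set(store["m"])
```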

For Joseki, the NetAPI is more important at the moment and I wanted to get
something out for discussion.  It has already been suggested that a second
kind of update exists where all the statements with the same subject
(without being able to enumerate them) are first deleted, then statements
are added.

The plan is to do protocol realisation(s) "real soon now".

> 
> > The operations described (update and query) are the abstraction of
> > operations which could be realised in various different ways (SOAP,
> > directly in HTTP operations (c.f. webdav), servlets).
> > 
> > In Joseki, I choose to use HTTP POST and servlets because it is easy to
> > supply a web server add-on to get the thing running.  Run the standalone
> > server from the command line or drop in the WAR file and go.
> 
> You found this easier than just implementing doGet(), doPut(), etc..?
> And you recognize that you've sacrificed caching and bookmarking?
> 
> > I'm thinking of a world where models are 1st class resources and have
> > URIs.
> 
> Me too!
> 
> > There are three categories of RDF operations:
> > 
> > - Operations on models themselves (GET, PUT, DELETE of whole models)
> > - Operations on the statements in model (conditional GET)
> 
> Why conditional?  PUT would be appropriate too, to replace a statement.

That would not be my design choice - I think PUT is for replacing one web
object with another, not changing part of one; that would be a POST
operation.  PUT would be on the model, and so I have suggested using it for
replacing the whole model.  Inserting/deleting statements in one part of a
model is different to me.  Even if statements have URIs then we are
performing a modification operation *on the model*, not the statement (nor
the underlying web object being described for that matter).  And one
statement can have many URIs because statement equality is defined as same
subject/same predicate/same object.
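
A small Python sketch of that equality rule.  The `Statement` type, the
example.org URIs, and the `names` table of statement URIs are all made up for
illustration.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """Equality is structural: same subject, same predicate, same object."""
    subject: str
    predicate: str
    obj: str


a = Statement("http://example.org/andy", "http://example.org/name", "Andy")
b = Statement("http://example.org/andy", "http://example.org/name", "Andy")

# Built separately, but they are the same statement, so a model (a set)
# holds it once.
model = {a, b}

# Two different (hypothetical) URIs can therefore name the same statement.
names = {
    "http://example.org/stmt/1": a,
    "http://example.org/stmt/2": b,
}
same = names["http://example.org/stmt/1"] == names["http://example.org/stmt/2"]
```

Because the URIs are just names attached to the statement, an operation
addressed to either one ends up changing the same model.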
> 
> > - The meta operation of what operations/options are possible on this
> >   model.
> > 
> > The correct set of operations on statement in models is by no means
> > finished.  It has been suggested that a "replace" is needed to remove
> > everything about a resource and replace it with new information.

That was unclear.  "Resource" here was referring to the labelled node in an
RDF graph, not the web object that the labelled node's metadata describes.
I am suggesting a "delete all properties with this resource (URI
labelled or bNode) as subject" operation.
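
One way such an operation might look, sketched in Python over a model held as
a set of (subject, predicate, object) tuples.  The name `replace_subject` and
the sample data are hypothetical.

```python
def replace_subject(model, subject, new_statements):
    """Drop every statement with this subject, then add the new statements.

    The caller never has to enumerate the statements being removed, so no
    separate query round trip is needed first.
    """
    kept = {stmt for stmt in model if stmt[0] != subject}
    return kept | set(new_statements)


model = {
    ("_:card1", "vcard:FN", "Andy"),
    ("_:card1", "vcard:EMAIL", "old@example.org"),
    ("_:card2", "vcard:FN", "Mark"),
}
updated = replace_subject(
    model,
    "_:card1",
    [("_:card1", "vcard:FN", "Andy Seaborne")],
)
```

Statements about other subjects, such as `_:card2` here, pass through
unchanged.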

With models as first class web objects, the metadata for a web object (known
as a resource as well) isn't necessarily held anywhere near (in URL-space)
the web object itself.

> Wouldn't that be a PUT?
> 
> >  As the
> > operations stand, the app needs to do a query to find all the relevant
> > triples, then do a second operation to delete them and add the new ones.
> > This leaves a timing hole and transfers the to-be-deleted statements twice
> > for no reason.
> 
> Have you considered WebDAV LOCK?

More below on consistency - I would like to avoid the need for a lock
because of the issues of timeouts and state maintenance at the server.
However, this may well be unavoidable in practice.

> 
> > In this style, models (collections of statements) are first class
> > resources so meta-data about resources is not necessarily located with
> > the resource.  The style where metadata is stored with the resource would
> > also be good - sort of GET-META or PUT-META (maybe by MIME negotiation).
> 
> Well, the model could be encapsulated within an HTTP intermediary such
> that a GET on a resource proxied through that intermediary (where the
> resource was referenced in that model) could return metadata on the
> responding HTTP headers.
> 
> > There is also a choice about granularity - one choice is operations on
> > individual triples (add statement, delete statement) which is the
> > fine-grained approach.  An alternative is to have operations on sets of
> > statements.  I choose the latter because it means the model can go from
> > one consistent view to another consistent view in a single operation.
> > "Consistent" may involve deleting several triples and adding multiple
> > triples.
> 
> It's a trade-off, I suppose.  But my first impression is that this may
> be a case of premature optimization.  Could be wrong though, I don't
> have a lot of information to go by.
> 

The choice of Query/Update is not one of efficiency but consistency.
Suppose a model is a collection of vCards (AKA metadata about a person) and
my app wants to add a new vCard, which is, say, 10 statements.  Adding one
statement at a time leaves the model open to being viewed with half a vCard
in it.  Adding all the statements in one go moves the model from one
consistent state to another, with the whole vCard added.  This also solves the
failure condition of my app losing connectivity part way through.  An
operation of add all vCard statements should either all happen or all not
happen; the server then does not need timeouts to support consistency at
levels above model integrity, such as only complete vCards.
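
A rough Python sketch of the all-or-nothing behaviour described above.  The
`Model` class, its method names, and the vCard-style property names are
assumptions for illustration, not how Joseki or Jena implement it.

```python
import threading


class Model:
    """Holds statements as an immutable snapshot that is swapped in one step."""

    def __init__(self, statements=()):
        self._statements = frozenset(statements)
        self._lock = threading.Lock()

    def snapshot(self):
        # Readers always see a complete state, never a half-applied update.
        return self._statements

    def update(self, adds=(), deletes=()):
        with self._lock:
            working = set(self._statements)
            for stmt in deletes:
                working.discard(tuple(stmt))
            for stmt in adds:
                if len(stmt) != 3:
                    # Fail before the swap: the visible model is unchanged.
                    raise ValueError("not a triple: %r" % (stmt,))
                working.add(tuple(stmt))
            # Single assignment moves the model to the next consistent state.
            self._statements = frozenset(working)


model = Model()
model.update(adds=[
    ("_:card1", "vcard:FN", "Andy"),
    ("_:card1", "vcard:EMAIL", "andy@example.org"),
])

try:
    # The second statement is malformed, so none of this update is applied.
    model.update(adds=[("_:card2", "vcard:FN", "Mark"), ("_:card2", "broken")])
except ValueError:
    pass
```

A client that loses connectivity before sending the complete update simply
never triggers the swap, so no server-side timeout is involved.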

> > An alternative protocol to realise the same abstraction that more
> > directly uses HTTP would be to use
> > 
> > 	"GET modelURI?query"
> > 	"POST modelURI" with adds and deletes in the body
> > 
> > I think PUT is about an operation on the whole model but a "partial PUT"
> > to do replace and "partial DELETE" may be OK -- PUT and DELETE are really
> > about the whole resource at the URI i.e. the whole model.  This is
> > realising the operations in a different way where transport end point
> > (where the operation is performed) is also the name of the object being
> > operated on (404's and all that).  Joseki splits the two concepts.
> >
> > I wasn't sure what the GET in your comment was acting on.  Was it a
> > triple (possibly taking the note about triples and URIs a bit too
> > literally).  If so, is the model all the URIs this server hosts? (i.e.
> > there is no explicit concept of a model).
> 
> I would use GET for queries, as you mention above.

That is the plan for an HTTP-direct protocol in a future Joseki system -
there are a couple of practical issues though.

1/ a query request can be large.  It might contain several URIs; some likely
query languages are XML and can be long.  All in all, too long for just the
GET uri?querystring model.  I don't know what caching intermediaries do about
GETs with a body.  Can anyone help me out here?  And how long can a
"URI?queryString" be in practice?

2/ now the location of the model and its name are the same (both the URL).
I haven't said that the model URI is a URL.  Other protocols keep the
name/location (or end point) concern separate; HTTP doesn't.  With mappings
to SOAP/HTTP and to HTTP directly, there are different choices.  In a SOAP
mapping, the end point (the URL where the POST is done) and the model being
operated on (given by URI) are different.  In the HTTP mapping, the URL
serves to identify a model by saying where to perform an operation (at least
to within a virtual host).
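
For point 1/, one possible approach is to use GET while the encoded query
fits and fall back to POST when it does not.  The sketch below is Python; the
2000-character limit is an assumed, conservative figure (HTTP itself does not
fix a maximum URI length, and real limits depend on the servers and
intermediaries involved), and the `query` parameter name and
`build_query_request` function are hypothetical.

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 2000  # assumed limit; not taken from any specification


def build_query_request(model_url, query):
    """Return (method, url, body) for a query against the model at model_url."""
    params = urlencode({"query": query})
    url = model_url + "?" + params
    if len(url) <= MAX_URL_LENGTH:
        # Short query: cacheable, bookmarkable GET.
        return ("GET", url, None)
    # Long query: send it in a POST body and give up caching.
    return ("POST", model_url, params)


MODEL = "http://example.org/models/people"  # hypothetical model URL
short = build_query_request(MODEL, "SELECT ?x")
long_ = build_query_request(MODEL, "x" * 5000)
```

The fallback keeps large XML queries workable, at the cost of exactly the
caching that motivated GET in the first place.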

> 
> > In Joseki, all triples do get an identity - I reify them to transfer
> > them (I just use the resource for the statement - currently the resource
> > for the reification is a bNode.  I could have chosen to generate a URI
> > (modulo mutterings about overhead).
> > 
> > I know you like REST style architectures - while REST is about a
> > hypermedia system and I am thinking about manipulation of collections of
> > RDF, the infrastructure is common and I have picked some key features
> > from it.  The use of coarse-grained operations to help efficient server
> > deployment; a first attempt to define a fixed set of verbs, with
> > variations in the parameters (e.g. choice of query language and result
> > formats).  It isn't RPC-style in the sense that there would be a few
> > fixed, well known operations, not operations defined by the domain of
> > usage (i.e. application specific operations like "get everything known
> > about such-and-such" - the "everything" is application dependent) which
> > requires a client to reflect to see if it understands the API.
> 
> I understand, you made some calls.  If I were doing it, I'd start with
> giving everything a URI and stick with HTTP and WebDAV methods.  If that
> didn't perform as I wanted, then I'd consider doing what you did.  Just
> IMHO, of course.
> 
> > In this first prototype that realises the RDFNetAPI, I made some
> > pragmatic choices: not modifying HTTP, simple deployment of packages
> > protocol engine, client-side API in Java for Jena.  Other implementations
> > would be good.  I am thinking about the abstraction of a NetAPI, not so
> > much of the concrete protocol.
> 
> The protocol is the NetAPI, no? 8-)

See the beginning - but this is not recursion!

Good to talk to you,

	Andy

PS And I wish there were a greater real need for caching queries - but there
isn't today - not enough freely available RDF data on the web yet :-(

> 
> MB
> -- 
> Mark Baker, Chief Science Officer, Planetfred, Inc.
> Ottawa, Ontario, CANADA.      mbaker@planetfred.com
> http://www.markbaker.ca   http://www.planetfred.com
Received on Tuesday, 2 April 2002 08:01:57 UTC