Atom for RDF transfer? from Danny Ayers on 2005-07-09 (semantic-web@w3.org from July 2005)

From: Danny Ayers <danny.ayers@gmail.com>
Date: Sat, 9 Jul 2005 12:18:15 +0200
To: atom-syntax@imc.org, semantic-web@w3.org
Message-ID: <1f2ed5cd05070903184b721c4e@mail.gmail.com>

While everyone's waiting for ratification of the Atom format, maybe
there are a few brain-cycles that can be harnessed...

What I (and others [1]) am looking for is a standard means of
interfacing with an RDF store like Redland or Jena over HTTP. The
operations required are:

1. query the store
2. add statements to the store
3. delete statements from the store
4. make an update to the store
(5. sync stores)

The first of these is reasonably well provided for already with
SPARQL. There appears to be some commonality of approaches to 2, 3 and
4, but no single standard seems to have emerged yet. 5 will be
needed before long, but could probably be fulfilled with the
other operations and an appropriate algorithm (possible approaches at
[4],[5]).

Now ideally all the operations would be done directly over HTTP, but
there are one or two issues, and I'm wondering if the Atom
format/protocol could form a consistent, easily implemented
lightweight delivery mechanism as an alternative. It'd be tunneling,
but without all the WS-* overhead. The interchange format specified
for RDF is XML (i.e. RDF/XML) so that's the first hurdle hopped.

This kind of thing is being covered to some extent by the W3C Data
Access WG (DAWG) and Semantic Web Best Practices and Deployment WG,
but they are tied to their charters. A good solution from left-field
(built on good standards) could fast-forward development.

So first the status quo, as far as I'm aware:

1. ask the store a query
The SPARQL protocol and RDF query language [2,3] is emerging as the
standard for queries, i.e. read-only operations. The query language
itself is fairly SQL-like. The protocol as it currently stands has a
generic WSDL 2.0 expression, but in practice *the* binding so far is
to HTTP, using the query itself as a parameter in a GET:

GET http://example.com/sparqlendpoint?query=...bunch of sparql...

The results are returned as an XML doc in the response body, the format
of which will depend on the nature of the query (there's a simple
result-set format for SELECTs, RDF/XML for CONSTRUCT etc).
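
As a rough illustration of the GET binding described above, here is a
minimal Python sketch that only builds the request URL. The endpoint is
the placeholder from the example, the query text is made up, and the
helper name build_query_url is hypothetical; no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint taken from the example above, not a real service.
ENDPOINT = "http://example.com/sparqlendpoint"


def build_query_url(endpoint: str, sparql: str) -> str:
    """Return a GET URL carrying the SPARQL text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": sparql})


# An illustrative query; any SPARQL text would be encoded the same way.
sparql = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = build_query_url(ENDPOINT, sparql)
print(url)

# Round-trip check: decoding the query string gives back the original text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == sparql)
```

The percent-encoding is the only fiddly part; the response handling
would depend on the query form, as noted above.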

The operations of 2, 3 and 4 can be covered by a protocol in a similar
fashion: by supplying an RDF graph to add, a graph to delete, or a
combined operation for update supplying a graph to delete followed by
a graph to add. I'll just expand on that a little before describing
existing protocol support:

2. add statements to the store
Data can be added to an RDF store by supplying a list of statements,
that is the graph, as an RDF/XML doc. This is something all
triplestores should support.

3. delete statements from the store
This is a little trickier: there isn't any operation common to stores
that says delete(graph). In practice this may mean listing and
deleting the individual statements. I think there may be issues where
the graph to delete matches a subgraph in the store whose nodes
aren't sufficiently bound to URIs (i.e. blank nodes) to make matching
unambiguous. Frankly I'm not sure, but I don't think it would affect
the protocol.

4. make an update to the store
A two-phase operation: delete(graphA), then add(graphB). It would be
nice for this to happen as an atomic transaction.
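
To make operations 2, 3 and 4 concrete, here is a minimal in-memory
sketch in Python. It assumes a graph is just a set of fully ground
(subject, predicate, object) tuples, so it sidesteps the matching issue
mentioned under 3. The TripleStore class and its method names are
hypothetical and are not the API of Redland, Jena or any other store.

```python
import threading


class TripleStore:
    """Toy store: a set of ground (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()
        self._lock = threading.Lock()

    def add(self, graph):
        """Operation 2: add every statement in the graph."""
        with self._lock:
            self._triples |= set(graph)

    def delete(self, graph):
        """Operation 3: remove every statement in the graph, if present."""
        with self._lock:
            self._triples -= set(graph)

    def update(self, to_delete, to_add):
        """Operation 4: delete then add, under one lock so that other
        threads never observe the half-finished state."""
        with self._lock:
            self._triples = (self._triples - set(to_delete)) | set(to_add)

    def triples(self):
        """Return a snapshot copy of the current statements."""
        with self._lock:
            return set(self._triples)


S = "http://example.com/doc"
P = "http://purl.org/dc/elements/1.1/title"
OLD = {(S, P, "Old title")}
NEW = {(S, P, "New title")}

store = TripleStore()
store.add(OLD)
store.update(to_delete=OLD, to_add=NEW)
print(store.triples())
```

The single lock is only a stand-in for a real transaction; a persistent
store would need its own transaction support to get the same effect.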

Ok, existing protocols/proposals include the NetAPI [6,7], and Joseki:
RDF WebAPI [8] (I think this is currently moving over to SPARQL for
queries, not sure about updates).

The NetAPI uses a two-part MIME multipart message with an HTTP POST, the
first part containing the graph to delete, the second containing the
graph to add. Adding and deleting alone are special cases. (I seem to
remember there being some issues relating to the use of MIME multipart
around Atom, but I can't for the life of me remember what they were.)
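
A two-part body of that general shape could be assembled roughly as
follows. This is only a sketch: the multipart/mixed subtype, the
application/rdf+xml part type and the helper name build_multipart are
assumptions made here, and the exact framing the NetAPI expects should
be checked against [6].

```python
import uuid


def build_multipart(delete_rdfxml, add_rdfxml, boundary=None):
    """Return (headers, body) for a two-part POST: the first part is the
    graph to delete, the second the graph to add."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for graph in (delete_rdfxml, add_rdfxml):
        parts.append(
            f"--{boundary}\r\n"
            "Content-Type: application/rdf+xml\r\n"
            "\r\n"
            f"{graph}\r\n"
        )
    # The closing delimiter carries two extra trailing hyphens.
    body = "".join(parts) + f"--{boundary}--\r\n"
    headers = {"Content-Type": f'multipart/mixed; boundary="{boundary}"'}
    return headers, body.encode("utf-8")


RDF_OPEN = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<rdf:Description rdf:about="http://example.com/doc">'
)
RDF_CLOSE = "</rdf:Description></rdf:RDF>"
DELETE_GRAPH = RDF_OPEN + "<dc:title>old</dc:title>" + RDF_CLOSE
ADD_GRAPH = RDF_OPEN + "<dc:title>new</dc:title>" + RDF_CLOSE

# A fixed boundary keeps the example output reproducible.
headers, body = build_multipart(DELETE_GRAPH, ADD_GRAPH, boundary="b42")
print(headers["Content-Type"])
print(body.decode("utf-8"))
```

An add-only or delete-only request would presumably send an empty graph
in the other part, but that is a guess rather than something taken from
the submission.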

There's also URIQA [9], which can provide the operations listed, but is
more aligned towards working with authoritative sources, i.e. where
the host "owns" the resources identified. Technically it looks good,
but involves the addition of extra HTTP methods, which has caused some
controversy.

So how might all this be done in Atom? I don't really know, beyond
thinking that perhaps many of the interfacing operations with a
triplestore may be expressed nicely as a sequence of entries, with
graphs as content, each entry representing an add/delete operation.
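
Purely as a sketch of what such a sequence of entries might look like,
the Python below builds a feed with one delete entry and one add entry,
each carrying its graph inline as RDF/XML. Marking the operation with
an atom:category term, the urn:example identifiers and the function
names are all assumptions invented for this example; nothing here comes
from the Atom drafts or from any existing proposal.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("atom", ATOM)
ET.register_namespace("rdf", RDF)


def operation_entry(op, rdfxml, entry_id):
    """Build one entry whose content is the graph to add or delete."""
    if op not in ("add", "delete"):
        raise ValueError("op must be 'add' or 'delete'")
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = f"{op} graph"
    # Hypothetical convention: the category term names the operation.
    ET.SubElement(entry, f"{{{ATOM}}}category", term=op)
    content = ET.SubElement(
        entry, f"{{{ATOM}}}content", type="application/rdf+xml"
    )
    content.append(ET.fromstring(rdfxml))
    return entry


def update_feed(delete_rdfxml, add_rdfxml):
    """Express an update as a delete entry followed by an add entry."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    feed.append(operation_entry("delete", delete_rdfxml, "urn:example:op:1"))
    feed.append(operation_entry("add", add_rdfxml, "urn:example:op:2"))
    return feed


TEMPLATE = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<rdf:Description rdf:about="http://example.com/doc">'
    "<dc:title>{}</dc:title>"
    "</rdf:Description></rdf:RDF>"
)

feed = update_feed(TEMPLATE.format("Old title"), TEMPLATE.format("New title"))
print(ET.tostring(feed, encoding="unicode"))
```

The sketch leaves out elements a complete feed would need (updated,
author and so on), and it takes entry order to mean execution order,
which is an assumption of the sketch rather than anything Atom promises.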

Cheers,
Danny.

[1] http://lists.gnomehack.com/pipermail/redland-dev/2005-July/001019.html
[2] http://www.w3.org/TR/rdf-sparql-query/
[3] http://www.w3.org/TR/rdf-sparql-protocol/
[4] http://www.w3.org/DesignIssues/Diff
[5] http://www.dbin.org/
[6] http://www.w3.org/Submission/2003/SUBM-rdf-netapi-20031002/
[7] http://www.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/tutorial/netapi.html
[8] http://www.joseki.org/protocol.html
[9] http://sw.nokia.com/uriqa/URIQA.html

-- 

http://dannyayers.com
Received on Saturday, 9 July 2005 10:18:21 UTC