Re: DBpedia now available as triple pattern fragments

Dear Eric,

> The above doc describes paging over "blank-node-free" triples. Is that to simplify ordering?

The main reason is that blank nodes cannot be identified across different requests.
For instance, if I request the triple pattern fragment for ?x rdf:type foaf:Person
and the results contain _:b0 rdf:type foaf:Person,
how can I ask for more data about this person?
I cannot ask for _:b0 ?p ?o,
because the local identifier _:b0 was only valid in the context of the first response.

We solve this by translating all blank nodes to IRIs, following [1].
For instance, the EventMedia dataset contains a blank node _:B0.
The server translates this on the fly to
http://data.linkeddatafragments.org/.well-known/genid/eventmedia/B0.
That URI is dereferenceable and can be reused across requests.
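To make the translation concrete, here is a rough Python sketch using rdflib;
the skolem base below simply mirrors the EventMedia example and is illustrative,
not the actual server code.

    from rdflib import Graph, URIRef, BNode

    # Illustrative skolem base, patterned after the EventMedia example above;
    # a real server may use a different authority or path.
    SKOLEM_BASE = "http://data.linkeddatafragments.org/.well-known/genid/eventmedia/"

    def skolemize_graph(g: Graph) -> Graph:
        """Replace every blank node with a .well-known/genid IRI (RDF 1.1 Skolemization)."""
        def skolemize(term):
            if isinstance(term, BNode):
                return URIRef(SKOLEM_BASE + str(term))
            return term

        out = Graph()
        for s, p, o in g:
            out.add((skolemize(s), p, skolemize(o)))
        return out

After this step, a client can dereference the resulting IRI
and reuse it in follow-up triple pattern requests.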

> Would it also work with a deeper ordering like dbooth et al have discussed on semantic-web?

Could you elaborate on this?
Note that the triple pattern fragments interface does not prescribe any ordering;
derived interfaces that support this feature might be defined
(and dynamically discovered by clients).
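For what it's worth, here is a rough sketch of how such discovery could work:
a fragment describes its controls with the Hydra Core Vocabulary,
so a client can inspect the advertised search forms and only use the features it finds.
(Python/rdflib; the helper below is illustrative, not part of any existing client.)

    from rdflib import Graph, Namespace

    HYDRA = Namespace("http://www.w3.org/ns/hydra/core#")

    def discover_search_forms(fragment_turtle: str, fragment_url: str):
        """List the URI templates a fragment advertises, so a client can detect
        whether the server offers more than plain triple pattern search."""
        g = Graph()
        g.parse(data=fragment_turtle, format="turtle", publicID=fragment_url)
        forms = []
        for search in g.objects(None, HYDRA.search):
            template = g.value(search, HYDRA.template)
            properties = [g.value(m, HYDRA.property)
                          for m in g.objects(search, HYDRA.mapping)]
            forms.append((str(template), properties))
        return forms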

> These caches effectively move some of your working result set into HTTP-level (and thus sharable, if multiple clients happen to ask overlapping queries) caches.

Yes, this is a very important part of the scalability of our approach.
With SPARQL endpoints, each client sends its own highly specific request to the server,
so responses are rarely reusable across clients.
With triple pattern fragments, clients send several simpler requests to the server;
if two queries partially overlap, the fragments they have in common can be reused.
This is why caching is indeed more effective.
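As a toy illustration (not our client code), consider two queries that both contain
the pattern ?x rdf:type foaf:Person: the second request for that fragment never
reaches the server. The endpoint URL and parameter names below are assumptions for
the sake of the example; a real client reads them from the fragment's hypermedia controls.

    import requests

    # Assumed DBpedia fragments endpoint, for illustration only.
    FRAGMENT_API = "http://fragments.dbpedia.org/2014/en"

    # A minimal client-side cache keyed by fragment URL; in practice this role is
    # played by HTTP caches (browser, proxy, CDN) shared between clients.
    cache = {}

    def fetch_fragment(subject=None, predicate=None, obj=None):
        params = {k: v for k, v in
                  {"subject": subject, "predicate": predicate, "object": obj}.items() if v}
        url = requests.Request("GET", FRAGMENT_API, params=params).prepare().url
        if url not in cache:
            cache[url] = requests.get(url, headers={"Accept": "text/turtle"}).text
        return cache[url]

    # Two different queries that both need the ?x rdf:type foaf:Person pattern:
    # the second call is answered from the cache without touching the server.
    people_1 = fetch_fragment(predicate="http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
                              obj="http://xmlns.com/foaf/0.1/Person")
    people_2 = fetch_fragment(predicate="http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
                              obj="http://xmlns.com/foaf/0.1/Person")
    assert people_1 is people_2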

> Have you poked at what happens when you make the unit of exchange a canonicalized graph pattern instead of just a triple pattern? (I note that your slides say "go play" so maybe that's up to us.)

That is future work, for all of us :-)
With the current work, we have looked at the question:
what happens if we choose a radically simple interface on the server?
In future work, people can investigate:
what happens if we make the interface (slightly) more complicated?
Does caching get better if we allow full-text search, graph patterns, …?

I'm very eager to learn about that.
The Linked Data Fragments initiative is about exploring those trade-offs:
triple pattern fragments make high availability and cache reuse easier,
but query execution is slower. We're curious to see other trade-off combinations,
which is why we publish our results and experimental set-up
(https://github.com/LinkedDataFragments/Availability-Performance-Benchmark).

Best,

Ruben

[1] http://www.w3.org/TR/rdf11-concepts/#section-skolemization

Received on Wednesday, 29 October 2014 13:59:04 UTC