Hydra use case: Linked Data Fragments

Dear all,

It has been mentioned on this mailing list a few times
that use cases would be relevant to help with decisions.
Today, I can finally present our use case
in which Hydra is crucial for client/server interactions.
I will explain what Hydra currently does
and what could still be improved for this application.

We introduced Linked Data Fragments as a collective term
for all kinds of ways to publish Linked Data:
http://linkeddatafragments.org/
Nowadays, the default way to query Linked Data is through SPARQL endpoints:
services that answer arbitrarily complex queries.
Because some of those queries are expensive to evaluate,
public SPARQL endpoints have low availability.

We propose a specific type of fragments called
"basic Linked Data Fragments", which are designed
so that _clients_ can answer queries in a hypermedia-driven way.
Each fragment corresponds to a triple pattern
{ ?s ?p ?o }, where components can be constant or variable.

Here is an example fragment:
http://data.linkeddatafragments.org/dbpedia?subject=&predicate=dbpedia-owl%3AbirthPlace&object=dbpedia%3ANew_York
This is the resource of all people born in New York.
Of course, the representation doesn't contain all of them,
just the first 100 out of 5,270; and this is also indicated in the fragment itself.
The HTML version you see in the browser has several hypermedia controls:
- a form that allows you to go to any other { ?s ?p ?o } fragment of the dataset
- links on each triple component that allow you to find related fragments
Note that this is _not_ regular dereferencing:
the links lead you to other fragments that have the component
in subject, predicate, or object position.

Now the challenge for Hydra is of course
to provide the equivalent of this HTML representation for machines.
The HTML source code is already marked up with Hydra RDFa,
but for clarity, it might be best to look at the Turtle (or JSON-LD) representation.
You can access it through content negotiation, or here's a direct Turtle link:
http://data-cdn.linkeddatafragments.org/dbpedia?subject=&predicate=dbpedia-owl%3AbirthPlace&object=dbpedia%3ANew_York
Here is the relevant piece of the representation:

    :dbpedia void:subset <http://data-cdn.linkeddatafragments.org/dbpedia?subject=&predicate=dbpedia-owl%3AbirthPlace&object=dbpedia%3ANew_York>;
        hydra:search _:triplePattern.
    _:triplePattern hydra:template "http://data-cdn.linkeddatafragments.org/dbpedia{?subject,predicate,object}";
        hydra:mapping _:subject, _:predicate, _:object.
    _:subject hydra:variable "subject";
        hydra:property rdf:subject.
    _:predicate hydra:variable "predicate";
        hydra:property rdf:predicate.
    _:object hydra:variable "object";
        hydra:property rdf:object.
    <http://data-cdn.linkeddatafragments.org/dbpedia?subject=&predicate=dbpedia-owl%3AbirthPlace&object=dbpedia%3ANew_York> a hydra:Collection, hydra:PagedCollection;
        dcterms:description "Basic Linked Data Fragment of the 'dbpedia' dataset containing triples matching the pattern { ?s <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/New_York> }."@en;
        hydra:entrypoint :dbpedia;
        hydra:totalItems "5270"^^xsd:integer.

So we use a URI template to explain that
the ?subject parameter corresponds to rdf:subject,
and the same for the other triple pattern components.
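
To make the mapping concrete, here is a minimal sketch of how a client could fill in the template above. It assumes the template contains a single RFC 6570 form-style query expression ({?a,b,c}) and that component values are already serialized as strings; the function names expand_form_query and fragment_url are hypothetical and not taken from the actual client.

```python
import re
from urllib.parse import quote

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Mapping as advertised by the fragment: RDF property -> template variable.
MAPPING = {
    RDF + "subject": "subject",
    RDF + "predicate": "predicate",
    RDF + "object": "object",
}

def expand_form_query(template, values):
    """Expand RFC 6570 form-style query expressions such as {?a,b,c}."""
    def replace(match):
        pairs = []
        for name in match.group(1).split(","):
            if name in values:  # undefined variables are omitted
                # Percent-encode everything outside the unreserved set.
                pairs.append(name + "=" + quote(values[name], safe=""))
        return "?" + "&".join(pairs) if pairs else ""
    return re.sub(r"\{\?([^}]*)\}", replace, template)

def fragment_url(template, pattern):
    """pattern maps RDF property IRIs to already-serialized values."""
    values = {MAPPING[prop]: value for prop, value in pattern.items()}
    return expand_form_query(template, values)

TEMPLATE = "http://data-cdn.linkeddatafragments.org/dbpedia{?subject,predicate,object}"

url = fragment_url(TEMPLATE, {
    RDF + "predicate": "dbpedia-owl:birthPlace",
    RDF + "object": "dbpedia:New_York",
})
print(url)
```

In this sketch an unbound component is simply left out of the URL; passing an empty string instead yields "subject=", as in the example fragment URL.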

This already works; you can see the client in action here:
http://client.linkeddatafragments.org/
The source code for hypermedia-driven behavior is here:
https://github.com/LinkedDataFragments/Client/blob/85b67be35/lib/LinkedDataFragmentsClient.js#L33
The source code for Hydra interpretation is here:
https://github.com/LinkedDataFragments/Client/blob/85b67be35/lib/LinkedDataFragmentTurtleParser.js#L48


Now of course, this client is a specifically implemented one;
I might as well have used a proprietary vocabulary
or even a totally different mechanism (RPC / out-of-band).
The reason I chose to use Hydra is that
it should enable *other* clients to use fragments as well,
without any prior knowledge of how the server works.

However, several pieces of necessary knowledge
are still hard-coded in my client, and I want to get them out.
In other words: these are things I think Hydra should describe,
so *any* client can figure out the API on its own (self-descriptiveness).
Below are some issues we currently experience.

1) How should a parameter be serialized in the URI template?
Hydra does tell me that "object" corresponds to the rdf:object property,
but not how I can add that object to the template. Here are possible variations:
  a) http://data.linkeddatafragments.org/dbpedia?object=http%3A%2F%2Fdbpedia.org%2Fresource%2FNew_York
  b) http://data.linkeddatafragments.org/dbpedia?object=%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FNew_York%3E
  c) http://data.linkeddatafragments.org/dbpedia?object=dbpedia%3ANew_York
  d) http://data.linkeddatafragments.org/dbpedia?subject=&predicate=&object=%22New+York%22%40en
  e) http://data.linkeddatafragments.org/dbpedia?subject=&predicate=&object=%22New+York%22
  f) http://data.linkeddatafragments.org/dbpedia?object=New+York
All of these work on the server, except f).
The others point to three _different_ resources:
a), b), c) point to “triples with pattern { ?s ?p <http://dbpedia.org/resource/New_York> }”
whereas d) points to “triples with pattern { ?s ?p "New York"@en }”
and e) to “triples with pattern { ?s ?p "New York" }”.
Here is what each variation does:
  a) passing the full URI as a string
  b) passing the full URI as a string, surrounded by angle brackets
  c) passing the URI abbreviated with a prefix (which prefixes are recognized?)
  d) passing a string literal, surrounded by double quotes and followed by a language suffix (NOT Turtle-encoded)
  e) passing a string literal, surrounded by double quotes (NOT Turtle-encoded)
  f) passing a string literal
How can I explain to clients which ones they can use,
which ones are the same and which ones are different?
And of course, there would be many more ways to serialize parameters.
I could live with offering clients only one that works,
but it should be consistent and make it possible to distinguish strings from URIs.
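
As an illustration of the ambiguity, the sketch below builds the object parameter for variations a), b), d), and e) from the same two values. The helper names are hypothetical; the point is only that a client must choose between several plausible serializations, and nothing in the Hydra description tells it which one the server expects.

```python
from urllib.parse import urlencode

BASE = "http://data.linkeddatafragments.org/dbpedia"
NEW_YORK = "http://dbpedia.org/resource/New_York"

def iri_plain(iri):
    """Variation a): the full URI as a bare string."""
    return iri

def iri_bracketed(iri):
    """Variation b): the full URI surrounded by angle brackets."""
    return "<" + iri + ">"

def literal(value, language=None):
    """Variations d) and e): a quoted literal with optional language tag."""
    return '"' + value + '"' + ("@" + language if language else "")

def object_query(serialized):
    """Build a fragment URL that only constrains the object component."""
    return BASE + "?" + urlencode({"object": serialized})

a = object_query(iri_plain(NEW_YORK))
b = object_query(iri_bracketed(NEW_YORK))
d = object_query(literal("New York", "en"))
e = object_query(literal("New York"))

for url in (a, b, d, e):
    print(url)
```

Here a) and b) are different URLs for the same pattern, whereas d) and e) differ only by the language tag yet identify different patterns.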

2) What do the subject, predicate, and object properties really mean?
Hydra tells me that they correspond to rdf:{subject,predicate,object} and that's true,
but not sufficient. I want the client to be able to interpret that
"if you follow this template, then you will get a fragment with those triples
 that match all of the components you've specified".
Those semantics are currently not explicit.
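
Stated as code, the semantics I would like a client to be able to derive look roughly like the sketch below. The dataset is a made-up toy example with hypothetical ex: terms, and None stands for an unspecified (variable) component.

```python
def matches(pattern, triple):
    """A triple matches if every specified component is equal to it."""
    return all(p is None or p == t for p, t in zip(pattern, triple))

def fragment(dataset, subject=None, predicate=None, obj=None):
    """Return those triples that match all specified components."""
    pattern = (subject, predicate, obj)
    return [triple for triple in dataset if matches(pattern, triple)]

# Toy data, invented purely for illustration.
DATASET = [
    ("ex:alice", "ex:birthPlace", "ex:New_York"),
    ("ex:bob", "ex:birthPlace", "ex:Paris"),
    ("ex:alice", "ex:name", '"Alice"'),
]

born_in_new_york = fragment(DATASET, predicate="ex:birthPlace", obj="ex:New_York")
print(born_in_new_york)
```

The template mapping only says which parameter relates to which property; the conjunction expressed by all() above is exactly the part that is not yet described.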


I'm very interested in your feedback
and open to discussing anything,
in particular the open issues above.

Best,

Ruben

Received on Tuesday, 11 March 2014 14:21:29 UTC