RE: Hydra use case: Linked Data Fragments (ISSUE-30) from Markus Lanthaler on 2014-03-14 (public-hydra@w3.org from March 2014)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Fri, 14 Mar 2014 12:45:08 +0100
To: <public-hydra@w3.org>
Cc: "'Ruben Verborgh'" <ruben.verborgh@ugent.be>
Message-ID: <012501cf3f7a$da0bf350$8e23d9f0$@lanthaler@gmx.net>

Hi Ruben,

First of all, thanks a lot for the detailed explanation including the
motivations that triggered the creation of LDF.

On Tuesday, March 11, 2014 3:21 PM, Ruben Verborgh wrote:
> We propose a specific type of fragments called
> "basic Linked Data Fragments", and they are created
> so that _clients_ can answer queries in a hypermedia-driven way.
> Each fragment corresponds to a triple pattern
> { ?s ?p ?o }, where components can be constant or variable.
> 
> Here is an example fragment:
> http://data.linkeddatafragments.org/dbpedia?subject=&predicate=dbpedia-
> owl%3AbirthPlace&object=dbpedia%3ANew_York

So, in principle "basic Linked Data Fragments" are a more sophisticated and
generalized version of Luca Matteis' Restpark API
(http://lmatteis.github.io/restpark/), right? Are you aware of Restpark? I
had forgotten about it myself, but after reading your mail I had a flashback :-)


> This is the resource of all people born in New York.
> Of course, the representation doesn't contain all of them,
> just the first 100 out of 5,270; and this is also indicated in the
> fragment itself.
> The HTML version you see in the browser has several hypermedia
> controls:
> - a form that allows you to go to any other { ?s ?p ?o } fragment of
> the dataset
> - links on each triple component that allow you to find related
> fragments
> Note that this is _not_ regular dereferencing:
> the links lead you to other fragments that have the component
> in subject, predicate, or object position.

These aspects are completely missing from Restpark.


> Now the challenge for Hydra is of course
> to provide the equivalent of this HTML representation for machines.
> The HTML source code is already marked up with Hydra RDFa,

Cool!


> but for clarity, it might be best to look at the Turtle (or JSON-LD)
> representation.
> You can access it through content negotiation, or here's a direct
> Turtle link:
> http://data-
> cdn.linkeddatafragments.org/dbpedia?subject=&predicate=dbpedia-
> owl%3AbirthPlace&object=dbpedia%3ANew_York
> Here is the relevant piece of the representation:
> 
>     :dbpedia void:subset <http://data-
> cdn.linkeddatafragments.org/dbpedia?subject=&predicate=dbpedia-
> owl%3AbirthPlace&object=dbpedia%3ANew_York>;
>         hydra:search _:triplePattern.

This all makes perfect sense to me. The only thing you might want to
change (not sure) is what hydra:search is attached to. In this case
here, I (as a client) would assume that you further query that Linked Data
Fragment (instead of querying the whole DBpedia dataset). In the HTML
representation you explicitly say

  Data source dbpedia
  Query dbpedia by triple pattern

Maybe you should make that explicit for machines as well!?


>     _:triplePattern hydra:template "http://data-
> cdn.linkeddatafragments.org/dbpedia{?subject,predicate,object}";
>         hydra:mapping _:subject, _:predicate, _:object.
>     _:subject hydra:variable "subject";
>         hydra:property rdf:subject.
>     _:predicate hydra:variable "predicate";
>         hydra:property rdf:predicate.
>     _:object hydra:variable "object";
>         hydra:property rdf:object.
>     <http://data-
> cdn.linkeddatafragments.org/dbpedia?subject=&predicate=dbpedia-
> owl%3AbirthPlace&object=dbpedia%3ANew_York> a hydra:Collection,
> hydra:PagedCollection;
>         dcterms:description "Basic Linked Data Fragment of the
> 'dbpedia' dataset containing triples matching the pattern { ?s
> <http://dbpedia.org/ontology/birthPlace>
> <http://dbpedia.org/resource/New_York> }."@en;
>         hydra:entrypoint :dbpedia;
>         hydra:totalItems "5270"^^xsd:integer.
> 
> So we use a URI template to explain that
> the ?subject parameter corresponds to rdf:subject,
> and the same for the other triple pattern components.
> 
> This already works; you can see the client in action here:
> http://client.linkeddatafragments.org/
> The source code for hypermedia-driven behavior is here:
> https://github.com/LinkedDataFragments/Client/blob/85b67be35/lib/Linked
> DataFragmentsClient.js#L33
> The source code for Hydra interpretation is here:
> https://github.com/LinkedDataFragments/Client/blob/85b67be35/lib/Linked
> DataFragmentTurtleParser.js#L48

Really cool stuff. I see a lot of potential for this. It can be used to add
extremely sophisticated querying to Hydra-powered Web APIs without
(over)burdening the server as most other solutions do.


> Now of course, this client is a specifically implemented one;
> I might as well have used a proprietary vocabulary
> or even a totally different mechanism (RPC / out-of-band).

... right. That's what Restpark did, for example.


> The reason I chose to use Hydra is because
> it should enable *other* clients to use fragments as well,
> without any prior knowledge of how the server works.

+1. IMO this is exactly what makes this so powerful.

 
> However, several pieces of necessary knowledge
> are still hard-coded in my client, and I want to get them out.
> In other words: these are things I think Hydra should describe,
> so *any* client can figure out the API on its own (self-
> descriptiveness).
> Below are some issues we currently experience.
> 
> 1) How should a parameter be serialized in the URI template?
> Hydra does tell me that "object" corresponds to the rdf:object property,
> but not how I can add that subject to the template. Here are possible
> variations:
[...]
> All of these work on the server, except f).
> The others point to three _different_ resources:
> a), b), c) point to "triples with pattern {?s ?o
> <http://dbpedia.org/resource/New_York>. }"
> whereas d) points to "triples with pattern {?s ?o "New York"@en. }"
> and e) to "triples with pattern {?s ?o "New York". }"
> Here are the possibilities:
>   a) passing the full URI as a string
>   b) passing the full URI as a string, surrounded by angular brackets
>   c) passing the URI abbreviated with a prefix (which prefixes are
> recognized?)
>   d) passing a string literal, surrounded by double quotes and followed
> by a language suffix (NOT Turtle-encoded)
>   e) passing a string literal, surrounded by double quotes (NOT Turtle-
> encoded)
>   f) passing a string literal
> How can I explain to clients which ones it can use,
> which ones are the same and which ones are different?

Very good question. This is tracked as ISSUE-30, right?

  https://github.com/HydraCG/Specifications/issues/30

IMO there are basically two options. We can either define (and fix) how
IRIs/literals are to be serialized or we add a mechanism to describe how
they should be serialized. It's the typical engineering trade-off between
flexibility and simplicity. Fixing the serialization format is much simpler
and reduces variability. Allowing the expected serialization format to be
described is much more flexible but makes the implementation of (primarily)
clients more difficult.
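
To make the first option a bit more concrete, here is a minimal Python sketch
of what a fixed serialization could look like on the client side. It is only
an illustration: it assumes, arbitrarily, that IRIs are passed as plain
strings (variant a) and literals in double quotes with an optional language
suffix (variants d/e). The function names are made up for this example;
neither Hydra nor LDF defines them. The expansion covers only the
"{?subject,predicate,object}" form-style query of RFC 6570, not URI templates
in general.

```python
from urllib.parse import quote


def serialize_iri(iri):
    # Assumption: variant (a), the full IRI passed as a plain string.
    return iri


def serialize_literal(text, language=None):
    # Assumption: variants (d)/(e), double quotes plus optional language tag.
    return f'"{text}"' + (f"@{language}" if language else "")


def expand_form_query(base, names, values):
    """Expand a '{?a,b,c}' form-style query template.

    Variables missing from `values` (or set to None) are skipped;
    empty strings are emitted as 'name='.
    """
    pairs = [
        f"{name}={quote(values[name], safe='')}"
        for name in names
        if values.get(name) is not None
    ]
    return base + ("?" + "&".join(pairs) if pairs else "")


if __name__ == "__main__":
    names = ["subject", "predicate", "object"]
    print(expand_form_query(
        "http://data.linkeddatafragments.org/dbpedia",
        names,
        {"object": serialize_literal("New York", "en")},
    ))
    # prints http://data.linkeddatafragments.org/dbpedia?object=%22New%20York%22%40en
```

With a fixed rule like this, a client never has to guess between the
variants; the price is that servers expecting a different form would have to
adapt.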


> And of course, there would be many more ways to parse parameters.
> I could live with only giving one that works for clients,
> but it should be consistent and allow to differentiate between strings
> and URIs.

Would that be your preference, or can you "just live with it"? Do you think
there are many cases where a variable can take both an IRI and a literal and
the distinction is important? I have trouble finding an example where
that would matter... but I do see how it makes the implementation of
(SPARQL-based) servers more difficult if they have to infer what was passed
themselves (or query for both the IRI and the literal).
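
To illustrate what "inferring what was passed" could mean for a server, here
is a rough Python sketch. The rules are my own assumptions for the sake of the
example, not how the LDF server actually parses parameters: angular brackets
or an absolute "scheme://" form mean IRI, double quotes mean literal, and
everything else (bare strings, prefixed names with unknown prefixes) is
reported as ambiguous, which is exactly the case where a server would have to
guess or query for both.

```python
import re

# A double-quoted string with an optional language suffix, e.g. "New York"@en
LITERAL = re.compile(
    r'^"(?P<text>.*)"(?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*))?$',
    re.DOTALL,
)
# A crude check for absolute IRIs of the form scheme://...
ABSOLUTE_IRI = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://\S+$")


def classify(value):
    """Guess whether a decoded parameter value is an IRI or a literal."""
    if value.startswith("<") and value.endswith(">"):
        return ("iri", value[1:-1])
    match = LITERAL.match(value)
    if match:
        return ("literal", match.group("text"), match.group("lang"))
    if ABSOLUTE_IRI.match(value):
        return ("iri", value)
    # Bare strings and prefixed names cannot be told apart reliably.
    return ("ambiguous", value)


if __name__ == "__main__":
    for candidate in ('<http://dbpedia.org/resource/New_York>',
                      '"New York"@en', 'New York', 'dbpedia:New_York'):
        print(candidate, "->", classify(candidate))
```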

 
> 2) What do the subject, predicate, and object properties really mean?
> Hydra tells me that they correspond to rdf:{subject,predicate,object}
> and that's true,
> but not sufficient. I want the client to be able to interpret that
> "if you follow this template, then you will get a fragment with those
> triples
>  that match all of the components you've specified".
> Those semantics are currently not explicit.

My take on this would be to either specialize the IriTemplate class to
something like an LdfIriTemplate or to specialize hydra:search with something
like ldf:queryInterface. You could then even go as far as saying

  ldf:queryInterface a hydra:TemplatedLink ;
    hydra:supportedOperation [
      a ldf:RetrieveBasicLdfOperation ;
      hydra:method "GET" ;
      hydra:returns ldf:BasicLdf
    ] .

(Sorry, I haven't looked up the LDF vocabulary yet.)
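
The semantics such a specialized link would have to convey ("you get the
triples that match all of the components you've specified") can be sketched
in a few lines of Python. This is just my reading of the behavior you
describe, not the LDF server code; the function names and the dictionary keys
are invented for the example, and the default page size of 100 is taken from
the example fragment above.

```python
def matches(triple, pattern):
    """True if every specified (non-None) pattern component equals the
    corresponding component of the triple; None acts as a variable."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def select_fragment(triples, subject=None, predicate=None, obj=None,
                    page_size=100):
    """Return the first page of matching triples plus the total count."""
    matching = [t for t in triples if matches(t, (subject, predicate, obj))]
    return {"totalItems": len(matching), "members": matching[:page_size]}


if __name__ == "__main__":
    data = [
        ("dbpedia:A", "dbpedia-owl:birthPlace", "dbpedia:New_York"),
        ("dbpedia:B", "dbpedia-owl:birthPlace", "dbpedia:Boston"),
        ("dbpedia:A", "rdfs:label", '"A"@en'),
    ]
    print(select_fragment(data, predicate="dbpedia-owl:birthPlace",
                          obj="dbpedia:New_York"))
```

A generic client that understands the link type would know it can rely on
this "all specified components must match" behavior without it being
hard-coded.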


> I'm very interested to learn from your feedback
> and open to discuss about anything,
> in particular the above open issues.

I'm pretty excited about this as I really see a lot of potential. It would
be interesting to see if a Hydra ApiDocumentation would provide enough
information to dynamically "crawl" the data instead of querying it by SPO.
Have you given that any thought already?

Great work!


--
Markus Lanthaler
@markuslanthaler
Received on Friday, 14 March 2014 11:45:42 UTC