RE: Hydra use case: Linked Data Fragments (ISSUE-30)

On Friday, March 14, 2014 2:26 PM, Ruben Verborgh wrote:
> Basic Linked Data Fragments share the URI template of Restpark.
> I actually had a rather similar experience as you;
> read about them and forgot until Luca pinged me.

:-)


> However, whereas Restpark still has "query" in its terminology
> (for instance, there is a limit parameter);
> basic LDFs are really just specific fragments of a dataset
> that can (and should) be interpreted separately from their application.
> And that's where Hydra comes in:
> my client might use fragments to solve SPARQL queries,
> but other clients might do something completely different.


+1

 
> >>    :dbpedia void:subset
> >>        <http://data-cdn.linkeddatafragments.org/dbpedia?subject=&predicate=dbpedia-owl%3AbirthPlace&object=dbpedia%3ANew_York>;
> >>        hydra:search _:triplePattern.
> >
> > This all makes perfect sense to me.. the only thing that you might wanna
> > change (not sure) is to what hydra:search is attached to.  In this case
> > here, I (as a client) would assume that you further query that Linked
> > Data Fragment (instead of querying the whole DBpedia dataset).
> 
> The above are two distinct triples, right? So I'm saying that:
> 
> >> :dbpedia void:subset <....>.
> >> :dbpedia hydra:search _:triplePattern.

Sorry, you are of course right. I've misread those two triples. I read it as

  <....> hydra:search _:triplePattern


> So this does capture the semantics that the whole dataset is searched?
> I.e., would the client know that the query searches DBpedia, not the
> fragment?

Yes, my bad.


> >> 1) How should a parameter be serialized in the URI template?
[...]
> > We can either define (and fix) how
> > IRIs/literals are to be serialized or we add a mechanism to describe
> > how they should be serialized.
> 
> That's it.
> But the full flexibility that this use case needs
> will probably be overkill for many use cases.
> So I'm afraid there will have to be a mechanism,
> because few would want to go all-the-way.
> 
> As I've shown, for this use case it's crucial to distinguish
> between literals and URIs. It's a no-go to do anything else.
> But it would probably be unreasonable to expect
> that people will want to indicate this difference all the time.
> (For instance, always have < > around URIs or "" around strings.)
> 
> > Allowing to describe the expected serialization
> > format is much more flexible but makes the implementation of
> > (primarily) clients more difficult.
> 
> What could work is "convention over configuration".
> (But still allowing configuration.)

Yeah.. even though that's always a bit odd with the open world assumption.
But it is just a hint anyway.. so it should probably be good enough.


> >> And of course, there would be many more ways to parse parameters.
> >> I could live with only giving one that works for clients,
> >> but it should be consistent and allow to differentiate between
> >> strings and URIs.
> >
> > Would be your preference or can you "just live with it"?
> 
> What I mean is that:
> the server currently supports different ways to pass a URI.
> You could abbreviate it with prefixes, or have the full URI in < >.
> It would be totally fine with me if Hydra were only able
> to explain just one of them, and not both.

OK, yeah.. I think allowing prefixes makes the solution much more complex
without bringing much advantage if we talk about machine clients. If a human
user is interacting with such a service, the UI can still support CURIEs but
expand them before sending the URL to the server. Have you considered doing
that in your prototype? A positive side effect of doing so would be increased
cache-hit rates.
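
A minimal sketch of that client-side expansion, in Python. The prefix map,
the function names and the base URL are assumptions made for illustration;
they are not taken from the LDF server or from Hydra:

```python
from urllib.parse import urlencode

# Hypothetical prefix map; a real UI would load this from its own configuration.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}

# Base URL of the fragment interface, used here only as an example.
BASE = "http://data-cdn.linkeddatafragments.org/dbpedia"


def expand_curie(value, prefixes=PREFIXES):
    """Return the full IRI for a known CURIE; leave anything else unchanged."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return value


def fragment_url(base, subject="", predicate="", obj=""):
    """Build a fragment URL whose parameters only ever carry expanded values."""
    params = {
        "subject": expand_curie(subject),
        "predicate": expand_curie(predicate),
        "object": expand_curie(obj),
    }
    return base + "?" + urlencode(params)


# The CURIE form and the full-IRI form end up as the same URL.
short = fragment_url(BASE, predicate="dbpedia-owl:birthPlace",
                     obj="dbpedia:New_York")
full = fragment_url(BASE, predicate="http://dbpedia.org/ontology/birthPlace",
                    obj="http://dbpedia.org/resource/New_York")
```

Since both spellings produce one and the same URL, an HTTP cache in front of
the server only ever sees a single key per triple pattern.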


> But it would need to explain one of them.
> 
> > Do you think there
> > are many cases where a variable can take both an IRI and a literal
> > and the distinction is important?
> 
> No, in the majority of cases it won't be;
> because there are few properties that could take either a URI or a
> string.
> rdf:object is actually one of the only ones.
> 
> But in this case, it is rdf:object I need.
> 
> > I kind of have troubles to find an example where that would matter.
> 
> In the LDF use case it does, hence my mail ;-)

:-)


> I understand that a spec cannot be tailored to individual needs,
> but LDF could be a big and compelling use case for Hydra.

Definitely


> What I would propose is something like:
> 
>    _:object hydra:variable "object";
>        hydra:property rdf:object;
>        hydra:serialization hydra:NodeSerialization.
> 
> Where hydra:NodeSerialization is a way that distinguishes
> between IRIs, literals, blank nodes, and variables.
> 
> The default ("convention over configuration") could be
> hydra:TextualSerialization,
> where the IRI or literal value is passed as-is; losing the ability to
> distinguish.

I had something similar in mind. I was thinking of something like
"ValueOnly" which would correspond to your "TextualSerialization" (IRI
as-is, only lexical form of literals) and "FullRepresentation" (with a
better name) which would correspond to NodeSerialization.
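
To make the difference concrete, here is a rough Python sketch of the two
modes. The class and function names are made up, and the Turtle-like syntax
used for the "full" form is only an assumption; nothing has been decided for
Hydra yet:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Iri:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None


def value_only(node):
    """IRI as-is, or only the lexical form of a literal (the node kind is lost)."""
    return node.value if isinstance(node, Iri) else node.lexical


def full_representation(node):
    """Assumed Turtle-like form: < > around IRIs, quotes around literals."""
    if isinstance(node, Iri):
        return "<" + node.value + ">"
    escaped = node.lexical.replace("\\", "\\\\").replace('"', '\\"')
    text = '"' + escaped + '"'
    if node.datatype:
        text += "^^<" + node.datatype + ">"
    return text


iri = Iri("http://example.org/a")
lit = Literal("http://example.org/a")

same = value_only(iri) == value_only(lit)                     # True: indistinguishable
differ = full_representation(iri) != full_representation(lit)  # True: distinguishable
```

With the value-only form the server cannot tell whether the client meant the
IRI or a string that happens to look like one; the full form keeps them apart.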


> Summarized: simple cases stay simple,
> complex cases are supported and still simple.

Yeah, I quite like this.


> >> 2) What do the subject, predicate, and object properties really
> >> mean?
> >
> > My take on this would be to either specialize the IriTemplate class
> > to something like a LdfIriTemplate or to specialize hydra:search..
> > something like ldf:queryInterface.
> 
> That's an interesting option and I like it.
> However, I wonder whether Hydra itself could also have
> "collection search semantics" built in;
> so a specialization of hydra:search that says
> "and I will now return those elements of the collection
>  that directly have the specified property values".

Yeah, this is directly related to what we discussed some time ago (the
actor/blockbuster thing). I wouldn't be opposed to adding something like
hydra:filter for this. However, before doing so I'd still like to evaluate
if property paths wouldn't be the more powerful alternative at the cost of
increasing the complexity a bit.


> A discussion in this direction is here:
> http://lists.w3.org/Archives/Public/public-hydra/2014Feb/0153.html

Great.. so we are on the same page :-) As you see, I typically reply to
mails as I read them.. E-Mail Stream Processing™ :-P


> I think such a use case would be common enough
> to justify its inclusion in Hydra.

Yep.. even though I would say in most cases you can query/filter only by
object on some properties. So, for example just by last name or whether an
issue is closed/open and not on all "fields" as LDF currently does. Is there
already a way to describe that in LDF?


> > You could then even go as far as saying
> >
> >  ldf:queryInterface a hydra:TemplatedLink ;
> >    hydra:supportedOperation [
> >      a ldf:RetrieveBasicLdfOperation ;
> >      hydra:method "GET" ;
> >      hydra:returns ldf:BasicLdf
> >    ] .
> >
> > (sorry, haven't looked up LDF vocabulary yet)
> 
> Neither have I :-)
> 
> I would also make it a subclass then of hydra:search;
> or the more specific property in Hydra if we decide to create that one.

Yeah


> > I'm pretty excited about this as I really see a lot of potential. It
> > would be interesting to see if a Hydra ApiDocumentation would provide
> > enough information to dynamically "crawl" the data instead of querying
> > it by SPO. Have you spent any thoughts on that already?
> 
> Oh! No I hadn't. Documentation is a very nice application area indeed.
> Thanks!

Not sure you understood what I meant. I meant a Hydra ApiDocumentation along
with the used vocabularies basically provides a client a(n incomplete) map
of the graph a service is exposing. Could that map be used to dynamically
solve queries?

Taking the demo issue tracker as example. Let's say I want to query for open
issues. With LDF I would query for *, vocab:isOpen, true. The service,
however, may not implement an LDF query interface. So, if you give the client
the entry point, it would have to look up the Hydra ApiDocumentation and the
used vocabularies in order to (try to) fulfill the query. In our case here
it would

1) retrieve entrypoint (/api-demo), -> ok, is of type vocab:EntryPoint
2) ApiDocumentation -> vocab:EntryPoint -> supportedProperty -> vocab:issue
(ignoring for the moment that the range is just hydra:Collection and
doesn't specify that members will be of type vocab:Issue)
3) dereference vocab:issue link (/api-demo/issues) -> hydra:Collection with
vocab:Issue members
4) dereference each vocab:Issue (e.g. /api-demo/issues/5) as the
vocab:isOpen is not included in collection representation
5) filter retrieved issues and only return the ones marked as open

Obviously, it would be much more efficient if the service would offer a
direct interface, but service providers can't always anticipate what
consumers are interested in.
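
The five steps above can be sketched roughly as follows in Python. Everything
here is a stand-in: the resources live in a dictionary instead of behind HTTP,
the ApiDocumentation is reduced to a class-to-links table, and issue 6 as well
as all function names are invented for the example:

```python
# Hypothetical in-memory stand-in for the demo API; a real client would
# issue HTTP GETs and parse JSON-LD instead.
RESOURCES = {
    "/api-demo": {"@type": "vocab:EntryPoint",
                  "vocab:issue": "/api-demo/issues"},
    "/api-demo/issues": {
        "@type": "hydra:Collection",
        # The collection only lists its members, without vocab:isOpen.
        "hydra:member": ["/api-demo/issues/5", "/api-demo/issues/6"],
    },
    "/api-demo/issues/5": {"@type": "vocab:Issue", "vocab:isOpen": True},
    "/api-demo/issues/6": {"@type": "vocab:Issue", "vocab:isOpen": False},
}

# Heavily simplified ApiDocumentation: class -> supported link properties.
API_DOC = {"vocab:EntryPoint": ["vocab:issue"]}


def dereference(iri):
    """Stand-in for an HTTP GET on the given IRI."""
    return RESOURCES[iri]


def find_matching(entry_point, predicate, value):
    """Answer the pattern (*, predicate, value) by crawling from the entry point."""
    entry = dereference(entry_point)                    # 1) retrieve entry point
    matches = []
    for link in API_DOC.get(entry["@type"], []):        # 2) consult the ApiDocumentation
        collection = dereference(entry[link])           # 3) follow the link
        for member_iri in collection.get("hydra:member", []):
            member = dereference(member_iri)            # 4) dereference each member
            if member.get(predicate) == value:          # 5) filter client-side
                matches.append(member_iri)
    return matches


open_issues = find_matching("/api-demo", "vocab:isOpen", True)
```

Note that the client needs one request per issue in step 4, which is exactly
the cost a direct query interface would avoid.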



--
Markus Lanthaler
@markuslanthaler

Received on Friday, 14 March 2014 14:24:20 UTC