Re: Hydra use case: Linked Data Fragments (ISSUE-30) from Gregg Kellogg on 2014-03-14 (public-hydra@w3.org from March 2014)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Fri, 14 Mar 2014 12:02:58 -0700
To: Markus Lanthaler <markus.lanthaler@gmx.net>
Cc: Ruben Verborgh <ruben.verborgh@ugent.be>, public-hydra@w3.org
Message-Id: <F286BE4A-E57F-44F2-888C-BC794E4A6384@greggkellogg.net>
On Mar 14, 2014, at 7:23 AM, Markus Lanthaler <markus.lanthaler@gmx.net> wrote:

> On Friday, March 14, 2014 2:26 PM, Ruben Verborgh wrote:
>> Basic Linked Data Fragments share the URI template of Restpark.
>> I actually had a rather similar experience as you;
>> read about them and forgot until Luca pinged me.
> 
> :-)
> 
> 
>> However, whereas Restpark still has "query" in its terminology
>> (for instance, there is a limit parameter);
>> basic LDFs are really just specific fragments of a dataset
>> that can (and should) be interpreted separately from their application.
>> And that's where Hydra comes in:
>> my client might use fragments to solve SPARQL queries,
>> but other clients might do something completely different.
> 
> 
> +1
> 
> 
>>>>   :dbpedia void:subset <http://data-cdn.linkeddatafragments.org/dbpedia?subject=&predicate=dbpedia-owl%3AbirthPlace&object=dbpedia%3ANew_York>;
>>>>       hydra:search _:triplePattern.
>>> 
>>> This all makes perfect sense to me.. the only thing that you might wanna
>>> change (not sure) is what hydra:search is attached to.  In this case
>>> here, I (as a client) would assume that you further query that Linked Data
>>> Fragment (instead of querying the whole DBpedia dataset).
>> 
>> The above are two distinct triples, right? So I'm saying that:
>> 
>>>> :dbpedia void:subset <....>.
>>>> :dbpedia hydra:search _:triplePattern.
> 
> Sorry, you are of course right. I've misread those two triples. I read it as
> 
>  <....> hydra:search _:triplePattern
> 
> 
>> So this does capture the semantics that the whole dataset is searched?
>> I.e., would the client know that the query searches DBpedia, not the
>> fragment?
> 
> Yes, my bad.
> 
> 
>>>> 1) How should a parameter be serialized in the URI template?
> [...]
>>> We can either define (and fix) how
>>> IRIs/literals are to be serialized or we add a mechanism to describe
>>> how they should be serialized.
>> 
>> That's it.
>> But the full flexibility that this use case needs
>> will probably be overkill for many use cases.
>> So I'm afraid there will have to be a mechanism,
>> because few would want to go all-the-way.
>> 
>> As I've shown, for this use case it's crucial to distinguish
>> between literals and URIs. It's a no-go to do anything else.
>> But it would probably be unreasonable to expect
>> that people will want to indicate this difference all the time.
>> (For instance, always have < > around URIs or "" around strings.)

Not a fan of using <> around URIs. I'd say that the general practice would be that values are either (URI-encoded) URIs, or quoted strings, with optional language using a Turtle-like syntax. I don't see any real advantage of including datatype information, though. Alternatively, language could be another parameter (subject=&predicate=&object=&language=).
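
A minimal sketch of that convention in Python, assuming IRIs are passed bare, literals are double-quoted with an optional Turtle-style @lang suffix, and percent-encoding is left to the standard library. The function names and the DBpedia ontology IRI are illustrative only; none of this is defined by Hydra or by the LDF server.

```python
from urllib.parse import urlencode


def serialize_value(value, is_literal=False, language=None):
    """Return the raw (not yet percent-encoded) parameter value."""
    if not is_literal:
        # Assumption: IRIs are passed as-is, without < > delimiters.
        return value
    # Escape backslashes and quotes so the literal stays unambiguous.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    text = f'"{escaped}"'
    if language:
        text += f"@{language}"
    return text


def fragment_query(subject="", predicate="", obj=""):
    """Build the query string; empty values are left unbound."""
    return urlencode({"subject": subject, "predicate": predicate, "object": obj})


query = fragment_query(
    predicate="http://dbpedia.org/ontology/birthPlace",
    obj=serialize_value("New York", is_literal=True, language="en"),
)
print(query)
```

Because the quotes survive percent-encoding as %22, a server can still tell a literal from an IRI after decoding.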

>>> Allowing to describe the expected serialization
>>> format is much more flexible but makes the implementation of
>>> (primarily) clients more difficult.
>> 
>> What could work is "convention over configuration".
>> (But still allowing configuration.)
> 
> Yeah.. even though that's always a bit odd with the open world assumption.
> But it is just a hint anyway.. so it should probably be good enough.
> 
> 
>>>> And of course, there would be many more ways to parse parameters.
>>>> I could live with only giving one that works for clients,
>>>> but it should be consistent and allow to differentiate between
>>>> strings and URIs.
>>> 
>>> Would that be your preference or can you "just live with it"?
>> 
>> What I mean is that:
>> the server currently supports different ways to pass a URI.
>> You could abbreviate it with prefixes, or have the full URI in < >.
>> It would be totally fine with me if Hydra were only able
>> to explain just one of them, and not both.
> 
> OK, yeah.. I think allowing prefixes makes the solution much more complex
> without bringing much advantage if we talk about machine clients. If a human
> user is interacting with such a service, the UI can still support CURIEs but expand
> them before sending the URL to the server. Have you considered doing that in
> your prototype? A positive side effect of doing so would be increased
> cache-hit rates.

+1, I don't see a real need to use compact IRIs; however, if you were to, the set of IRI prefixes could come either from @prefix (along with default prefixes for RDFa) or prefixes defined in a JSON-LD context.

>> But it would need to explain one of them.
>> 
>>> Do you think there
>>> are many cases where a variable can take both an IRI and a literal
>>> and the distinction is important?
>> 
>> No, in the majority of cases it won't be;
>> because there are few properties that could either take a URI or
>> string.
>> rdf:object is actually one of the only ones.

For schema.org data, it's always legal to use Text where an entity reference is expected. For practical purposes, I solve this on ingestion, so that, e.g., :gregg schema:knows "Ruben Verborgh" gets expanded to :gregg schema:knows [schema:name "Ruben Verborgh"], but that won't universally be the case, so an object may take either an IRI or a literal IMO.
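
The ingestion step can be sketched roughly as follows in Python. The tuple-based triple representation, the function name, and the generated blank node labels are assumptions made for illustration; a real pipeline would use an RDF library and its own node types.

```python
from itertools import count

SCHEMA_NAME = "http://schema.org/name"


def expand_text_objects(triples, entity_properties):
    """Replace a literal object with a blank node carrying schema:name.

    triples: iterable of (subject, predicate, (kind, value)) tuples, where
    kind is "iri", "bnode" or "literal" (a hypothetical representation).
    entity_properties: predicates whose objects are expected to be entities.
    """
    ids = count()
    for s, p, (kind, value) in triples:
        if kind == "literal" and p in entity_properties:
            node = f"_:n{next(ids)}"
            yield (s, p, ("bnode", node))
            yield (node, SCHEMA_NAME, ("literal", value))
        else:
            yield (s, p, (kind, value))


triples = [(":gregg", "http://schema.org/knows", ("literal", "Ruben Verborgh"))]
expanded = list(expand_text_objects(triples, {"http://schema.org/knows"}))
for triple in expanded:
    print(triple)
```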

>> But in this case, it is rdf:object I need.
>> 
>>> I kind of have troubles to find an example where that would matter.
>> 
>> In the LDF use case it does, hence my mail ;-)
> 
> :-)
> 
> 
>> I understand that a spec cannot be tailored to individual needs,
>> but LDF could be a big and compelling use case for Hydra.
> 
> Definitely
> 
> 
>> What I would propose is something like:
>> 
>>   _:object hydra:variable "object";
>>       hydra:property rdf:object;
>>       hydra:serialization hydra:NodeSerialization.
>> 
>> Where hydra:NodeSerialization is a way that distinguishes
>> between IRIs, literals, blank nodes, and variables.
>> 
>> The default ("convention over configuration") could be
>> hydra:TextualSerialization,
>> where the as-is value of the IRI or literal is passed; losing the ability to
>> distinguish.
> 
> I had something similar in mind. I was thinking of something like
> "ValueOnly" which would correspond to your "TextualSerialization" (IRI
> as-is, only lexical form of literals) and "FullRepresentation" (with a
> better name) which would correspond to NodeSerialization.

As I mentioned, in some cases, you might need to be more flexible. In any case, coming up with a simple scheme should make this unambiguous: IRIs and BNodes can easily be recognized, and literals are always quoted. If an IRI starts with a known prefix, then it can be expanded, etc.
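
A rough sketch of such a scheme in Python, applied to an already percent-decoded parameter value. The prefix table, the treatment of an empty value as unbound, and the (kind, value, language) tuples are all assumptions for illustration, not something the server or Hydra defines.

```python
import re

# Hypothetical prefix table; a server could take it from @prefix
# declarations or a JSON-LD context instead.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}

# A double-quoted string with an optional Turtle-style language tag.
LITERAL = re.compile(
    r'^"(?P<lex>(?:[^"\\]|\\.)*)"(?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*))?$'
)


def parse_term(raw):
    """Classify a decoded parameter value as (kind, value, language)."""
    if raw == "":
        return ("unbound", None, None)  # assumption: empty means "any"
    match = LITERAL.match(raw)
    if match:
        lexical = re.sub(r"\\(.)", r"\1", match.group("lex"))
        return ("literal", lexical, match.group("lang"))
    if raw.startswith("_:"):
        return ("bnode", raw[2:], None)
    prefix, sep, local = raw.partition(":")
    if sep and prefix in PREFIXES:
        return ("iri", PREFIXES[prefix] + local, None)
    return ("iri", raw, None)


for raw in ['"New York"@en', "dbpedia:New_York", "_:b0", "http://example.org/a", ""]:
    print(repr(raw), "->", parse_term(raw))
```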

Gregg

>> Summarized: simple cases stay simple,
>> complex cases are supported and still simple.
> 
> Yeah, I quite like this.
> 
> 
>>>> 2) What do the subject, predicate, and object properties really
>>>> mean?
>>> 
>>> My take on this would be to either specialize the IriTemplate class
>>> to something like an LdfIriTemplate or to specialize hydra:search..
>>> something like ldf:queryInterface.
>> 
>> That's an interesting option and I like it.
>> However, I wonder whether Hydra itself could also have
>> "collection search semantics" built in;
>> so a specialization of hydra:search that says
>> "and I will now return those elements of the collection
>> that directly have the specified property values".
> 
> Yeah, this is directly related to what we discussed some time ago (the
> actor/blockbuster thing). I wouldn't be opposed to add something like
> hydra:filter for this. However, before doing so I'd still like to evaluate
> if property paths wouldn't be the more powerful alternative at the cost of
> increasing the complexity a bit.
> 
> 
>> A discussion in this direction is here:
>> http://lists.w3.org/Archives/Public/public-hydra/2014Feb/0153.html
> 
> Great.. so we are on the same page :-) As you see, I typically reply to
> mails as I read them.. E-Mail Stream Processing™ :-P
> 
> 
>> I think such a use case would be common enough
>> to justify its inclusion in Hydra.
> 
> Yep.. even though I would say in most cases you can query/filter only by
> object on some properties. So, for example just by last name or whether an
> issue is closed/open and not on all "fields" as LDF currently does. Is there
> already a way to describe that in LDF?
> 
> 
>>> You could then even go as far as saying
>>> 
>>> ldf:queryInterface a hydra:TemplatedLink ;
>>>   hydra:supportedOperation [
>>>     a ldf:RetrieveBasicLdfOperation ;
>>>     hydra:method "GET" ;
>>>     hydra:returns ldf:BasicLdf
>>>   ] .
>>> 
>>> (sorry, haven't looked up LDF vocabulary yet)
>> 
>> Neither have I :-)
>> 
>> I would also make it a subproperty then of hydra:search;
>> or the more specific property in Hydra if we decide to create that one.
> 
> Yeah
> 
> 
>>> I'm pretty excited about this as I really see a lot of potential. It
>>> would be interesting to see if a Hydra ApiDocumentation would provide
>>> enough information to dynamically "crawl" the data instead of querying
>>> it by SPO. Have you spent any thoughts on that already?
>> 
>> Oh! No I hadn't. Documentation is a very nice application area indeed.
>> Thanks!
> 
> Not sure you understood what I meant. I meant a Hydra ApiDocumentation along
> with the used vocabularies basically provides a client a(n incomplete) map
> of the graph a service is exposing. Could that map be used to dynamically
> solve queries?
> 
> Taking the demo issue tracker as example. Let's say I want to query for open
> issues. With LDF I would query for *, vocab:isOpen, true. The service,
> however, may not implement an LDF query interface. So, if you give the client
> the entry point, it would have to look up the Hydra ApiDocumentation and the
> used vocabularies in order to (try to) fulfill the query. In our case here
> it would
> 
> 1) retrieve entrypoint (/api-demo), -> ok, is of type vocab:EntryPoint
> 2) ApiDocumentation -> vocab:EntryPoint -> supportedProperty -> vocab:issue
> (ignoring for the moment that the range is just hydra:Collection and
> doesn't specify that members will be of type vocab:Issue)
> 3) dereference vocab:issue link (/api-demo/issues) -> hydra:Collection with
> vocab:Issue members
> 4) dereference each vocab:Issue (e.g. /api-demo/issues/5) as the
> vocab:isOpen is not included in collection representation
> 5) filter retrieved issues and only return the ones marked as open
> 
> Obviously, it would be much more efficient if the service would offer a
> direct interface, but service providers can't always anticipate what
> consumers are interested in.
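
The five steps quoted above could be sketched roughly like this in Python. An in-memory dictionary stands in for the HTTP service, the link property is hard-coded rather than discovered from the ApiDocumentation (step 2), and /api-demo/issues/6 is a made-up second issue, so this only illustrates the shape of the crawl.

```python
# In-memory stand-in for the demo issue tracker (illustrative data only).
RESOURCES = {
    "/api-demo": {
        "@id": "/api-demo",
        "@type": "vocab:EntryPoint",
        "vocab:issue": "/api-demo/issues",
    },
    "/api-demo/issues": {
        "@id": "/api-demo/issues",
        "@type": "hydra:Collection",
        # Members carry only their identifier, not vocab:isOpen.
        "hydra:member": [
            {"@id": "/api-demo/issues/5"},
            {"@id": "/api-demo/issues/6"},
        ],
    },
    "/api-demo/issues/5": {
        "@id": "/api-demo/issues/5",
        "@type": "vocab:Issue",
        "vocab:isOpen": True,
    },
    "/api-demo/issues/6": {
        "@id": "/api-demo/issues/6",
        "@type": "vocab:Issue",
        "vocab:isOpen": False,
    },
}


def find_open_issues(fetch, entry_point, link_property="vocab:issue"):
    """Crawl from the entry point and return the identifiers of open issues."""
    entry = fetch(entry_point)                   # 1) retrieve the entry point
    collection = fetch(entry[link_property])     # 2-3) follow the issues link
    issues = [fetch(member["@id"])               # 4) dereference each member
              for member in collection["hydra:member"]]
    return [issue["@id"] for issue in issues     # 5) filter on vocab:isOpen
            if issue.get("vocab:isOpen") is True]


open_issues = find_open_issues(RESOURCES.__getitem__, "/api-demo")
print(open_issues)
```

The cost is one request per issue, which is the inefficiency a direct query interface would avoid.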
> 
> 
> 
> --
> Markus Lanthaler
> @markuslanthaler
> 
>
Received on Friday, 14 March 2014 19:03:33 UTC