Re: ISSUE-45: Introduce hydra:filter (subPropertyOf hydra:search) from Gregg Kellogg on 2014-04-24 (public-hydra@w3.org from April 2014)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Thu, 24 Apr 2014 10:47:01 -0700
To: Markus Lanthaler <markus.lanthaler@gmx.net>
Cc: public-hydra@w3.org, Thomas Hoppe <thomas.hoppe@n-fuse.de>
Message-Id: <B5F7573A-CFF3-4D7B-A54D-CC10FEDA9314@greggkellogg.net>

On Apr 24, 2014, at 7:31 AM, Markus Lanthaler <markus.lanthaler@gmx.net> wrote:

> On Wednesday, April 23, 2014 7:35 PM, Gregg Kellogg wrote:
>> On Apr 23, 2014, at 7:32 AM, Markus Lanthaler wrote:
>>> What exactly do you mean by aggregations... do you mean things like
>>> calculating sums of subsets of the members of a collection? If so, that's
>>> not addressed at all. The only "sum" you would potentially get is the number
>>> of members of the filtered collection.
>> 
>> Certainly counting should come for free with a filter, which presumably
>> returns a new Collection from which you can get totalItems. I was thinking of a hydra:search, which
> 
> Yep
> 
> 
>> might return something different than a collection of resources all the same type as the class
>> on which the search is performed (or derived from that class). I was thinking of aggregation
>> in the form of returning simple scalar values such as MIN/MAX/SUM/COUNT, but also
>> possibly GROUP results, in which each result might be described with a class defined in
>> hydra:returns. This may be beyond Hydra's specific use case, but you've said that the
>> semantics of hydra:search are intentionally vague.
> 
> This goes much further than searching or filtering collections. It's really
> querying them (or the complete API for that matter). For the moment, yeah,
> it is out of scope I would say as we have to get the other things right
> first. But I do see this as a potential future work item.. maybe for a Hydra
> Query vocabulary. If you want to take a stab at it... we can certainly also
> start working on that sooner. I just don't want us to be distracted too much
> till we get the basics right.

Best leave it to the future, then. There are enough tough knots to chew on right now.

>> The problem with this, however, is that it
>> then becomes useless for describing a contract between the client and the server, so we fall
>> back on traditional API documentation to understand the behavior of search.
> 
> I wouldn't say it's useless. A Google search certainly isn't useless even
> though you won't know what Google does behind the scenes with your query and
> how it ranks the results.

For the free-text use case, sure.

>> A specific use case I had in mind was to be able to return information about a collection
>> made up of heterogeneous members. You might then want to do the equivalent of a
>> SPARQL aggregating query. I discussed this more fully at the end of my email on querying
>> collections [1] which didn't spark any discussion.
> 
> Sorry for not responding to that mail. I just re-read it. Here's the example
> you posted:
> 
>> # InterestCollection resource
>> </giants/interests> a :InterestCollection, hydra:Collection;
>>  hydra:member </giants/interests/gregg>, </giants/interests/markus> .
>> </giants> :interest </giants/interests/gregg>, </giants/interests/markus> .
>> 
>> # Interest resources
>> </giants/interests/gregg> a :UserLikes; :likeKind "fan"; :performer </gregg> .
>> 
>> </giants/interests/markus> a :UserLikes; :likeKind "foe"; :performer </markus> .
>> 
>> Now, for the question about how to query such a collection.
>> 
>> So, how might I go about finding just the fans of the Giants?
> 
> With the current design of hydra:filter, you would be able to filter the
> collection by :likeKind and that would return a new collection consisting of
> just fans:
> 
>   </giants/interests> hydra:filter [
>     hydra:template "/giants/interests/{likeKind}" ;
>     hydra:mapping [
>       hydra:variable "likeKind" ;
>       hydra:property :likeKind
>     ]
>   ]
> 
> If you want multiple things, you would need to do multiple requests.

Sure, that works. Although, I'd be more inclined to define the template as "/giants/interests{?likeKind}", but that's immaterial.
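
To make the two template styles concrete, here is a minimal sketch of what the filter amounts to, assuming the two interest resources above are held in memory as plain dictionaries. The names MEMBERS, MAPPING, filter_members and the two expand_* helpers are made up for illustration and are not part of Hydra.

```python
from urllib.parse import quote

# Illustrative only: in-memory stand-ins for the two interest resources above.
MEMBERS = [
    {"@id": "/giants/interests/gregg", "likeKind": "fan", "performer": "/gregg"},
    {"@id": "/giants/interests/markus", "likeKind": "foe", "performer": "/markus"},
]

# Stand-in for hydra:mapping: template variable -> property of a member.
MAPPING = {"likeKind": "likeKind"}


def filter_members(members, bindings):
    """Return the members that match every (variable, value) binding."""
    return [
        member
        for member in members
        if all(member.get(MAPPING[var]) == value for var, value in bindings.items())
    ]


def expand_path_style(value):
    # Corresponds to the template "/giants/interests/{likeKind}".
    return "/giants/interests/" + quote(value, safe="")


def expand_query_style(value):
    # Corresponds to the template "/giants/interests{?likeKind}".
    return "/giants/interests?likeKind=" + quote(value, safe="")


fans = filter_members(MEMBERS, {"likeKind": "fan"})
print([member["@id"] for member in fans])
print(expand_path_style("fan"))
print(expand_query_style("fan"))
```

Either IRI identifies the same filtered collection; the difference is only in how the variable is placed in the URL.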

>>>> You'll need to explain more about URI template variable binding; It seems
>>>> that schema:name here is somehow used to find "Markus", perhaps in the
>>>> subject of the collection, and that the "name" query element is interpreted
>>>> by the service to be schema:name for members of the collection. It might
>>>> be somewhat confusing if it has two senses that aren't directly related.
>>> 
>>> OK, I thought the pseudo code above is enough. Anyway, here is how it works:
>>> 
>>> 1) You have a collection with several members.
>>> 2) You define an IRI template and associate it via hydra:filter to the
>>> collection
>>> 3) Each variable in that IRI template is bound to a property (path)
>>> 4) Expanding the template with concrete values results in queries of the
>>> form
>>>         ?member ?property "value supplied by client"
>>> 5) Dereferencing the expanded IRI template returns a collection whose members
>>> match the query criteria
>>> 
>>> Is it clearer now? Otherwise I'll post a concrete example.
>> 
>> So, in your example, hydra:template "/collection?name={schema:name}", the
>> {schema:name} portion of the template is interpreted both by the service and the client?
>> When the client makes the template concrete by replacing {schema:name} with Markus, the
>> server knows to bind the "name" query parameter with the schema:name property path and
>> reverse this logic.
> 
> Exactly. 
> 
> 
>> It's not explicit in this snippet, but I presume there is a mapping that binds "name" to
>> schema:name, in which case wouldn't the template be "/collection{?name}" (per Example 15
>> from the spec). The server understands "name" to be bound to "schema:name", because of
>> the mapping, and RFC6570 describes how to construct a query presuming that "name" is
>> bound to a concrete value; if it were bound to "Markus", this would create
>> "/collection?name=Markus". Presumably, unbound variables are just eliminated from the
>> URI Template.
> 
> Correct. Sorry, I was lazy and used incomplete pseudo-code (I think I marked
> it as such). Nevertheless I will try to be more accurate in the future :-)

Just trying to lift the veil of confusion :)
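
For the record, here is a rough sketch of both halves of that round trip, assuming a mapping of "name" to schema:name. It handles only the form-style "{?a,b}" expression from RFC 6570 rather than the full specification, and expand_form_query, to_patterns and MAPPING are hypothetical names used for illustration.

```python
import re
from urllib.parse import parse_qs, quote, urlsplit

# Assumed mapping for this example: template variable -> property IRI.
MAPPING = {"name": "http://schema.org/name"}


def expand_form_query(template, values):
    """Expand "{?a,b}" expressions only (RFC 6570 form-style query)."""

    def replace(match):
        pairs = []
        for var in match.group(1).split(","):
            if values.get(var) is not None:  # unbound variables are omitted
                pairs.append(var + "=" + quote(str(values[var]), safe=""))
        return "?" + "&".join(pairs) if pairs else ""

    return re.sub(r"\{\?([^}]+)\}", replace, template)


def to_patterns(iri):
    """Server side: map query parameters back to (property, value) pairs."""
    query = parse_qs(urlsplit(iri).query)
    return [
        (MAPPING[var], value)
        for var, vals in query.items()
        if var in MAPPING
        for value in vals
    ]


print(expand_form_query("/collection{?name}", {"name": "Markus"}))
print(expand_form_query("/collection{?name}", {}))
print(to_patterns("/collection?name=Markus"))
```

Each pair returned by to_patterns corresponds to one "?member ?property value" pattern that the server evaluates against the collection.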

>>> Thomas has also a good point:
>>> 
>>>> On Sunday, April 20, 2014 1:40 PM, Thomas Hoppe wrote:
>>>> I would appreciate the support of filtering as I have mentioned on
>>>> other posts but the proposed approach as far as I have understood it
>>>> has the major disadvantage that I would need to define a filter for
>>>> each property of collection members on which I want to offer
>>>> filtering. This can become quite lengthy.
>>> 
>>> Yeah, that's true. You would need to specify them explicitly.
>> 
>> We should say something about the role of sub-classing with Hydra operations and
>> constraints. If I define a constraint on schema:Event that defines a hydra:TemplatedLink,
> 
> Currently, you can't associate templated links to a type and then have it
> applied on all instances of that type. The templated link needs to be
> associated with the instance. The reasoning behind that decision is that
> each templated link will likely look different (at least the template). If
> you just append query parameters, that assumption doesn't hold though.
> Should we raise an issue for this?

Yes, I think this is a serious shortcoming, and it violates the DRY principle. If I have 20 properties upon which I may interact, and a couple of different ways to filter them, not to mention operations defined on them, repeating this information in each instance can consume a large percentage of the HTTP payload. This is what motivated some of my attempts at defining templates that either presume the URL of the resource to be affected or have a way to reference that URL to form a new URL based on it. Relative-URL arithmetic may make this difficult unless every resource ends in "/", which may be too big an assumption.
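
A small sketch of the relative-URL problem, using Python's standard urljoin (which follows RFC 3986 reference resolution). The example.org base URLs and the two expanded references are assumptions chosen for illustration.

```python
from urllib.parse import urljoin

# Assumed results of expanding a template that is defined once, on the class.
QUERY_ONLY = "?likeKind=fan"  # query-only relative reference
PATH_SEGMENT = "fan"          # relative path reference

# The same collection, identified with and without a trailing slash.
without_slash = "http://example.org/giants/interests"
with_slash = "http://example.org/giants/interests/"

# A query-only reference keeps the base path, so it works either way.
print(urljoin(without_slash, QUERY_ONLY))
print(urljoin(with_slash, QUERY_ONLY))

# A relative path replaces the last segment unless the base ends in "/".
print(urljoin(without_slash, PATH_SEGMENT))
print(urljoin(with_slash, PATH_SEGMENT))
```

The third call resolves to http://example.org/giants/fan, dropping the "interests" segment. That is why a class-level template with path variables only resolves as intended when the resource URL ends in "/".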

In any case, other than the example in 5.2, it's not clear to me that an IriTemplate is restricted from being used in the API description.

>> can we infer that this is also a constraint on something like schema:SportsEvent? Looking at
>> it the other way, a schema:SportsEvent is also a schema:Event through RDFS inference, so
>> operations and constraints defined on such an instance would presumably also be
>> appropriate for such a resource.
> 
> Operations are. I'm not so sure about constraints (supportedProperty) yet...
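
As a sketch of the operations half of that answer, the following walks rdfs:subClassOf edges to collect operations declared on superclasses. The toy data is assumed: the single subclass edge mirrors the schema:SportsEvent example, and "ReplaceEventOperation" is a made-up operation name. It says nothing about whether supportedProperty constraints should be inherited the same way.

```python
# Assumed toy data: one rdfs:subClassOf edge and one operation on the superclass.
SUBCLASS_OF = {"schema:SportsEvent": ["schema:Event"]}
OPERATIONS = {"schema:Event": ["ReplaceEventOperation"]}  # hypothetical name


def superclasses(cls):
    """Return cls followed by every class reachable via rdfs:subClassOf."""
    seen, stack = [], [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(SUBCLASS_OF.get(current, []))
    return seen


def inherited_operations(cls):
    """Collect operations declared on cls or on any of its superclasses."""
    return [op for c in superclasses(cls) for op in OPERATIONS.get(c, [])]


print(superclasses("schema:SportsEvent"))
print(inherited_operations("schema:SportsEvent"))
```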
> 
> 
>>>> I opt for a more generic approach which allows the client to pick
>>>> arbitrary properties and filter for them -- something like this:
>>>> 
>>>> hydra:filter: {
>>>> @type: "IriTemplate",
>>>> template: "?f={property}:{value}",
>>>> mappings: [
>>>>   {
>>>>     @type: "IriTemplateMapping",
>>>>     variable: "property",
>>>>     property: "rdf:Property",
>>>>     required: true
>>>>   },
>>>>   {
>>>>     @type: "IriTemplateMapping",
>>>>     variable: "value",
>>>>     required: true
>>>>   }
>>>> ]
>>>> }
>>> 
>>> Using rdf:Property this way is ambiguous as you wouldn't know whether the
>>> server just supports filtering for rdf:Property or all properties.
>>> 
>>> 
>>>> This would also allow for templates like this:
>>>> 
>>>> template: "?{property}={value}"
>>> 
>>> The other problem with this approach is that {property} would have to be
>>> expanded to a full URL as otherwise. So you would end up with very long and
>>> ugly URLs.
>> 
>> So?
> 
> Well, yeah. It's not really a "problem" but also certainly not something
> that most people would like.
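
To show what such a URL looks like, here is a minimal sketch that expands the "?{property}={value}" template with the property given as a full IRI. The helper name expand_generic_filter and the "/collection" path are assumptions. Percent-encoding everything outside the unreserved set matches what RFC 6570 simple string expansion does for "{var}".

```python
from urllib.parse import quote


def expand_generic_filter(collection, prop_iri, value):
    """Expand "?{property}={value}" with the property as a full IRI."""
    # safe="" makes quote() encode ":" and "/" as well, leaving only
    # unreserved characters untouched.
    return collection + "?" + quote(prop_iri, safe="") + "=" + quote(value, safe="")


print(expand_generic_filter("/collection", "http://schema.org/name", "Markus"))
```

This prints /collection?http%3A%2F%2Fschema.org%2Fname=Markus, compared with /collection?name=Markus when the variable is mapped explicitly.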
> 
> 
>>>> Which would allow to describe the diversity of current filtering
>>>> notations found in APIs.
>>> 
>>> I don't know of many APIs that allow completely arbitrary filtering. Most of
>>> them are quite restricted... which makes sense because filtering might be a
>>> quite costly operation especially if there are lots of properties. If you
>>> really want to allow completely arbitrary filtering, it might actually make
>>> more sense to just send a SPARQL query or something similar. I'm not sure.
>>> Thoughts?
>> 
>> I think basic filtering using property paths is a pretty important use case. We might constrain
>> the length of these paths, as not every implementation will be done using a SPARQL back
>> end. But, for my part, I'm fine with limiting filters to property paths defined as specific
>> mappings within a TemplatedLink.
> 
> If we don't allow Thomas' extension, we don't have to limit the length of
> these paths as the server explicitly advertises what it can handle. 

+1

Gregg

> --
> Markus Lanthaler
> @markuslanthaler
>
Received on Thursday, 24 April 2014 17:47:30 UTC