RE: ISSUE-45: Introduce hydra:filter (subPropertyOf hydra:search)

On Wednesday, April 23, 2014 7:35 PM, Gregg Kellogg wrote:
> On Apr 23, 2014, at 7:32 AM, Markus Lanthaler wrote:
> > What exactly do you mean by aggregations... do you mean things like
> > calculating sums of subsets of the members of a collection? If so, that's
> > not addressed at all. The only "sum" you would potentially get is the number
> > of members of the filtered collection.
> 
> Certainly counting should come for free with a filter, which presumably returns a new
> Collection from which you can get totalItems. I was thinking of a hydra:search, which

Yep


> might return something different than a collection of resources all the same type as the class
> on which the search is performed (or derived from that class). I was thinking of aggregation
> in the form of returning simple scalar values such as MIN/MAX/SUM/COUNT, but also
> possibly GROUP results, in which each result might be described with a class defined in
> hydra:returns. This may be beyond Hydra's specific use case, but you've said that the
> semantics of hydra:search are intentionally vague.

This goes much further than searching or filtering collections. It's really
querying them (or the complete API for that matter). For the moment, yeah,
it is out of scope I would say, as we have to get the other things right
first. But I do see this as a potential future work item, maybe for a Hydra
Query vocabulary. If you want to take a stab at it, we can certainly also
start working on that sooner. I just don't want us to be distracted too much
until we get the basics right.


> The problem with this, however, is that it
> then becomes useless for describing a contract between the client and the server, so we fall
> back on traditional API documentation to understand the behavior of search.

I wouldn't say it's useless. A Google search certainly isn't useless even
though you won't know what Google does behind the scenes with your query and
how it ranks the results.


> A specific use case I had in mind was to be able to return information about a collection
> made up of heterogeneous members. You might then want to do the equivalent of a
> SPARQL aggregating query. I discussed this more fully at the end of my email on querying
> collections [1] which didn't spark any discussion.

Sorry for not responding to that mail. I just re-read it. Here's the example
you posted:

> # InterestCollection resource
> </giants/interests> a :InterestCollection, hydra:Collection;
>   hydra:member </giants/interests/gregg>, </giants/interests/markus> .
> </giants> :interest </giants/interests/gregg>, </giants/interests/markus> .
> 
> # Interest resources
> </giants/interests/gregg> a :UserLikes; :likeKind "fan"; :performer </gregg> .
> 
> </giants/interests/markus> a :UserLikes; :likeKind "foe"; :performer </markus> .
> 
> Now, for the question about how to query such a collection.
> 
> So, how might I go about finding just the fans of the Giants?

With the current design of hydra:filter, you would be able to filter the
collection by :likeKind and that would return a new collection consisting of
just fans:

   </giants/interests> hydra:filter [
     hydra:template "/giants/interests/{likeKind}" ;
     hydra:mapping [
       hydra:variable "likeKind" ;
       hydra:property :likeKind
     ]
   ] .

If you want multiple things, you would need to do multiple requests.
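
As a rough illustration of what a client would do with that description, here is a minimal Python sketch. The `expand` and `filter_members` helpers and the in-memory `members` list are hypothetical; `expand` only handles simple `{var}` expressions, and a real client would use a full RFC 6570 implementation and dereference the resulting IRI instead of filtering locally.

```python
import re
from urllib.parse import quote

# Hypothetical in-memory copy of the two members from the example above.
members = [
    {"@id": "/giants/interests/gregg", "likeKind": "fan", "performer": "/gregg"},
    {"@id": "/giants/interests/markus", "likeKind": "foe", "performer": "/markus"},
]

def expand(template, values):
    """Expand simple {var} expressions only (RFC 6570 level 1)."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: quote(str(values[m.group(1)]), safe=""),
        template,
    )

def filter_members(items, prop, value):
    """What the server conceptually evaluates: ?member ?property "value"."""
    return [m for m in items if m.get(prop) == value]

# One expansion (and hence one request) per value you are interested in.
fans_iri = expand("/giants/interests/{likeKind}", {"likeKind": "fan"})
fans = filter_members(members, "likeKind", "fan")
```

Asking for the foes as well would mean expanding the template a second time with "foe" and issuing a second request.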



> >> You'll need to explain more about URI template variable binding; It seems
> >> that schema:name here is somehow used to find "Markus", perhaps in the
> >> subject of the collection, and that the "name" query element is interpreted
> >> by the service to be schema:name for members of the collection. It might
> >> be somewhat confusing if it has two senses that aren't directly related.
> >
> > OK, I thought the pseudo code above is enough. Anyway, here is how it works:
> >
> >  1) You have a collection with several members.
> >  2) You define an IRI template and associate it with the collection via
> >     hydra:filter.
> >  3) Each variable in that IRI template is bound to a property (path).
> >  4) Expanding the template with concrete values results in a query of the
> >     form
> >          ?member ?property "value supplied by client"
> >  5) Dereferencing the expanded IRI template returns a collection whose
> >     members match the query criteria.
> >
> > Is it clearer now? Otherwise I'll post a concrete example.
> 
> So, in your example, hydra:template "/collection?name={schema:name}", the
> {schema:name} portion of the template is interpreted both by the service and the client?
> When the client makes the template concrete by replacing {schema:name} with Markus, the
> server knows to bind the "name" query parameter with the schema:name property path and
> reverse this logic.

Exactly. 
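
To make the server's half of that round trip concrete, here is a minimal Python sketch. It assumes a mapping that binds the variable "name" to schema:name; the `mapping` dict, the member data, and the `resolve_filter` helper are hypothetical illustrations and not part of Hydra.

```python
from urllib.parse import urlsplit, parse_qs

SCHEMA_NAME = "http://schema.org/name"

# Hypothetical mapping advertised with the IRI template: variable -> property.
mapping = {"name": SCHEMA_NAME}

# Hypothetical collection members, keyed by full property IRI.
members = [
    {"@id": "/people/markus", SCHEMA_NAME: "Markus"},
    {"@id": "/people/gregg", SCHEMA_NAME: "Gregg"},
]

def resolve_filter(request_iri, mapping, members):
    """Turn ?name=Markus back into (?member schema:name "Markus") and evaluate it."""
    query = parse_qs(urlsplit(request_iri).query)
    # Only variables that appear in the mapping become filter criteria.
    criteria = {mapping[var]: vals[0] for var, vals in query.items() if var in mapping}
    return [m for m in members if all(m.get(p) == v for p, v in criteria.items())]

result = resolve_filter("/collection?name=Markus", mapping, members)
```

In this sketch a request without any mapped query parameter simply returns all members, which is one possible reading of an unfiltered collection.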


> It's not explicit in this snippet, but I presume there is a mapping that binds "name" to
> schema:name, in which case wouldn't the template be "/collection{?name}" (per Example 15
> from the spec). The server understands "name" to be bound to "schema:name", because of
> the mapping, and RFC6570 describes how to construct a query presuming that "name" is
> bound to a concrete value; if it were bound to "Markus", this would create
> "/collection?name=Markus". Presumably, unbound variables are just eliminated from the
> URI Template.

Correct. Sorry, I was lazy and used incomplete pseudo-code (I think I marked
it as such). Nevertheless I will try to be more accurate in the future :-)
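
For reference, the expansion Gregg describes can be sketched in a few lines of Python. This is a deliberately minimal helper (the name `expand_query` is made up here) that only understands RFC 6570 form-style query expressions such as `{?name}`; it is not a complete URI Template implementation.

```python
import re
from urllib.parse import quote

def expand_query(template, values):
    """Expand {?a,b} expressions only; undefined (missing or None) variables are dropped."""
    def repl(match):
        pairs = [
            f"{name}={quote(str(values[name]), safe='')}"
            for name in match.group(1).split(",")
            if values.get(name) is not None
        ]
        # If no variable is defined, the whole expression expands to nothing.
        return "?" + "&".join(pairs) if pairs else ""
    return re.sub(r"\{\?([\w,]+)\}", repl, template)

bound = expand_query("/collection{?name}", {"name": "Markus"})
unbound = expand_query("/collection{?name}", {})
```

With "name" bound to "Markus" this yields "/collection?name=Markus"; with no binding the expression disappears and only "/collection" remains.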


> > Thomas has also a good point:
> >
> >> On Sunday, April 20, 2014 1:40 PM, Thomas Hoppe wrote:
> >> I would appreciate the support of filtering as I have mentioned on
> >> other posts but the proposed approach as far as I have understood it
> >> has the major disadvantage that I would need to define a filter for
> >> each property of collection members on which I want to offer
> >> filtering. This can become quite lengthy.
> >
> > Yeah, that's true. You would need to specify them explicitly.
> 
> We should say something about the role of sub-classing with Hydra operations and
> constraints. If I define a constraint on schema:Event that defines a hydra:TemplatedLink,

Currently, you can't associate templated links to a type and then have it
applied on all instances of that type. The templated link needs to be
associated with the instance. The reasoning behind that decision is that
each templated link will likely look different (at least the template). If
you just append query parameters, that assumption doesn't hold though.
Should we raise an issue for this?


> can we infer that this is also a constraint on something like schema:SportsEvent? Looking at
> it the other way, a schema:SportsEvent is also a schema:Event through RDFS inference, so
> operations and constraints defined on such an instance would presumably also be
> appropriate for such a resource.

Operations are. I'm not so sure about constraints (supportedProperty) yet...


> >> I opt for a more generic approach which allows the client to pick
> >> arbitrary properties and filter for them -- something like this:
> >>
> >> hydra:filter: {
> >>  @type: "IriTemplate",
> >>  template: "?f={property}:{value}",
> >>  mappings: [
> >>    {
> >>      @type: "IriTemplateMapping",
> >>      variable: "property",
> >>      property: "rdf:Property",
> >>      required: true
> >>    },
> >>    {
> >>      @type: "IriTemplateMapping",
> >>      variable: "value",
> >>      required: true
> >>    }
> >>  ]
> >> }
> >
> > Using rdf:Property this way is ambiguous as you wouldn't know whether the
> > server just supports filtering for rdf:Property or all properties.
> >
> >
> >> This would also allow for templates like this:
> >>
> >>  template: "?{property}={value}"
> >
> > The other problem with this approach is that {property} would have to be
> > expanded to a full URL as otherwise. So you would end up with very long and
> > ugly URLs.
> 
> So?

Well, yeah. It's not really a "problem" but also certainly not something
that most people would like.
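
To give a feeling for what such a URL looks like, here is a small Python sketch that expands Thomas' "?{property}={value}" template by hand. The `expand_property_filter` helper is hypothetical, and http://schema.org/name is just used as an example of a full property IRI.

```python
from urllib.parse import quote

def expand_property_filter(prop_iri, value):
    """Expand "?{property}={value}" with a full property IRI, percent-encoding both parts."""
    return "?" + quote(prop_iri, safe="") + "=" + quote(value, safe="")

long_query = expand_property_filter("http://schema.org/name", "Markus")
```

Here `long_query` comes out as "?http%3A%2F%2Fschema.org%2Fname=Markus", compared to "?name=Markus" with an explicit mapping.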


> >> Which would allow to describe the diversity of current filtering
> >> notations found in APIs.
> >
> > I don't know of many APIs that allow completely arbitrary filtering. Most of
> > them are quite restricted... which makes sense because filtering might be a
> > quite costly operation especially if there are lots of properties. If you
> > really want to allow completely arbitrary filtering, it might actually make
> > more sense to just send a SPARQL query or something similar. I'm not sure.
> > Thoughts?
> 
> I think basic filtering using property paths is a pretty important use case. We might constrain
> the length of these paths, as not every implementation will be done using a SPARQL back
> end. But, for my part, I'm fine with limiting filters to property paths defined as specific
> mappings within a TemplatedLink.

If we don't allow Thomas' extension, we don't have to limit the length of
these paths as the server explicitly advertises what it can handle. 


--
Markus Lanthaler
@markuslanthaler

Received on Thursday, 24 April 2014 14:31:45 UTC