RE: Moving forward with hydra:filter (ISSUE-45) from Markus Lanthaler on 2015-11-26 (public-hydra@w3.org from November 2015)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Thu, 26 Nov 2015 14:19:52 +0100
To: "'Hydra'" <public-hydra@w3.org>
Message-ID: <13d601d1284d$22c3dfb0$684b9f10$@gmx.net>
On 19 Nov 2015 at 09:29, Ruben Verborgh wrote:
>> Hi all,
>> 
>> Coming back to the issue of hydra:filter,
>> let's see if we can find something to all agree on.

Sorry for the delay in responding.


>>>>>> This would be my suggested semantics for hydra:filter:
>>>>>> it's the specific case in which we combine conditions with AND.
>>>>>> Other predicates can be implemented differently;
>>>>>> they all can derive from hydra:search.
>>> 
>>> I don't think having different predicates for different logical operators
>>> scales, especially not if you consider that the number of parameters varies.
>>> So we should come up with something else.
>> 
>> So, it would then always be hydra:filter for any boolean conditions,
>> but could the default meaning then be the commonly used AND?

Exactly.
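As an illustration of the AND reading agreed on above, here is a minimal sketch, assuming collection items are plain property/value dictionaries (the item data and function names are hypothetical, not part of Hydra). An item is returned only if every supplied condition holds; an empty set of conditions therefore matches every item.

```python
from typing import Any, Mapping

# Hypothetical representation: each collection item is a property -> value mapping.
Item = Mapping[str, Any]


def matches_all(item: Item, conditions: Mapping[str, Any]) -> bool:
    """AND semantics: the item matches only if every condition holds."""
    return all(item.get(prop) == value for prop, value in conditions.items())


def apply_and_filter(items: list[Item], conditions: Mapping[str, Any]) -> list[Item]:
    """Return the items of the collection that satisfy all conditions."""
    return [item for item in items if matches_all(item, conditions)]


people = [
    {"schema:givenName": "Paris", "schema:familyName": "Hilton"},
    {"schema:givenName": "Conrad", "schema:familyName": "Hilton"},
    {"schema:givenName": "Paris", "schema:familyName": "Jackson"},
]

result = apply_and_filter(
    people, {"schema:givenName": "Paris", "schema:familyName": "Hilton"}
)
```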


>>>>> How will we be able to express [an empty value]?
>>>> 
>>>> This would require ExplicitRepresentation and "".
>>> 
>>> Why should an ExplicitRepresentation be interpreted differently? Or are you
>>> saying that *no value* (instead of an empty string) of a property with
>>> ExplicitRepresentation represents a wildcard (match-all) then?
>> 
>> The problem is that, with BasicRepresentation, we cannot explicitly say "empty string".
>> We can leave a value empty, but does that mean "no value given" or "empty string"?

OK, great. We are on the same page.


>> Given an IRI template like /{?firstName,lastName},
>> we can serialize this as /?lastName=Hilton
>> if we don't want to provide a first name with BasicRepresentation.
>> However, this is not possible with /?first={firstName}&last={lastName}.
>> Furthermore, it is not compatible with HTML forms,
>> which always serialize all contents, even if fields are empty.
>> Also, in HTML search forms, the convention is that "empty = not specified",
>> not that "empty = the value should have 0 characters".
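The difference between the two template styles can be sketched with a tiny subset of RFC 6570 expansion: form-style query expansion (`{?a,b}`) drops undefined variables entirely, while simple expansion (`{a}`) turns an undefined variable into an empty string, so the parameter name still appears. This is a minimal sketch that assumes no modifiers (no explode or prefix) and only the `?` operator plus plain `{var}` expressions; the function name `expand` is hypothetical.

```python
import re
from urllib.parse import quote


def _enc(value: str) -> str:
    # Only unreserved characters are left as-is; everything else is percent-encoded.
    return quote(value, safe="")


def expand(template: str, variables: dict[str, str]) -> str:
    """Expand {?a,b} (form-style query) and {a} (simple) expressions only."""

    def replace(match) -> str:
        expr = match.group(1)
        if expr.startswith("?"):
            # Form-style query expansion: undefined variables are skipped entirely.
            names = expr[1:].split(",")
            pairs = [f"{n}={_enc(variables[n])}" for n in names if n in variables]
            return "?" + "&".join(pairs) if pairs else ""
        # Simple string expansion: an undefined variable contributes nothing,
        # so "first={firstName}" still leaves "first=" in the result.
        return ",".join(_enc(variables[n]) for n in expr.split(",") if n in variables)

    return re.sub(r"\{([^}]*)\}", replace, template)


print(expand("/{?firstName,lastName}", {"lastName": "Hilton"}))
print(expand("/?first={firstName}&last={lastName}", {"lastName": "Hilton"}))
```

The first call yields `/?lastName=Hilton`, while the second yields `/?first=&last=Hilton`, which is exactly the ambiguity described above: the server receives an empty `first` parameter it has to interpret.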
>> 
>>>   ?property=&otherProperty=1  vs.  ?property=""&otherProperty=1
>> 
>> Given ExplicitRepresentation, the first would mean
>> that there are no constraints set on the field property,
>> whereas the second means that only collection items
>> with a zero-length property field should match.
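The rule Ruben describes could be decoded on the server side roughly as follows. This is a simplified sketch: it assumes that, under ExplicitRepresentation, literals are wrapped in double quotes and anything else is an IRI, and it deliberately ignores language tags and datatypes (e.g. `"Hilton"@en`). The names `decode_explicit` and `constraints` are hypothetical.

```python
from urllib.parse import parse_qsl


def decode_explicit(raw: str):
    """Decode one query value under the convention discussed above.

    Returns None for an empty parameter (no constraint), ("literal", text)
    for a double-quoted value, and ("iri", text) otherwise.
    Language tags and datatypes are not handled in this sketch.
    """
    if raw == "":
        return None  # empty = not specified, as with HTML forms
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return ("literal", raw[1:-1])  # '""' decodes to a zero-length literal
    return ("iri", raw)


def constraints(query: str) -> dict:
    # keep_blank_values=True so that "property=" is not silently dropped
    return {
        name: decode_explicit(value)
        for name, value in parse_qsl(query, keep_blank_values=True)
    }
```

With this, `property=` yields no constraint on `property`, whereas `property=""` asks for items whose `property` is the empty string.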

Completely agree. We need to make sure this is made clear in the spec.


>> This is the behavior also implemented in the TPF server,
>> an important reason being the compatibility with HTML forms.
>> 
>> For example, this fragment:
>> 
>> http://fragments.dbpedia.org/2015/en?subject=http%3A%2F%2Fdbpedia.org%2Fresource%2FParis_Hilton&predicate=&object=
>> gives items where:
>> - the subject is Paris Hilton
>> - the predicate is anything
>> - the object is anything
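The fragment example above can be read as a triple pattern in which empty parameters act as wildcards. The sketch below illustrates that reading only; it is not the TPF server's code, and the sample triples and function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def pattern_from_url(url: str):
    """Turn subject/predicate/object parameters into a pattern; empty = wildcard (None)."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)

    def term(name):
        value = query.get(name, [""])[0]
        return value or None  # empty or missing parameter matches anything

    return term("subject"), term("predicate"), term("object")


def matches(triple, pattern) -> bool:
    return all(p is None or p == t for t, p in zip(triple, pattern))


def fragment(url: str, triples):
    pattern = pattern_from_url(url)
    return [t for t in triples if matches(t, pattern)]


# Hypothetical data, for illustration only.
triples = [
    ("http://dbpedia.org/resource/Paris_Hilton", "http://xmlns.com/foaf/0.1/name", '"Paris Hilton"'),
    ("http://dbpedia.org/resource/Paris", "http://xmlns.com/foaf/0.1/name", '"Paris"'),
]
```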
>> 
>> Since an IRI needs to identify the same resource,
>> regardless of whether the representation is HTML, JSON-LD, or Turtle,
>> the JSON-LD and Turtle representations necessarily
>> have the exact same interpretation of parameters.

+1
 

>>> An option to make this more flexible would be to explicitly describe that.
>>> Maybe something along these lines:
>>> 
>>>   </collection> :filter [
>>>     rdf:type :IriTemplate, :Filter ;
>>>     :filterSpecification [
>>>       rdf:type :AndFilter ;
>>>       :input [ :variable "first" ] ;
>>>       :input [ :variable "last" ]
>>>     ] ;
>>>     :template "/collection{?first,last}" ;
>>>     :mapping [ :variable "first"; :property schema:givenName ] ;
>>>     :mapping [ :variable "last"; :property schema:familyName ]
>>>   ] .
>> 
>> Looks good to me.
>> We might not strictly need filterSpecification as a separate entity,
>> i.e., we could also attach these properties to :Filter directly.
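To show how a client might use a description like the Turtle snippet above, here is a sketch that assumes the description has already been parsed into a plain dictionary (the dictionary layout and function names are hypothetical). The client translates the property values it wants into template variables via `:mapping` and then expands the `{?first,last}` query part.

```python
from urllib.parse import quote

# Hypothetical in-memory form of the Turtle description above.
description = {
    "template": "/collection{?first,last}",
    "mappings": [
        {"variable": "first", "property": "schema:givenName"},
        {"variable": "last", "property": "schema:familyName"},
    ],
    "filterSpecification": {"type": "AndFilter", "inputs": ["first", "last"]},
}


def variables_for(desc, wanted: dict) -> dict:
    """Translate property -> value conditions into variable -> value bindings."""
    by_property = {m["property"]: m["variable"] for m in desc["mappings"]}
    missing = set(wanted) - set(by_property)
    if missing:
        raise ValueError(f"filter cannot express: {sorted(missing)}")
    return {by_property[p]: v for p, v in wanted.items()}


def expand_query(template: str, values: dict) -> str:
    """Expand a trailing {?a,b} expression, skipping unbound variables."""
    base, _, names = template.partition("{?")
    names = names.rstrip("}").split(",")
    pairs = [f"{n}={quote(values[n], safe='')}" for n in names if n in values]
    return base + ("?" + "&".join(pairs) if pairs else "")
```

Note that `variables_for` assumes each property is mapped by exactly one variable; that assumption breaks as soon as the same property appears twice, which is the case Maik raises further down.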

Sounds good to me, even though calling an IriTemplate a Filter might be a bit of a
stretch. Maik seems to raise a valid concern, though (see his reply below).
Maybe we should try to combine this with our new views concept (the filter
property would point to an IRI template that tells a client how to get a
FilteredView)!?


>> However, in the case of nested filters,
>> it might be nice to only have 1 "top" :Filter that is also an :IriTemplate,
>> whereas :FilterSpecifications would then never be :IriTemplates.
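One way to picture the nesting Ruben suggests: only the top-level object carries a template, and the nested specifications reference template variables without being templates themselves. The sketch below assumes a dictionary layout and a hypothetical `OrFilter` type (only `AndFilter` appears in the thread); it checks that every variable used anywhere in the nested specification is actually provided by the single top-level template.

```python
import re

# Hypothetical nested description; "OrFilter" is an assumed, not agreed-on, type.
top = {
    "types": ["IriTemplate", "Filter"],
    "template": "/collection{?first,last,city}",
    "filterSpecification": {
        "type": "AndFilter",
        "inputs": [
            {"variable": "first"},
            {"type": "OrFilter", "inputs": [{"variable": "last"}, {"variable": "city"}]},
        ],
    },
}


def input_variables(spec) -> set:
    """Collect every template variable referenced anywhere in a nested spec."""
    if "variable" in spec:
        return {spec["variable"]}
    found = set()
    for child in spec.get("inputs", []):
        found |= input_variables(child)
    return found


def template_variables(template: str) -> set:
    """Variable names in {...} expressions (only ? and & operators handled)."""
    names = set()
    for expr in re.findall(r"\{([^}]*)\}", template):
        names.update(expr.lstrip("?&").split(","))
    return names


def check(desc) -> set:
    """Return variables used by the nested spec but missing from the template."""
    return input_variables(desc["filterSpecification"]) - template_variables(desc["template"])
```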

I'm not sure I follow. Was :FilterSpecifications supposed to be
:filterSpecification (the property)?


>> Could it be possible to have the AND interpretation as default,
>> since this would seem a common case?

I think this is a case where relying on defaults is dangerous. Clients that
implement this now and won't be updated when we introduce other operators
will continue to think it is an AND filter. I would rather say we just
define AND for the time being but make it explicit. Thoughts?
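Markus's concern about defaults can be made concrete: a client that only knows `AndFilter` should refuse anything it does not recognize instead of silently treating it as AND. This is a minimal sketch; the specification layout (`type`, `conditions`) and the function names are hypothetical.

```python
from typing import Any, Callable, Mapping

Item = Mapping[str, Any]

# Operators this hypothetical client understands. AndFilter is the only one
# that would be defined "for the time being".
KNOWN_OPERATORS: dict[str, Callable[[list[bool]], bool]] = {
    "AndFilter": all,
}


def evaluate(spec: Mapping[str, Any], item: Item) -> bool:
    op = spec.get("type")
    if op not in KNOWN_OPERATORS:
        # No silent fallback to AND: an older client must not misread a newer
        # operator (or a missing type) as a conjunction.
        raise NotImplementedError(f"unsupported filter type: {op!r}")
    results = [item.get(c["property"]) == c["value"] for c in spec["conditions"]]
    return KNOWN_OPERATORS[op](results)
```

If another operator is introduced later, existing clients fail loudly on it rather than returning wrong results.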

Going back to the previous point and responding to Maik's reply:

On 19 Nov 2015 at 10:27, Maik Riechert wrote:
> I think a separate entity could actually be useful when you think
> further. In my case I'm creating an API which simultaneously offers
> filtering a collection and transforming/mapping the collection items. So
> I would not use :filter myself, but something more generic like:
> 
>    </collection> :api [
>      rdf:type :IriTemplate, :Filter, :Transformer ;
>      :filterSpecification [
>        rdf:type :AndFilter ;
>        :input [ :variable "creationDateStart" ] ;
>        :input [ :variable "creationDateEnd" ]
>      ] ;
>      :transformSpecification [
>        rdf:type :SpatioTemporalSubsetTransform ;
>        :input [ :variable "subsetBbox" ]
>      ] ;
>
>      :template "/collection{?creationDateStart,creationDateEnd,subsetBbox}" ;
>      :mapping [ :variable "creationDateStart"; :property opensearchtime:start ] ;
>      :mapping [ :variable "creationDateEnd"; :property opensearchtime:end ] ;
>      :mapping [ :variable "subsetBbox"; :property geo:bbox ]
>    ] .
> 
> Imagine in the above fictional example that the collection items are
> GeoJSON layers (=a GeoJSON feature collection resource) and what you
> want are only those layers created within the given time range, and then
> you want to get a transformed version of each layer which is a cut out
> to the given bounding box (it would not be a simple filtering of geojson
> geometries, it would actually transform and split geometries at the
> borders).
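A rough sketch of the filter-then-transform pipeline Maik describes, under simplifying assumptions: each layer is a hypothetical dictionary with a creation date and an axis-aligned bounding box, and "cutting out" is approximated by intersecting bounding boxes rather than splitting real GeoJSON geometries. It also makes the ordering question explicit: here the filter runs first and the transform is applied only to the selected layers.

```python
from datetime import date
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def clip(box: BBox, window: BBox) -> Optional[BBox]:
    """Intersect two axis-aligned boxes; None if they do not overlap."""
    min_x, min_y = max(box[0], window[0]), max(box[1], window[1])
    max_x, max_y = min(box[2], window[2]), min(box[3], window[3])
    if min_x >= max_x or min_y >= max_y:
        return None
    return (min_x, min_y, max_x, max_y)


def query(layers, start: date, end: date, subset: BBox):
    # 1. Filter: keep layers created within [start, end] (an AND of two conditions).
    selected = [layer for layer in layers if start <= layer["created"] <= end]
    # 2. Transform: crop each selected layer to the subset window.
    return [{**layer, "bbox": clip(layer["bbox"], subset)} for layer in selected]
```

Because the filter here only looks at the creation date, swapping the two steps would not change the result; it would matter as soon as the filter also inspected the bounding box.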

So, if I understand this correctly, it would be some sort of cropping
function. It would just change the representation that is returned but not
the actual state of the resource, right? Does the transformation occur
before or after the filtering? I think we risk opening a can of worms
here. Couldn't such transformations easily be performed client-side? That
being said...


> Collapsing the two *Specification's into the root would become
> ambiguous, but maybe there's another way to model such thing as well.
> Any idea? Or is the above good as it is?

... it is good that you raised this point, as it illustrates that filtering
might not be the only thing an IRI template can do.


> By the way, I'm still not sure how to model the properties correctly. In
> the example above, you could easily add a filter by bounding box as well
> ("give me all GeoJSON layers that contain stuff in the given bounding
> box"), making it:
> 
>      :mapping [ :variable "bbox"; :property geo:bbox ] ;
>      :mapping [ :variable "subsetBbox"; :property geo:bbox ] .
> Now the client would see two variables with the geo:bbox property and
> would at first be confused. Is that what the above :*Specifications are
> for? To give further information about what the variables mean?

At least for the filtering part it would be clear, no? If there's a property
p in the filter and its value is set to v, then all resources r in the
collection for which the following holds are returned (in more or less
SPARQL):

SELECT ?r 
WHERE {
   ?r p v .
}
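Maik's two `geo:bbox` variables can be told apart by checking which specification lists them as inputs; the filter variables then translate into `?r p v` patterns as in the query above. This sketch assumes a hypothetical dictionary form of his description with the extra `bbox` filter variable added, and it reads every filter condition as plain equality, as the SPARQL above does (real start/end dates would need range semantics).

```python
# Hypothetical in-memory form of Maik's description, extended with "bbox".
api = {
    "mappings": [
        {"variable": "creationDateStart", "property": "opensearchtime:start"},
        {"variable": "creationDateEnd", "property": "opensearchtime:end"},
        {"variable": "bbox", "property": "geo:bbox"},
        {"variable": "subsetBbox", "property": "geo:bbox"},
    ],
    "filterSpecification": {
        "type": "AndFilter",
        "inputs": ["creationDateStart", "creationDateEnd", "bbox"],
    },
    "transformSpecification": {
        "type": "SpatioTemporalSubsetTransform",
        "inputs": ["subsetBbox"],
    },
}


def roles(desc) -> dict:
    """Map each variable to (role, property) based on which spec lists it."""
    filter_inputs = desc["filterSpecification"]["inputs"]
    transform_inputs = desc.get("transformSpecification", {}).get("inputs", [])
    result = {}
    for m in desc["mappings"]:
        if m["variable"] in filter_inputs:
            role = "filter"
        elif m["variable"] in transform_inputs:
            role = "transform"
        else:
            role = "unknown"
        result[m["variable"]] = (role, m["property"])
    return result


def filter_patterns(desc, values: dict) -> list[str]:
    """Turn bound filter variables into AND-ed '?r p v' triple patterns."""
    return [
        f"?r {prop} {values[var]} ."
        for var, (role, prop) in roles(desc).items()
        if role == "filter" and var in values
    ]
```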


>>> While this is quite
>>> straightforward I think, it puts quite a burden on the client. Assume that
>>> the client wants to filter a collection in a specific way. It would need to
>>> be able to check whether his query can be accepted directly by the server or
>>> whether it needs to rewrite the query into simpler subqueries or generalize
>>> the query and complement it with some additional client side querying.
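For the simple AND case, the rewriting Markus mentions can be sketched as a split: send the conditions the server's filter supports, then apply the remaining ones client-side. This is sound only because the conditions are conjunctive (fewer conjuncts return a superset of the answer); with other operators the rewrite would differ. The function names and the set of supported properties are hypothetical.

```python
def plan(conditions: dict, server_properties: set):
    """Split an AND query into a server-side part and a client-side residual."""
    server = {p: v for p, v in conditions.items() if p in server_properties}
    client = {p: v for p, v in conditions.items() if p not in server_properties}
    return server, client


def apply_residual(items_from_server, client: dict):
    """Apply the conditions the server could not evaluate."""
    return [i for i in items_from_server if all(i.get(p) == v for p, v in client.items())]
```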
>> 
>> That's true. However:
>> - We don't need to specify very specific filters in Hydra Core;
>>   just knowing that such extensions are possible is good for :filter.
>> - We can specify multiple filter mechanisms: a precise but complex
>>   one like the above, and a simpler one. These might or might not be
>>   defined in different specifications; and even if we specify only one,
>>   others can be added later.

That's true, but I'm sure this will come up sooner or later and we need to be
able to give an answer then. It doesn't need to block the current decision,
though. We just need to make sure that the design is extensible enough.


>> In other words: the important thing is the extensibility of :filter,
>> which I think we have.
>> So if this discussion is about :filter itself, I think it is a good candidate.

+1. I think we are getting close to a solution and consensus.

To summarize, the things that still need to be discussed/decided are:
  - do we need to make the filter function explicit or can we safely default to AND
  - should we focus on describing Filters or FilteredViews (or similar)
  - if it's the former, can an IriTemplate be a Filter itself at the same time
  - if it's the latter, how do we connect the filter, the FilteredView, and the IriTemplate
  - assuming that the same property needs to be used multiple times in an IriTemplate, how do we model that
  - assuming we need "transformations", or simply the equivalent of the SELECT part of a SQL/SPARQL query, how do we model that

We don't need answers to all of these questions to move on, but I'd like to
get at least a basic understanding of them and of the community group's
sentiment about them.


--
Markus Lanthaler
@markuslanthaler
Received on Thursday, 26 November 2015 13:20:31 UTC