Filtering collections etc. - taking a step back (ISSUE-45)

Let me try to revive this thread by taking a step back. I think in the
recent discussion we moved too far in the direction of describing everything
in a very detailed manner. I would like to try the opposite and see
how far we can get with that (and sprinkle in a few semantic hints for
automated clients).

What we are trying to solve here is solved with forms on the traditional
web. Users are presented with a form with a couple of fields and some
(hopefully) descriptive text that tells them what to expect when they
click the submit button. We can achieve that with operations today, as
illustrated by the Hydra Console. I'm not claiming the current operation
design is perfect (in fact I think it has a couple of serious shortcomings
at the moment), just that it quite closely resembles forms on the web.

As part of ISSUE-45 we want to make it possible to filter collections and
write generic clients capable of automatically understanding how to do so.
At the moment we have
  - a collection and
  - an IRI template that, when expanded, can be used to retrieve the desired
    filtered collection

What we need is
  - a description of why a client would want to use the IRI template and
  - a description of how a client can use the IRI template

(We likely also want a way to give a client metadata about the collection
and the filtered view, but let's ignore that for the time being.)

I think it would be possible (and practical) to generalize these building
blocks to address this and many other use cases. I have been thinking of
something along these lines:

1   Collection X
2      can be
3         filtered
4         by expanding the IRI template Y
5         and dereferencing the resulting IRI
6         the available/necessary parameters are ... 
7         the expected response will be a CollectionView

or

1   Blog post X
2      can be
3         published
4         by constructing a payload filling template Y
5         and POSTing it to Z
6         the available/necessary parameters are ...
7         the expected response will be a 201 Created with the URL

So, (3) would be the expected effect of carrying out the operation. We
should have both a human-readable description and an identifier to be
used by machines (we could establish a registry or start by leveraging
Schema.org's Action hierarchy).
(4-5) is a templated HTTP request (headers and body) and (6) describes the
parameters of the template. Finally, (7) is a hint of what the client should
expect when the operation is successful (this would allow a client to know
whether it is worth executing the operation).
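
To make that a bit more tangible, here is a rough sketch of how a client
might hold such a description in memory and turn it into a request IRI.
Everything in it is made up for illustration: the class and field names,
the example.com IRI, and the parameter names are not Hydra vocabulary, and
the expansion only handles a flat list of query parameters rather than full
IRI templates.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class OperationDescription:
    """Hypothetical container mirroring (3) to (7) of the outline above."""
    effect: str        # (3) machine-readable identifier of the expected effect
    description: str   # (3) human-readable text
    method: str        # (4-5) HTTP method of the templated request
    base_iri: str      # (4-5) IRI the query parameters are appended to
    parameters: list[str] = field(default_factory=list)  # (6) template variables
    expects: str = ""  # (7) hint about the successful response


def expand(op: OperationDescription, values: dict[str, str]) -> str:
    """Build the request IRI from the supplied values.

    Assumption: parameters are plain query parameters. A real client would
    use a proper IRI template processor instead.
    """
    pairs = [
        f"{quote(name, safe='')}={quote(values[name], safe='')}"
        for name in op.parameters
        if name in values
    ]
    return op.base_iri + ("?" + "&".join(pairs) if pairs else "")


# Illustrative description of a "filter" operation on a product collection.
filter_op = OperationDescription(
    effect="filter",
    description="Filter the collection of products",
    method="GET",
    base_iri="http://example.com/products",
    parameters=["category", "maxPrice"],
    expects="CollectionView",
)
```

A client could inspect `filter_op.expects` before deciding whether calling
`expand(filter_op, {"maxPrice": "20"})` and dereferencing the result is
worth it.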

The trickiest part is definitely (6) -- at least if we want to make it
machine-understandable. Think of something like "max price". For a human
that is immediately clear. A machine would need to know that only collection
members whose price does not exceed max price would be returned. The same
applies to the combination of parameters.
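
As a sketch of what that extra knowledge could look like: a parameter
would need to be tied to a member property and to a comparison. The names
below (`ParameterMapping`, `maxPrice`, the comparator identifiers) are
invented for this example, and treating "max price" as an inclusive bound
is an assumption that a real description would have to state explicitly.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical comparator identifiers a description could refer to.
COMPARATORS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "lt": operator.lt,
    "lte": operator.le,
    "gte": operator.ge,
}


@dataclass
class ParameterMapping:
    variable: str    # template variable, e.g. "maxPrice"
    property: str    # member property the value is compared against
    comparator: str  # key into COMPARATORS


def matches(member: dict[str, Any], mapping: ParameterMapping, value: Any) -> bool:
    """Return True if the member satisfies the constraint for this parameter."""
    compare = COMPARATORS[mapping.comparator]
    return compare(member[mapping.property], value)


# Assumption: "max price" means price <= value (inclusive).
max_price = ParameterMapping(variable="maxPrice", property="price", comparator="lte")

members = [{"name": "pen", "price": 2}, {"name": "bag", "price": 40}]
cheap = [m for m in members if matches(m, max_price, 20)]
```

Without something like the `property` and `comparator` fields, a generic
client only sees an opaque variable name and cannot predict which members
come back.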

I tinkered with various approaches but haven't found anything particularly
good yet. Obviously, we could just declare this as being out of scope for
Hydra and just give people hooks to describe that in more detail (for
instance by using query languages such as SQL or SPARQL for the example
above). Or we could define a few more specialized operation types (3) that
hardcode the behavior for common use cases (example: a
specialization of "filter" that is defined as "filter the collection by
equality-checking the specified parameters to the corresponding properties
of each collection member; multiple parameters are combined with AND"). I
doubt that should live in the core vocabulary but we can certainly set up a
registry for these so that people can easily reuse such definitions.
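
The behavior of that example specialization is simple enough to write
down. The following is only a sketch of the semantics from a client's or
server's point of view; the function name is made up, and it assumes that
parameter names are identical to the member property names, which a real
definition would have to map explicitly.

```python
from typing import Any


def filter_by_equality(
    members: list[dict[str, Any]], parameters: dict[str, Any]
) -> list[dict[str, Any]]:
    """Keep members whose properties equal every supplied parameter.

    Multiple parameters are combined with AND. A member that lacks one of
    the properties does not match.
    """
    return [
        member
        for member in members
        if all(
            name in member and member[name] == value
            for name, value in parameters.items()
        )
    ]


posts = [
    {"author": "alice", "status": "draft"},
    {"author": "alice", "status": "published"},
    {"author": "bob", "status": "published"},
]
result = filter_by_equality(posts, {"author": "alice", "status": "published"})
```

With no parameters supplied, the function returns the whole collection,
which seems like the least surprising reading of "filter".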

Since this email is already long enough, I'll shut up here :-) I would be
very interested in hearing everyone's thoughts and ideas.


Cheers,
Markus


--
Markus Lanthaler
@markuslanthaler

Received on Monday, 11 April 2016 20:14:56 UTC