Re: Pagination (ISSUE-42) from Andrew Hacking on 2015-02-15 (public-hydra@w3.org from February 2015)

From: Andrew Hacking <ahacking@gmail.com>
Date: Sun, 15 Feb 2015 22:37:02 +1000
To: Dietrich Schulten <ds@escalon.de>
Cc: public-hydra@w3.org
Message-ID: <CAMAVcL8XaZMNKR+gg1OXQZX7UtNPdNnYHVcU9xs60YcBDXmnTg@mail.gmail.com>
On Sat, Feb 14, 2015 at 5:55 PM, Dietrich Schulten <ds@escalon.de> wrote:

> Hi Andrew,
>
> On 13.02.2015 at 02:51, Andrew Hacking wrote:
> >
> > What I mean by the above is that collections should not have any related
> > resource content in them as their primary content.
>
> See proposal below - would that cut it?
>

Probably not, see below.

>
> > Orthogonal to the above, related resources can be included in a
> > collection payload via a yet to be defined "side loading" mechanism.
> > Such a mechanism could be used by any resource endpoint to return
> > related resources due to efficiency/latency and 1+N query explosion
> > considerations. For example, get a collection of people and then get their
> > related address and organisation resources.
> >
>
> To illustrate some problems with the current design which uses
> hydra:Collection as a container: Currently a person with addresses and
> organization resources would look like this, which I fear appears plain
> silly for a JSON user.
>
> {
>   "@context": {
>     "knownBy": {
>       "@reverse": "foaf:knows",
>       "@type": "@id"
>     },
>     "houses": {
>       "@reverse": "ex:address",
>       "@type": "@id"
>     },
>     "hasAsMember": {
>       "@reverse": "ex:organization",
>       "@type": "@id"
>     }
>   },
>   "@id": "/alice",
>   "foaf:name": "Alice",
>   "collection": [
>     {
>     "@id": "/alice/friends",
>     "@type": "Collection",
>       "manages": {
>         "property": "foaf:knows",
>         "subject": "/alice"
>       },
>       "hydra:member": [{
>         "@id": "/bob",
>         "knownBy": "/alice"
>       },
>       {
>         "@id": "/zelda",
>         "knownBy": "/alice"
>       }
>       ]
>     },
>     {
>     "@id": "/alice/addresses",
>     "@type": "Collection",
>       "manages": {
>         "property": "ex:address",
>         "subject": "/alice"
>       },
>       "hydra:member": [{
>         "@id": "/home-address",
>         "houses": "/alice"
>       }]
>     },
>
>     {
>     "@id": "/alice/organizations",
>     "@type": "Collection",
>       "manages": {
>         "property": "ex:organization",
>         "subject": "/alice"
>        },
>       "hydra:member": [{
>         "@id": "/alice/organizations/amnesty",
>         "hasAsMember": "/alice"
>       }]
>     }
>   ]
> }
>
> The attributes knows, address and organization are all buried inside
> collection objects, and all members must point back to /alice, otherwise
> the model does not assert that alice has values for knows, address and
> organization at all.
>

The above does not work very well when you have multiple people in a
nested collection which embed, say, common organi[sz]ations.  Other JSON-based
APIs approach this problem by having a separate part of the payload
reserved for additional resources, aka "side loading" (GET) and "side
saving" (PUT/POST).  This avoids resources being duplicated in the payload.

That's why I believe first-class support for auxiliary resources is
necessary for minimizing 1+N requests for related resources. The same
mechanism can also be leveraged to include resources within a "Page" or
"Range" resource type.  I see "Page" or "Range" as just a subclass of
"Collection".
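
A minimal sketch of what such side loading might look like, in Python for
brevity. The "included" key and the side_load helper are hypothetical names
used only for illustration, not Hydra vocabulary, and the sample resources are
made up. The point is only that a shared organisation appears once in the
payload while each member links to it by @id.

```python
def side_load(people):
    """Build a payload whose members link to organisations by @id only.

    Each distinct organisation is emitted once under the hypothetical
    "included" key, however many members refer to it.
    """
    members = []
    included = {}  # organisation @id -> full organisation resource
    for person in people:
        org = person["organisation"]
        included.setdefault(org["@id"], org)
        # Replace the embedded organisation with a plain link.
        members.append({**person, "organisation": {"@id": org["@id"]}})
    return {
        "@id": "/people",
        "member": members,
        "included": list(included.values()),
    }


# Made-up sample data: two people sharing one organisation.
amnesty = {"@id": "/organisations/amnesty", "name": "Amnesty"}
payload = side_load([
    {"@id": "/alice", "name": "Alice", "organisation": amnesty},
    {"@id": "/bob", "name": "Bob", "organisation": amnesty},
])
```

Both members end up pointing at /organisations/amnesty, and the full
organisation resource is carried a single time in "included".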


> >
> > Without a viable approach to side loading, developers will ignore such
> > things as hydra or work on other less considered but perhaps more
> > practical approaches just to get the job done. That's basically where I
> > am at.
>
> +1 to get a design which is attractive for the non-RDF ReST community,
> while at the same time making sense to an RDF reasoning tool.
>
> >
> > TL;DR
> >
> > * support side loading of related resources in any resource as a first
> > class construct;
> > * "collections" and "pages" are just resources that link to member
> > resources;
> > * all resources including "collection", "page"/"range" may use the side
> > loading primitive to include related member resources for efficiency AND
> > to address 1+N query explosion;
> > * if paging is to become a first class concern in hydra then it must
> > address random access to those pages (i.e. using templates).
>
> What do you have in mind when you say side-loading? Something which
> allows the server (or the client?) to choose whether it wants to embed
> the collection items or require another request to get them if the
> response would be too large?
>

Well, based on other JSON APIs, the client typically specifies the set of
attributes and/or related resources to include/exclude.

I see embedding as distinct from including.  I see embedding more as 'this
is a property of the parent', vs being a separately addressable resource
with its own endpoint.
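
To make the distinction concrete, here is a rough Python sketch under my own
assumptions: the render helper, the include argument and the "included" key
are all hypothetical, and the sample values are made up. The name is embedded
as a property of the parent, while the address is a separately addressable
resource that is only side loaded when the client asks for it.

```python
def render(resource, related, include=()):
    """Return a payload for `resource`, side loading only requested relations.

    `related` maps a relation name to the full resources it links to;
    `include` lists the relation names the client asked for.
    """
    payload = dict(resource)
    wanted = [r for name in include for r in related.get(name, [])]
    if wanted:
        payload["included"] = wanted  # hypothetical key, not Hydra vocabulary
    return payload


alice = {
    "@id": "/alice",
    "name": "Alice",                      # embedded: a property of the parent
    "address": {"@id": "/home-address"},  # linked: separately addressable
}
related = {"address": [{"@id": "/home-address", "city": "Brisbane"}]}

bare = render(alice, related)                        # links only
full = render(alice, related, include=["address"])  # address side loaded
```

Either way the address keeps its own @id and endpoint; the include list only
decides whether its representation travels in the same response.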

> Borrowing from the discussion in thread [1], I think it could be achieved
> by not using hydra:Collection as a container, but as a descriptor.
>
> // server embeds a collection
> {
>   "@id": "/alice",
>   "foaf:name": "Alice",
>   "foaf:knows": [
>     {"@id":"/bob", "foaf:name": "Robert Rumbaugh"},
>     {"@id":"/zelda", "foaf:name": "Zelda Zackney"}
>   ],
>   "collection": [
>     {
>     "@id": "/alice/friends",
>     "@type": "Collection",
>     "manages": {
>       "property": "foaf:knows",
>       "subject": "/alice"
>     },
>     "search" : ... an iritemplate,
>     "operation" : ... supportedOperations on /alice/friends
>     }
>   ]
> }
>
> Note that the hydra:Collection type has no members. Rather, the people
> Alice knows can be found at foaf:knows.
>
>
I am unsure; your descriptor concept looks more like something that
should be expressed in @context.


> // server points to external resource with offset/limit
> {
>   "@id" : "/alice",
>   // plain link to friends:
>   "foaf:knows" : { "@id": "/alice/friends" },
>   // saying things about the management of /alice/friends:
>   "collection": [
>     {
>     "@id": "/alice/friends",
>     "@type": "Collection",
>     "manages": {
>       "property": "foaf:knows",
>       "subject": "/alice"
>     },
>     "partial": {
>         "@type": "IriTemplate",
>         "template": "/alice/friends{?offset,limit}",
>         "mapping": [
>           {
>           "@type": "IriTemplateMapping",
>           "variable": "offset",
>           "property": "hydra:offset"
>           },
>           {
>           "@type": "IriTemplateMapping",
>           "variable": "limit",
>           "property": "hydra:limit"
>           }
>         ]
>     }
>     }
>   ]
> }
>
> I introduced a new property hydra:partial.
> The target resource returned from /alice/friends is a json-ld set of
> foaf:Person, not a hydra:Collection.
>
> [
>   {"@id":"/bob",
>    "@type": "http://xmlns.com/foaf/0.1/Person",
>    "http://xmlns.com/foaf/0.1/name": "Robert Rumbaugh"
>   },
>   {"@id":"/zelda",
>    "@type": "http://xmlns.com/foaf/0.1/Person",
>    "http://xmlns.com/foaf/0.1/name": "Zelda Zackney"
>   }
> ]
>
> Since we cannot embed meta information about the entire collection into
> the above json response (see [4] for the reason), we use Link headers
> with IANA rels next, prev for paging (and for anything else we want to
> link to, such as create-form[2]). Using Link headers is nothing new in
> Hydra, we use it already for the ApiDocumentation. There is also a draft
> for Link-Template headers[3] - with that we could even have the
> offset/limit template as a header.
>
> Inside the collection *items* we could use hyperlinks as usual. The only
> restriction is that link relations which apply to the entire collection
> must be either link headers of the collection response or they must be
> described at the origin of the link (here: on the Alice resource).
>
> The URI which identifies the entire collection is /alice/friends. The
> URI which identifies a partial collection is
> /alice/friends{?offset,limit}. Clients SHOULD use partial if it is
> available.
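
For reference, a rough Python sketch of how a client might expand the partial
template above. The expand_query_template helper is a hypothetical name and
handles only the single {?a,b} form-style query expression from RFC 6570; it
is an illustration, not a full URI Template implementation.

```python
import re
from urllib.parse import quote


def expand_query_template(template, variables):
    """Expand {?a,b} expressions (RFC 6570 form-style query).

    Only this one operator is handled; undefined variables are skipped.
    """
    def replace(match):
        names = match.group(1).split(",")
        pairs = [
            f"{name}={quote(str(variables[name]), safe='')}"
            for name in names
            if variables.get(name) is not None
        ]
        return "?" + "&".join(pairs) if pairs else ""

    return re.sub(r"\{\?([^}]*)\}", replace, template)


template = "/alice/friends{?offset,limit}"
page = expand_query_template(template, {"offset": 20, "limit": 10})
whole = expand_query_template(template, {})
```

With both variables supplied, page is "/alice/friends?offset=20&limit=10";
with none supplied the expression disappears and whole is "/alice/friends",
the URI of the entire collection.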
>
> Comments? Flames?
>
> Best regards,
> Dietrich
>
> [1] http://lists.w3.org/Archives/Public/public-hydra/2015Feb/0052.html
> [2] https://tools.ietf.org/html/rfc6861#section-3.1
> [3] http://tools.ietf.org/html/draft-nottingham-link-template-01
> [4] https://www.w3.org/community/hydra/wiki/Avoid_that_collections_%22break%22_relationships#Problem_description
> >
> > Regards,
> >
> > Andrew
> >
> > On 13 Feb 2015 08:42, "Markus Lanthaler" <markus.lanthaler@gmx.net> wrote:
> >
> >     It seems, there's quite some pushback to separate paginated
> >     collections into collections and pages. My feeling is that even
> >     getting rid of the PagedCollection type seems to be the preferred
> >     approach at the moment. This would mean that there's always a
> >     sequence of collections but if there's just one, there's obviously
> >     no need to link to others.
> >
> >     Let me try to gauge the preferences by making an alternative design
> >     proposal:
> >       - remove the type PagedCollection
> >       - rename itemsPerPage to numberItems
> >       - drop the "Page" suffix from firstPage, nextPage, previousPage,
> >     lastPage
> >
> >     I'm a bit worried about dropping the "Page" suffix, but wanted to
> >     bring it up nevertheless.
> >
> >     With this new design, a paginated collection would look like this:
> >
> >       {
> >         "@id": "http://api.example.com/an-issue/comments?whatever=3",
> >         "@type": "Collection",
> >         "first": "/an-issue/comments",
> >         "previous": "/an-issue/comments?whatever=2",
> >         "next": "/an-issue/comments?whatever=4",
> >         "last": "/an-issue/comments?whatever=498",
> >         "member": [ ...]
> >       }
> >
> >     What we lose by this is the ability to distinguish whether a single
> >     "page" or the complete collection was referenced. It would always be
> >     the complete collection. A client is thus expected to (potentially)
> >     retrieve the entire collection to find the information it was
> >     looking for. AFAICT, this shouldn't be a problem in practice as our
> >     collection design doesn't allow any inferences; all the
> >     relationships have to be defined explicitly.
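
As a rough illustration of what "retrieve the entire collection" means for a
client under this design, here is a Python sketch that follows "next" links
until none is left. The iter_members helper and the in-memory pages are
hypothetical stand-ins; a real client would fetch over HTTP and resolve
relative IRIs.

```python
def iter_members(fetch, start):
    """Yield every member by following "next" links from `start`.

    `fetch` is any callable mapping a URL to a parsed collection document.
    """
    url = start
    seen = set()
    while url is not None and url not in seen:
        seen.add(url)  # guard against link cycles
        page = fetch(url)
        yield from page.get("member", [])
        url = page.get("next")


# In-memory stand-in for the server; URLs and members are made up.
pages = {
    "/an-issue/comments": {
        "@type": "Collection",
        "next": "/an-issue/comments?whatever=2",
        "member": ["/comments/1", "/comments/2"],
    },
    "/an-issue/comments?whatever=2": {
        "@type": "Collection",
        "previous": "/an-issue/comments",
        "member": ["/comments/3"],
    },
}

all_members = list(iter_members(pages.__getitem__, "/an-issue/comments"))
```

The client never needs to know whether it started on a "page" or on the whole
collection; it simply keeps going while a "next" link is present.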
> >
> >     Ruben, I would be especially interested in hearing your opinion
> >     since you created ISSUE-42 [1] and mentioned that by implementing
> >     the LDF server it occurred to you that the current design is
> >     confusing. Do you see any practical issues with such a design?
> >
> >
> >     --
> >     Markus Lanthaler
> >     @markuslanthaler
> >
> >
> >
>
>
Received on Sunday, 15 February 2015 12:37:34 UTC