RE: Pagination (ISSUE-42) from Markus Lanthaler on 2015-02-23 (public-hydra@w3.org from February 2015)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Mon, 23 Feb 2015 22:33:18 +0100
To: <public-hydra@w3.org>
Message-ID: <33ed01d04fb0$5ba91370$12fb3a50$@gmx.net>
On 23 Feb 2015 at 10:22, Dietrich Schulten wrote:
> Hi,
> 
> On 22 Feb 2015 at 20:43, Markus Lanthaler wrote:
>> On 16 Feb 2015 at 10:47, Dietrich Schulten wrote:
>>> I want to re-phrase my proposal because my previous attempt appears to
>>> cause misunderstandings.
>>> 
>>> This is not only a reply to Andrew, I'd like to ask everybody to
>>> consider my proposal.
>> 
>> Thanks for bringing this discussion back to the topic and making a concrete
>> proposal.
>> 
>> 
>> 
>>> My proposal is:
>>> 
>>> 1. Use hydra:Collection not as a container but as a descriptor, i.e.
>>> keep the actual items outside of the hydra:Collection object and make
>>> them direct values of the property they belong to. Drop hydra:member.
>> 
>> So you have a collection without members?
> 
> Indeed, since it would be a descriptor and not a container, the
> collection has no nested json attribute with the members as value.

So what would that descriptor describe? Where to find some more triples of a certain shape?


> Dropping member would be the right thing to do if everybody agrees that
> it is ok to have the members outside the hydra:Collection, because
> otherwise we would have two places where clients must look for the
> collection members.

You'd nevertheless have two places to look at, right? The only difference would be that it is not called a collection anymore but a descriptor:

      Resource -- collection --> Collection (manages a certain property)
                                  -> member and (!) direct relationships
               --- property ---> direct relationships

   vs.

      Resource -- descriptor --> Descriptor (manages a certain property)
                                  -> no info about desc. but direct. rel.
               --- property ---> direct relationships

So, the only difference is that in the latter, you wouldn't get back any information about the descriptor even though you followed a link to it... but you would find the triples you were actually looking for there. To find even more triples you would need to switch strategy and start looking at HTTP headers instead.


>>> 5. In order to say things about the collection directly in the context
>>> of the collection response, we must use something else but the
>>> collection response body, because the body contains just items and there
>>> is nothing to attach additional properties to the collection.
>> 
>> Not sure I follow. The body contains whatever you put there. So why is there
>> "nothing to attach additional properties"?
> 
> The proposal is that this should be the response body when the client
> dereferences /alice/friends:
> 
> [
>  {"@id":"/bob",
>   "@type": "http://xmlns.com/foaf/0.1/Person",
>   "http://xmlns.com/foaf/0.1/name": "Robert Rumbaugh"
>  },
>  {"@id":"/zelda",
>   "@type": "http://xmlns.com/foaf/0.1/Person",
>   "http://xmlns.com/foaf/0.1/name": "Zelda Zackney"
>  }
> ]
> 
> In the json array above there is no way to attach additional properties
> to the array (such as paging information).

And why can't you add an entry to that array that looks somewhat like this:

  {
    "@id":"/alice/friends",
    "@type": "hydra:XYZ",
    "nextPage": "/alice/friends/2"
  }
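
As a rough sketch of how a client could consume such a mixed array, the Python below splits the response into the node that describes /alice/friends and the remaining nodes. The "hydra:XYZ" type and the "nextPage" key are just the placeholders from the example above, and split_response is a hypothetical helper, not part of any Hydra library.

```python
import json

# Response body for /alice/friends: two persons plus one extra node that
# describes the requested resource itself (placeholder type and key).
body = json.loads("""
[
  {"@id": "/bob", "@type": "http://xmlns.com/foaf/0.1/Person",
   "http://xmlns.com/foaf/0.1/name": "Robert Rumbaugh"},
  {"@id": "/zelda", "@type": "http://xmlns.com/foaf/0.1/Person",
   "http://xmlns.com/foaf/0.1/name": "Zelda Zackney"},
  {"@id": "/alice/friends", "@type": "hydra:XYZ",
   "nextPage": "/alice/friends/2"}
]
""")


def split_response(nodes, requested_iri):
    """Hypothetical helper: separate the node about requested_iri from the rest."""
    about = [n for n in nodes if n.get("@id") == requested_iri]
    others = [n for n in nodes if n.get("@id") != requested_iri]
    return (about[0] if about else None), others


descriptor, persons = split_response(body, "/alice/friends")
next_page = descriptor.get("nextPage") if descriptor else None
```

The paging information travels in the body next to the persons, so nothing has to move into HTTP headers.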


> Also in RDF these are just two persons. There is no subject which
> represents the collection as an entity in its own right, so I could make
> assertions about it:
> 
> <http://json-ld.org/bob> <foaf:name> "Robert Rumbaugh" .
> <http://json-ld.org/zelda> <foaf:name> "Zelda Zackney" .

Right, but nothing prevents you from adding more data, even data about other resources.


>>> Example responses illustrating my proposal: 
>>>> 
>>>>     // server embeds a collection of people Alice knows
>>>>     {
>>>>       "@id": "/alice",
>>>>       "foaf:name": "Alice",
>>>>       "foaf:knows": [
>>>>         {"@id":"/bob", "foaf:name": "Robert Rumbaugh"},
>>>>         {"@id":"/zelda", "foaf:name": "Zelda Zackney"}
>>>>       ],
>>>>       "collection": [
>>>>         {
>>>>         "@id": "/alice/friends",
>>>>         "@type": "Collection",
>>>>         "manages": {
>>>>           "property": "foaf:knows",
>>>>           "subject": "/alice"
>>>>         },
>>>>         "search" : ... an iritemplate,
>>>>         "operation" : ... supportedOperations on /alice/friends
>>>>         }
>>>>       ]
>>>>     }
>> 
>> Apart from the missing hydra:member relationship this is exactly what we currently have.
> 
> Yes, you already told me that this is possible. Actually that made me
> think of the proposal in the first place :)
> 
> It has the advantage that the :knows relationship is where non-hydra
> clients expect it.

Right. But the whole point of pagination is that the number of those assertions is too big to be included directly. If a client found these two assertions there, why should it go and look anywhere else at all?


> Also a reasoner would see the :knows relationship,
> no need to re-state it for every hydra:member.

You state it explicitly there. What you omit (and lose) is the information that those resources are members of a collection. We can of course argue whether that matters... or whether collection is the right term then.


>>>>     // server points to external resource with offset/limit
>>>>     {
>>>>       "@id" : "/alice",
>>>>       // plain link to friends:
>>>>       "foaf:knows" : { "@id": "/alice/friends" },
>>>>       // saying things about the management of /alice/friends:
>> 
>> So /alice/friends is intentionally a foaf:Person and a hydra:Collection at the same time? I
>> say intentionally as you explicitly mentioned
> 
> Yes, that is what I try to pull off. The thing at /alice/friends is a
> foaf:Person, the reasoner may safely infer that. See below why I hope
> that may be legitimate.

Well, no. The only reason why we introduced all that complexity of hydra:collection, manages etc. is because we don't want that to be inferred.


> And as I learned that anyone can say anything about any topic, I also
> state that the same thing at /alice/friends is a resource to which I can
> POST, which I can search through and for which I may retrieve partial views.

Uhh... I fear we are quickly heading down the httpRange-14 rabbit hole :-(



>>> 3. Let the list of items be a plain list without surrounding container
>>> because such a container around the items causes problems in the RDF
>>> model. ("/alice foaf:knows hydra:Collection" makes RDF tools think that
>>> the hydra:Collection is a foaf:Person because foaf:knows defines that
>>> its values are foaf:Person)
>> 
>> 
>>>>       "collection": [
>>>>         {
>>>>         "@id": "/alice/friends",
>>>>         "@type": "Collection",
>>>>         "manages": {
>>>>           "property": "foaf:knows",
>>>>           "subject": "/alice"
>>>>         },
>>>>         "partial": {
>>>>             "@type": "IriTemplate",
>>>>             "template": "/alice/friends{?offset,limit}",
>>>>             "mapping": [
>>>>               {
>>>>               "@type": "IriTemplateMapping",
>>>>               "variable": "offset",
>>>>               "property": "hydra:offset"
>> 
>> How is hydra:offset defined?
> 
> similar to hydra:freetextQuery:
> 
> hydra:offset
> 
> A property representing an offset into an array for use in an
> IriTemplate which allows ranged access to a collection. Usually used in
> combination with hydra:limit.
> Range: xsd:integer

OK. So how would you use such an offset with something like the Facebook news feed or your Twitter feed? My point is that you would never be able to traverse those feeds reliably: an offset is basically meaningless in such apps because the items it points at constantly shift under your feet. Imagine you retrieved the first ten items. While you did so, ten more items got added to the top of the feed. In your next step you want to get the next ten items and set the offset to 10. What would you get back? Exactly the ten items you already had.
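
The scenario can be replayed in a few lines of Python. This is only a sketch under the assumption that the feed is ordered newest first and that the server slices it by offset and limit; get_page and the item names are made up for illustration.

```python
def get_page(feed, offset, limit):
    """Offset/limit access to a newest-first feed (assumed server behaviour)."""
    return feed[offset:offset + limit]


# Newest-first feed with 20 items: item-19 is the most recent one.
feed = [f"item-{i}" for i in range(19, -1, -1)]

first_page = get_page(feed, offset=0, limit=10)

# Ten new items arrive at the top of the feed before the next request.
feed = [f"item-{i}" for i in range(29, 19, -1)] + feed

second_page = get_page(feed, offset=10, limit=10)

# True when the client received the same ten items twice.
repeated = first_page == second_page
```

A server-issued opaque link to the next page avoids this, because the server can anchor it to something stable such as the last item already delivered.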


>>>>               },
>>>>               {
>>>>               "@type": "IriTemplateMapping",
>>>>               "variable": "limit",
>>>>               "property": "hydra:limit"
>> 
>> How is hydra:limit defined?
> 
> hydra:limit
> 
> A property representing a limit to restrict the number of returned items
> when retrieving a collection. Usually used in combination with hydra:offset.
> Range: xsd:integer

That's less problematic, but most servers would probably prefer not to allow arbitrary values there, so that they can cache results etc.
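
One way a server might restrict the values, sketched in Python: snap every requested limit to a small fixed set of page sizes so that only a handful of distinct URLs need to be cached. ALLOWED_LIMITS and normalize_limit are assumptions made for this illustration; nothing in Hydra prescribes them.

```python
ALLOWED_LIMITS = (10, 25, 50)  # hypothetical page sizes the server is willing to cache


def normalize_limit(requested):
    """Snap the requested limit to the largest allowed size not exceeding it.

    Requests below the smallest allowed size get the smallest size.
    """
    candidates = [size for size in ALLOWED_LIMITS if size <= requested]
    return max(candidates) if candidates else min(ALLOWED_LIMITS)


snapped = [normalize_limit(n) for n in (3, 10, 37, 1000)]
```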


>>>>               }
>>>>             ]
>>>>         }
>>>>         }
>>>>       ]
>>>>     }
>>>>     
>>>>     The target resource returned from /alice/friends is a json-ld set of
>>>>     foaf:Person, not a hydra:Collection.
>> 
>> What's the advantage of that?
> 
> It makes the user of the reasoner happy, because the reasoner will not
> infer that there is a foaf:Person which is not a :Person at all, but a
> hydra:Collection :)

It would. In fact, it would make the client believe that /alice knows more persons than she actually does: /alice/friends (of which the client has no further information), /bob, and /zelda. 
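
A minimal Python sketch of that inference, assuming a reasoner that applies only the range of foaf:knows (which FOAF defines as foaf:Person). The triples for /bob and /zelda stand for what a client would add after dereferencing /alice/friends; infer_range_types is a hypothetical helper.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# foaf:knows has the range foaf:Person.
RANGES = {FOAF + "knows": FOAF + "Person"}


def infer_range_types(triples):
    """Range entailment only: every object of a ranged property gets that type."""
    inferred = set()
    for _subject, prop, obj in triples:
        if prop in RANGES:
            inferred.add((obj, RDF_TYPE, RANGES[prop]))
    return inferred


triples = {
    # Plain link from the /alice document.
    ("/alice", FOAF + "knows", "/alice/friends"),
    # Statements the client assumes after dereferencing /alice/friends.
    ("/alice", FOAF + "knows", "/bob"),
    ("/alice", FOAF + "knows", "/zelda"),
}

persons = sorted(obj for obj, _p, _t in infer_range_types(triples))
```

Under these assumptions the reasoner ends up with three persons that /alice knows, one of them being /alice/friends itself.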



>>>    The response has the following link header to point to the
>>>    next page.
>>>    
>>>    Header:
>>>    Link: <http://example.com/alice/friends?page=2>; rel="next"
>>>    Body:
>>>>     [
>>>>       {"@id":"/bob",
>>>>        "@type": "http://xmlns.com/foaf/0.1/Person",
>>>>        "http://xmlns.com/foaf/0.1/name": "Robert Rumbaugh"
>>>>       },
>>>>       {"@id":"/zelda",
>>>>        "@type": "http://xmlns.com/foaf/0.1/Person",
>>>>        "http://xmlns.com/foaf/0.1/name": "Zelda Zackney"
>>>>       }
>>>>     ]
>> 
>> How would a client know that /alice foaf:knows /zelda ? Should it infer
>> that? Based on what information?
> 
> Yep, that is the big open question with my proposal.
> 
> Technically, what has to happen is the following:
> 
> - Client learns that alice knows whoever is at the end of /alice/friends
> - If it wants to find out more, it must dereference
> - When it dereferences, it finds that multiple persons come back
> - it must make the connection that alice knows every single person that
> came back

That's quite a stretch if you ask me. What if something breaks on the server and it gives you back, among other things, the contact details of the administrator? Would the client infer that /alice knows him?
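
To illustrate the risk, here is a small Python sketch of the heuristic described in the steps above: every node found in the body of /alice/friends is treated as a value of foaf:knows. The function name, the /admin node and its mailbox are invented for this example.

```python
KNOWS = "http://xmlns.com/foaf/0.1/knows"


def assume_all_known(subject, response_nodes):
    """Heuristic: treat every identified node in the response as a foaf:knows value."""
    return [(subject, KNOWS, node["@id"]) for node in response_nodes if "@id" in node]


# Body returned for /alice/friends after a hypothetical server fault: the
# administrator's contact node leaked into the response.
faulty_body = [
    {"@id": "/bob"},
    {"@id": "/zelda"},
    {"@id": "/admin", "http://xmlns.com/foaf/0.1/mbox": "mailto:admin@example.com"},
]

guessed = assume_all_known("/alice", faulty_body)
```

The heuristic has no way to tell the stray node from a genuine member, so the client ends up asserting that /alice knows /admin.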


> In principle, that is not unlike the processing that happens when :knows
> does not have an object as value, but a json array. The value is an
> array in json, but multiple subjects of type :Person in RDF.

That's quite different IMO. In one case you make explicit statements with no room for misinterpretation. In the other case you heavily rely on implicit assumptions and heuristics that could break in multiple ways.


> I tried to find the algorithm for properties having an array as value in
> the JSON-LD api [1]. First the json-ld must be flattened [2], then
> deserialized to RDF[3] and then I think in [3] 4.3.2.5 the magic happens:
> 
> "For each item in values: [...] 4.3.2.5.1 Append a triple composed of
> subject, property, and the result of using the Object to RDF Conversion
> algorithm passing item to triples, unless the result is null, indicating
> a relative IRI that has to be ignored."
> 
> I also found in the api [1] that dereferencing only happens for @context
> or as a starting point to read a json-ld document,

Right, a JSON-LD processor does only very basic operations. All it requires is the context(s) and, obviously, the document itself.
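
The quoted step boils down to "one triple per array item". The Python below is a heavily simplified sketch of that idea only: it is not the JSON-LD API algorithm, it assumes the input is already expanded and flattened, and it handles nothing but node references and plain strings. triples_for_property is a hypothetical name.

```python
def triples_for_property(subject, prop, values):
    """Emit one (subject, property, object) triple per item in values.

    Items that cannot be turned into an object are skipped, loosely mirroring
    the "unless the result is null" clause of the quoted step.
    """
    triples = []
    for item in values:
        obj = item.get("@id") if isinstance(item, dict) else item
        if obj is None:
            continue
        triples.append((subject, prop, obj))
    return triples


knows = "http://xmlns.com/foaf/0.1/knows"
result = triples_for_property(
    "http://example.com/alice",
    knows,
    [{"@id": "http://example.com/bob"}, {"@id": "http://example.com/zelda"}, {}],
)
```

Every triple comes from a statement that is explicitly present in the document; no link is followed at any point.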


> but not when following a link.

Why should a JSON-LD processor follow a link in the first place?


> Would it be correct to say that json-ld in and of itself has no notion of
> connected documents?

No. JSON-LD is all about interlinked data. It is just that a JSON-LD processor isn't concerned with this. It just knows how to transform JSON-LD documents and how to turn them into abstract RDF quads (and vice versa).


> If that is the case, then it is also out of scope for the json-ld api
> spec that
> 
> /alice foaf:knows /bob
> 
> implies that /bob can be dereferenced to learn more about /bob.

Well... that's one of the most fundamental principles of Linked Data. So it is not out of scope.


> But somewhere the latter is certainly specified. Off the top of your
> head, do you know where?

See:
   
   http://www.w3.org/TR/json-ld/#introduction
   http://www.w3.org/TR/json-ld/#data-model


> Maybe in the RDF specifications? Right now I
> still hope I can find proof that pointing to an external collection of
> triples from the object of an origin triple is not forbidden. Maybe it
> is even defined.

It is not forbidden, but you need to do it right; otherwise consumers might misinterpret the data.


> But if it is not forbidden, then maybe we can do it. We would need an

Well, we do exactly that with the current collection design. We define a mechanism to unambiguously point a client to more related data.


> additional processing rule like the json-ld api for hydra clients which
> may be applied if the content-type says that the document is a
> hydra-flavored json-ld:
> 
> application/ld+json; profile="http://www.w3.org/ns/hydra#hydra"
> 
> I know, then we are not only a vocabulary anymore, but also a
> media-type. But that would not be bad in itself.
> 
> I hope all this back and forth in the end leads to something everybody
> is satisfied with :)

I think we are starting to go in circles. I'll start a new thread in a few minutes.


Cheers,
Markus


> [1] http://www.w3.org/TR/json-ld-api/
> [2] http://www.w3.org/TR/json-ld-api/#node-map-generation
> [3] http://www.w3.org/TR/json-ld-api/#deserialize-json-ld-to-rdf-algorithm



--
Markus Lanthaler
@markuslanthaler
Received on Monday, 23 February 2015 21:34:09 UTC