Re: How to avoid that collections "break" relationships from Gregg Kellogg on 2014-03-25 (public-vocabs@w3.org from March 2014)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Tue, 25 Mar 2014 10:45:35 -0700
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Cc: Markus Lanthaler <markus.lanthaler@gmx.net>, public-hydra@w3.org, public-lod@w3.org, W3C Web Schemas Task Force <public-vocabs@w3.org>
Message-Id: <3BC73B45-EBEC-4058-838B-F9F2F8A66255@greggkellogg.net>
Hi Peter,

On Mar 25, 2014, at 9:49 AM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:

> Let's see if I have this right.
> 
> You are encountering a situation where the number of people Markus knows is too big (somehow). The proposed solution is to move this information to a separate location. I don't see how this helps in reducing the size of the information, which was the initial problem.

From my perspective, this is really a clash between two notions: the use of URIs in RDF to denote entities, and the use of relative URIs in many REST applications to denote relationships. In my experience, a RESTful web application may use a URI relative to an entity's location as a way to access related entities; this is a common pattern in Ruby on Rails. For example:

http://example/users/1

In many systems, this would be served by a controller where _1_ is taken to be a primary key for a related SQL table, in this case a Users table. If users are joined together using a many-to-many relationship, a convention I can use in my application is to construct a "route", such as the following:

http://example/users/1/knows/

Which might be semantically equivalent (within the application logic) to http://example/knows?user_id=1. The controller may then query the join table where one column (say src_id) is _1_, so that results find related entities based on another column in the join table (say dest_id). The application may then return all records in a single request, or a subset of those records through pagination.
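
As a rough illustration of that controller logic, here is a minimal Python sketch using an in-memory SQLite database. It is not Rails code; the table layout, the sample users (including "bob"), and the helper names are all assumptions made up for this example. Only the src_id/dest_id join columns and the idea of pagination come from the description above.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database (hypothetical schema and sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE knows (src_id INTEGER, dest_id INTEGER);
        INSERT INTO users VALUES (1, 'markus'), (2, 'alice'), (3, 'bob'), (4, 'zorro');
        INSERT INTO knows VALUES (1, 2), (1, 3), (1, 4), (2, 1);
    """)
    return conn


def known_users(conn, user_id, page=1, per_page=2):
    """Roughly what /users/<user_id>/knows/ might do: select join-table rows
    whose src_id matches, follow dest_id to the users table, return one page."""
    offset = (page - 1) * per_page
    rows = conn.execute(
        "SELECT u.id, u.name FROM knows k "
        "JOIN users u ON u.id = k.dest_id "
        "WHERE k.src_id = ? ORDER BY u.id LIMIT ? OFFSET ?",
        (user_id, per_page, offset),
    )
    return rows.fetchall()


conn = make_demo_db()
print(known_users(conn, 1, page=1))  # [(2, 'alice'), (3, 'bob')]
print(known_users(conn, 1, page=2))  # [(4, 'zorro')]
```

The point is only that the "knows" route names a query over the join table, not an entity in its own right.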

Many developers will want to be able to publish information about their datasets using a vocabulary such as schema.org. Given that an entity may contain many relationships, it is not feasible to create a single entity description with all of the members of these relationships enumerated. For example, a User entity may have parents, children, friends (knows), likes, comments, photos, ... Moreover, these relationships are bi-directional (a user asserts a knows relationship with another user, and is known by other users). In a prototypical Rails application, this works because a page rendered for a user contains controls to access these relationships. How does the developer of such an application capture these semantics using something like schema.org? As it stands in Hydra now, these relationships might be described as follows:

<.../markus/> a schema:Person;
  schema:knows <.../markus/knows>;
  ...

However, as Markus points out, the <.../markus/knows> resource likely returns a collection, rather than a person. This isn't a show-stopper for schema.org, because schema:knows does not use rdfs:range, but schema:rangeIncludes, which does not cause an inference that <.../markus/knows> is a schema:Person. The same pattern should also work for a vocabulary such as FOAF, though, where foaf:knows does declare an rdfs:range and the inference that the collection is a foaf:Person would create exactly such a contradiction.
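
To make the difference concrete, here is a small Python sketch of the RDFS range rule applied to plain tuples. It is a toy, not a real RDF library; the function name and the string-based triples are assumptions for illustration only.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def infer_range_types(triples):
    """Apply the RDFS range rule: (p rdfs:range C) and (s p o) entail (o rdf:type C)."""
    # Collect (property, class) pairs declared with rdfs:range.
    ranges = {(s, o) for s, p, o in triples if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in triples:
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, RDF_TYPE, cls))
    return inferred


# FOAF declares rdfs:range, so the collection is inferred to be a person.
foaf_graph = {
    ("foaf:knows", RDFS_RANGE, "foaf:Person"),
    ("</markus>", "foaf:knows", "</markus/knows>"),
}

# schema.org uses schema:rangeIncludes, which this rule ignores.
schema_graph = {
    ("schema:knows", "schema:rangeIncludes", "schema:Person"),
    ("</markus>", "schema:knows", "</markus/knows>"),
}

print(infer_range_types(foaf_graph))    # {('</markus/knows>', 'rdf:type', 'foaf:Person')}
print(infer_range_types(schema_graph))  # set()
```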

The challenge for a developer is to come up with entity markup that has a good chance of being understood for SEO purposes, and does not create so high a conceptual barrier for the developer that they just don't attempt it. I think it is our responsibility to provide best practices for marking up entities used in such applications in a simple way that does not clash with RDF expectations, where any URI used in the range of schema:knows is expected to be a person and not a collection.

> Splitting this information into pieces might help. schema.org, along with just about every other RDF syntax, does not require that all the information about a particular entity is in the same spot. The problem then is to ensure that all the information is accessed together.
> 
> schema.org, somewhat separate from other RDF syntaxes, does have facilities for this.  All you need to do is to set up multiple pages, for example
> .../markus1 through .../markusn
> and on each of these pages include schema.org markup with content like
> <.../markusi> schema:url <.../markus>
> <.../markus> schema:knows <.../friendi1>
> ...
> <.../markus> schema:knows <.../friendimi>
> Then on .../markus you have
> <.../markus> schema:url <.../markus1>
> ...
> <.../markus> schema:url <.../markusn>
> (Maybe schema:sameAs is a better relationship to use here, but they both should work.)
> 
> Voila! (With the big proviso that I have no idea whether the schema.org processors actually do the right thing here, as there is no indication of what they do do.)

The problem is that, if this is to drive application logic, as is the intent of Hydra, how does a client know which URI to dereference if it is interested in schema:knows, or schema:children, or schema:parent, or schema:comment, or whatever the interesting relationship is?

I think there are two ways out of this:

1) schema.org can break the relationship expectation model by specifically allowing, say, an ItemList to be the value of any property with the intent that it provide such an indirection, and damn the RDF consequences.
2) use something like an operation that describes these relationships, but that has less of a chance of being useful for SEO. For example:

<../markus/> a foaf:Person;
 hydra:supportedOperation [
   a GetRelatedCollectionOperation;
   hydra:title "Get known relations";
   hydra:description "Retrieves a collection of foaf:Person related to the subject through foaf:knows";
   hydra:property foaf:knows;
   hydra:uri <../markus/knows>;
   hydra:method "GET";
   hydra:returns foaf:Person
 ] .
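
A client could then pick the URI to dereference by matching on the property it cares about. The following Python sketch assumes the description above has already been parsed into a dictionary; the dictionary shape and the collection_uri_for helper are hypothetical, and the hydra:property and hydra:uri names are simply copied from my example rather than checked against the Hydra vocabulary.

```python
def collection_uri_for(resource, prop):
    """Return the URI of the GET operation that exposes `prop`, or None if absent."""
    for op in resource.get("hydra:supportedOperation", []):
        if op.get("hydra:property") == prop and op.get("hydra:method") == "GET":
            return op.get("hydra:uri")
    return None


# Hand-built stand-in for the parsed description of <../markus/>.
markus = {
    "@id": "../markus/",
    "@type": "foaf:Person",
    "hydra:supportedOperation": [
        {
            "@type": "GetRelatedCollectionOperation",
            "hydra:property": "foaf:knows",
            "hydra:uri": "../markus/knows",
            "hydra:method": "GET",
            "hydra:returns": "foaf:Person",
        }
    ],
}

print(collection_uri_for(markus, "foaf:knows"))       # ../markus/knows
print(collection_uri_for(markus, "schema:children"))  # None
```

This keeps foaf:knows itself free of any statement whose object is a collection.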

Gregg

> peter
> 
> PS:  LDP??
> 
> On 03/24/2014 08:24 AM, Markus Lanthaler wrote:
>> Hi all,
>> 
>> We have an interesting discussion in the Hydra W3C Community Group [1]
>> regarding collections and would like to hear more opinions and ideas. I'm
>> sure this is an issue a lot of Linked Data applications face in practice.
>> 
>> Let's assume we want to build a Web API that exposes information about
>> persons and their friends. Using schema.org, your data would look somewhat
>> like this:
>> 
>>   </markus> a schema:Person ;
>>             schema:knows </alice> ;
>>             ...
>>             schema:knows </zorro> .
>> 
>> All this information would be available in the document at /markus (please
>> let's not talk about hash URLs etc. here, ok?). Depending on the number of
>> friends, the document however may grow too large. Web APIs typically solve
>> that by introducing an intermediary (paged) resource such as
>> /markus/friends/. In Schema.org we have ItemList to do so:
>> 
>>   </markus> a schema:Person ;
>>             schema:knows </markus/friends/> .
>> 
>>   </markus/friends/> a schema:ItemList ;
>>             schema:itemListElement </alice> ;
>>             ...
>>             schema:itemListElement </zorro> .
>> 
>> This works, but has two problems:
>>   1) it breaks the /markus --[knows]--> /alice relationship
>>   2) it says that /markus --[knows]--> /markus/friends
>> 
>> While 1) can easily be fixed, 2) is much trickier--especially if we consider
>> cases that don't use schema.org with its "weak semantics" but a vocabulary
>> that uses rdfs:range, such as FOAF. In that case, the statement
>> 
>>   </markus> foaf:knows </markus/friends/> .
>> 
>> and the fact that
>> 
>>   foaf:knows rdfs:range foaf:Person .
>> 
>> would yield to the "wrong" inference that /markus/friends is a foaf:Person.
>> 
>> How do you deal with such cases?
>> 
>> How is schema.org intended to be used in cases like these? Is the above use
>> of ItemList sensible or is this something that should better be avoided?
>> 
>> 
>> Thanks,
>> Markus
>> 
>> 
>> P.S.: I'm aware of how LDP handles this issue, but, while I generally like
>> the approach it takes, I don't like that fact that it imposes a specific
>> interaction model.
>> 
>> 
>> [1] http://bit.ly/HydraCG
>> 
>> 
>> 
>> --
>> Markus Lanthaler
>> @markuslanthaler
>> 
>> 
>> 
>> 
>> 
> 
>
Received on Tuesday, 25 March 2014 17:46:16 UTC