Re: Linking non-open data from Chris Bizer on 2008-04-18 (public-lod@w3.org from April 2008)

From: Chris Bizer <chris@bizer.de>
Date: Fri, 18 Apr 2008 15:47:48 +0200
To: "Peter Coetzee" <peter@coetzee.org>
Cc: "Matthias Samwald" <samwald@gmx.at>, <public-lod@w3.org>, "Tassilo Pellegrini" <t.pellegrini@semantic-web.at>, "Andreas Blumauer (Semantic Web Company)" <a.blumauer@semantic-web.at>
Message-ID: <003301c8a15a$cef23ed0$c4e84d57@named4gc1asnuj>
Hi Peter,

reading your "ramblings", they actually make a lot of sense to me and I 
think I even like them better than my own initial ideas on the problem, as 
your approach nicely avoids the owl:sameAs.

Does anybody else have further ideas?

Cheers

Chris


--
Chris Bizer
Freie Universität Berlin
+49 30 838 54057
chris@bizer.de
www.bizer.de
----- Original Message ----- 
From: Peter Coetzee
To: Chris Bizer
Cc: Matthias Samwald ; public-lod@w3.org ; Tassilo Pellegrini ; Andreas 
Blumauer (Semantic Web Company)
Sent: Friday, April 18, 2008 11:06 AM
Subject: Re: Linking non-open data


Hi Chris,

I like the sound of this, as a neat and elegant way to work round the 
problem. The only concern I'd have, is that it lacks any "backwards" links 
from the protected to the public data object. For example, if my agent finds 
the triple


http://mydomain//resource/myResource foaf:interest
http://yourDomain/resource/ProtectedDataAboutObjectX

in some document out there, and *doesn't* have (and cannot get) the 
credentials to access http://yourDomain/resource/ProtectedDataAboutObjectX, 
it has no way of knowing that it might be able to get some data about the 
"real" object (please excuse my loose language!) being discussed from 
http://yourDomain/resource/PublicDataAboutObjectX, or does it? Note, I'm 
assuming here that my agent hasn't encountered the owl:sameAs elsewhere on 
its 'travels'.

I guess there are two obvious solutions to me; either every time we refer to 
ProtectedDataAboutObjectX, we must also include the owl:sameAs to 
PublicDataAboutObjectX, or we must always refer to PublicDataAboutObjectX 
and rely on its linked-ness into ProtectedDataAboutObjectX to get at that 
data if we have credentials. Hmmm - both feel a little bit cumbersome to me, 
what do you think?

On a slightly separate (and less tangible) note, I feel slightly 
uncomfortable with the notion of "refer to that URI about ObjectX because I 
know what data it will serve" - in theory (when the whole world is 
passionate about interlinking their datasets ;) ), shouldn't it be ok to 
refer to any URI for the object, and (perhaps eventually) get to whichever 
data you seek? I recognise that in practice this would be unnecessarily 
inefficient, but stick with me for a minute! As an extension of that 
feeling, it strikes me as odd to mint two different URIs for the same thing, 
solely to get around a mechanical issue like authentication. Perhaps what 
I'm getting at then is something more along the lines of:

1. Use the resource http://yourDomain/resource/ObjectX to refer to the 
resource itself (always)
2. When someone dereferences http://yourDomain/resource/ObjectX, they are 
required to attempt to authenticate
3a. If the client fails to authenticate, they are presented with only the 
public data - perhaps by using a suitable redirect to 
http://yourDomain/resource/PublicDataAboutObjectX (note - no owl:sameAs 
needed, as we're always referring to http://yourDomain/resource/ObjectX)
3b. If the client provides sufficient credentials, they are presented with 
the protected data as well (again, either directly or through a redirect to 
http://yourDomain/resource/ProtectedDataAboutObjectX; whichever is deemed to 
be more "pure")

This mechanism would also permit the server on http://yourDomain/ to serve 
different facets on the data depending on the user who has authenticated 
(e.g. it may be that a "student" user can't see as much data as a 
"supervisor", etc). It also removes (I think?) the risk of agents reaching 
an unnecessary dead-end when they follow a link to 
http://yourDomain/resource/ProtectedDataAboutObjectX.
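
A minimal sketch of steps 1 to 3b above, assuming a hypothetical in-memory user table and made-up triples. The names `USERS`, `authenticate`, `dereference` and the `ex:` predicate are illustrative only and do not come from this thread. The sketch returns the data directly instead of redirecting, which is one of the two options mentioned in 3a/3b.

```python
# Single URI for ObjectX; what is served depends on the credentials.
# All data and names below are hypothetical.
OBJECT_X = "http://yourDomain/resource/ObjectX"

USERS = {"alice": "secret"}  # assumed credential store

PUBLIC_TRIPLES = [(OBJECT_X, "rdfs:label", "Object X")]
PROTECTED_TRIPLES = [(OBJECT_X, "ex:internalNote", "confidential")]


def authenticate(username, password):
    """Return True only for a known user with the matching password."""
    return username in USERS and USERS[username] == password


def dereference(uri, username=None, password=None):
    """Return the triples served for uri to this client."""
    if uri != OBJECT_X:
        return []
    triples = list(PUBLIC_TRIPLES)  # 3a: everyone gets the public data
    if authenticate(username, password):
        triples += PROTECTED_TRIPLES  # 3b: authenticated clients get more
    return triples
```

Because the client only ever sees `http://yourDomain/resource/ObjectX`, a failed authentication degrades to the public view rather than to a dead end.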

Apologies for the fairly rambling train of thought - I hope it was vaguely 
coherent!

Any thoughts?

Cheers,
Peter



On Fri, Apr 18, 2008 at 4:21 AM, Chris Bizer <chris@bizer.de> wrote:

Hi Peter,

> One of the problems this presents though, is how to advertise the data 
> that's
> available for a user. Perhaps something like the Semantic Web Sitemap 
> Extension
> [1] could be used / extended to say what data is available behind this 
> authentication,
> so that an agent knows whether or not it's interested in trying to find 
> credentials for it
> (e.g. prompting a user)?

Building on the Sitemap Extension would be one option, but I think 
advertising could also work much more simply, just by setting RDF links to 
the access-protected resources.

So you could have something like this:

1. Use http://yourDomain/resource/PublicDataAboutObjectX to identify your 
resource and the public data about it.

2. If some client dereferences this URI it would get the public data 
containing an RDF link like this

http://yourDomain/resource/PublicDataAboutObjectX owl:sameAs 
http://yourDomain/resource/ProtectedDataAboutObjectX

3. If the client would then try to dereference 
http://yourDomain/resource/ProtectedDataAboutObjectX it would be asked to 
provide some credentials.

Using this mechanism, external data providers could also link to the 
protected data, for instance:

http://mydomain//resource/myResource foaf:interest
http://yourDomain/resource/ProtectedDataAboutObjectX
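
A rough sketch of how an agent might experience the three steps above, assuming a simulated server in place of real HTTP. The functions `handle` and `crawl`, the `authorized` flag, and the `http://example.org/internalNote` predicate are hypothetical and only serve the illustration.

```python
# Two URIs: a public one that links to a protected one via owl:sameAs.
# The "server" is simulated; status codes mirror what HTTP would return.
PUBLIC = "http://yourDomain/resource/PublicDataAboutObjectX"
PROTECTED = "http://yourDomain/resource/ProtectedDataAboutObjectX"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def handle(uri, authorized=False):
    """Return (status, triples) for a dereferencing attempt."""
    if uri == PUBLIC:
        return 200, [(PUBLIC, OWL_SAME_AS, PROTECTED)]
    if uri == PROTECTED:
        if not authorized:
            return 401, []  # client is asked for credentials
        return 200, [(PROTECTED, "http://example.org/internalNote", "confidential")]
    return 404, []


def crawl(start, authorized=False):
    """Follow owl:sameAs links from start and collect readable triples."""
    seen, queue, collected = set(), [start], []
    while queue:
        uri = queue.pop()
        if uri in seen:
            continue
        seen.add(uri)
        status, triples = handle(uri, authorized)
        if status != 200:
            continue  # nothing readable here for this client
        collected.extend(triples)
        queue.extend(o for _, p, o in triples if p == OWL_SAME_AS)
    return collected
```

Starting from the public URI, an unauthorized agent collects only the owl:sameAs link, while an authorized one also reaches the protected triple. Starting from the protected URI without credentials yields nothing, since the link runs in one direction only.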

What do you think?

Cheers

Chris


--
Chris Bizer
Freie Universität Berlin
+49 30 838 54057
chris@bizer.de
www.bizer.de
----- Original Message ----- 
From: Peter Coetzee
To: Chris Bizer ; Matthias Samwald
Cc: public-lod@w3.org ; Tassilo Pellegrini ; Andreas Blumauer (Semantic Web 
Company)
Sent: Thursday, April 17, 2008 2:03 PM
Subject: Re: Linking non-open data


Hi all,


On Thu, Apr 17, 2008 at 12:25 PM, Chris Bizer <chris@bizer.de> wrote:


Hi Matthias,




A question that will surely arise in many places when more people get to 
know about the linked data initiative and the growing infrastructure of 
linked open data is: how can these principles be applied to organizational 
data that might not / only partially be open to the public web?



I think applying the Linked Data principles within a corporate intranet does 
not pose any specific requirements. It is just that the data is not 
accessible from the outside.

It sounds to me like deploying linked data over an intranet would be towards 
the "trivial" side of solutions - what about when data is out on (dare I 
say, in? ;) ) the web fully, but you need to control access to it (i.e. the 
authentication Matthias describes). I like the idea of using standard HTTP 
authentication for this - it just seems like the "right" mechanism to use. 
One of the problems this presents though, is how to advertise the data 
that's available for a user. Perhaps something like the Semantic Web Sitemap 
Extension [1] could be used / extended to say what data is available behind 
this authentication, so that an agent knows whether or not it's interested 
in trying to find credentials for it (e.g. prompting a user)?





People will soon try to develop practices for selectively protecting parts 
of their linked data with fine-grained access rights. Could simple HTTP 
authentication be useful for linked data?



As Linked Data heavily relies on HTTP anyway, I think HTTP authentication 
should be the first choice and people having these requirements should check 
if they can go with HTTP auth.
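
For illustration, a small sketch of what a client request with HTTP authentication could look like, assuming the Basic scheme and Python's standard library. The helper name `build_request` and the credentials are hypothetical; the request is only constructed here, not sent.

```python
import base64
import urllib.request


def build_request(uri, username, password):
    """Build a GET request for RDF data carrying HTTP Basic credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        uri,
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/rdf+xml",  # ask for an RDF serialization
        },
    )
```

Basic credentials are only encoded, not encrypted, so in practice this would normally be combined with HTTPS.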



How does authentication work for SPARQL endpoints containing several named 
graphs?



Of course you can always make things as difficult as you like. But I guess 
for many use cases an all-or-nothing approach is good enough, which would 
allow HTTP authentication to be used again.

If you wanted slightly more fine-grained control, I don't see any reason you 
can't still use HTTP auth: you could pass the authenticated user details 
through to whatever framework you're using on the backend to handle SPARQL, 
and then check "does this user have permissions" for each of the named 
graphs mentioned in the query.
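
A minimal sketch of that per-graph check, assuming a hypothetical access table `GRAPH_ACL` and a regular expression that picks out graph IRIs from FROM, FROM NAMED and GRAPH clauses. A regex is not a SPARQL parser: it ignores prefixed names and `GRAPH ?g` variables, so a real backend would need to inspect the parsed query instead.

```python
import re

# Hypothetical access table: named graph IRI -> users allowed to read it.
GRAPH_ACL = {
    "http://example.org/graphs/public": {"alice", "bob"},
    "http://example.org/graphs/staff": {"alice"},
}

# Matches FROM <iri>, FROM NAMED <iri> and GRAPH <iri>.
GRAPH_REF = re.compile(r"\b(?:FROM\s+NAMED|FROM|GRAPH)\s*<([^>]+)>", re.IGNORECASE)


def graphs_in_query(query):
    """Return the set of graph IRIs written explicitly in the query."""
    return set(GRAPH_REF.findall(query))


def is_allowed(user, query):
    """True if user may read every graph the query names explicitly.

    Graphs missing from GRAPH_ACL are treated as forbidden. A query that
    names no graph at all passes this check (an assumption of the sketch).
    """
    return all(user in GRAPH_ACL.get(g, set()) for g in graphs_in_query(query))
```

Here `user` stands for the identity established by HTTP authentication before the query reaches the SPARQL backend.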




Can we use RDF vocabularies to represent access rights? Should such 
vocabularies be standardized?



Sure, but I think all work in this area should be based on clearly motivated 
real-world use cases and collecting these use cases should be the first step 
before starting to define vocabularies.



Is there any ongoing work on defining such practices (or even 'best 
practices')?



There is lots of work on using RDF, OWL and different rules languages to 
represent access control policies. See for instance Rei, KAoS and Protune or 
the SemWeb policy workshop at 
http://www.l3s.de/~olmedilla/events/2006/SWPW06/ , for older work also 
http://www4.wiwiss.fu-berlin.de/bizer/SWTSGuide/

But I guess a lot of this will be a bit over-the-top for the common linked 
data use cases.

Cheers

Chris





Cheers,
Matthias Samwald

I've given some thought to this before, but not put much down on paper yet - 
it occurs to me that this kind of standardisation would be a powerful 
component for a semantic web publishing framework of some description. I 
don't know if Virtuoso is playing in this space at all (other than brief 
sessions toggling with it from the client side, I've not really explored its 
potential yet!), but I've had a project on the back burner for a couple of 
months to try and handle some issues like this (as well as the content 
negotiation, and some other aspects) easily for people wishing to publish 
data in the semantic web. If there's likely to be some interest in such a 
(probably free / oss) project, I can dust it off when I get some time and 
see about bringing it to some kind of completion!

Cheers,
Peter


[1] http://sw.deri.org/2007/07/sitemapextension/
Received on Friday, 18 April 2008 13:48:34 UTC