Re: Terminology Question concerning Web Architecture and Linked Data

I am trying hard to keep up (I suspect like many), and was hoping someone
would address a concern I have; forgive me if I missed it somewhere in the
discussion.
I have hung this off this message from Tim, which seems the most relevant.
And congratulations on the Linked Data Tutorial - a really useful document.

So here we go:

On 25/7/07 14:35, "Tim Berners-Lee" <timbl@w3.org> wrote:

> 
> (Going back to the original question, as it is much simpler than much
> which follows!)
> 
> On 2007-07-07, at 08:43, Chris Bizer wrote:
> 
> 
>> Question 3: Depending on the answer to question 1, is it correct to
>> use owl:sameAs [6] to state that
>> http://www.w3.org/People/Berners-Lee/card#i and
>> http://dbpedia.org/resource/Tim_Berners-Lee refer to
>> the same thing as it is done in Tim's profile.
> 
> Yes.
> 
So Tim is absolutely right.
This is an entirely logical thing to say.
These two NIRs (Non-Information Resources) should be considered the same.
But it is important to consider how this statement will be used, and worry
whether there may be unexpected consequences.
As we now know, the URIs should be resolvable, and so interesting Semantic
Web applications will use the URI to get the Description (or whatever we
call it), probably going via a 303.
So my SW app will get the RDF of them both, and add it to my triplestore,
along with all the other linked data.

Tim, as often, is a good example.
Consider the places Tim works (W3C, MIT, Southampton, I guess).
It is likely that each will publish RDF about him, hopefully using an agreed
ontology (one day!).
Now comes the rub.
If you put all this in one triplestore, with the owl:sameAs assertions, then
it will not be possible to distinguish where facts came from, or rather
which facts are associated with which others.
Perhaps 3 job titles, 3 telephone numbers and 3 institution addresses will
be returned from the appropriate SPARQL queries, and there will be no
(legal) way of working out which corresponds to which.
So I can infer that the person http://www.w3.org/People/Berners-Lee/card#i
is a Professor at MIT, or a Senior Research Scientist at W3C, or Director at
Southampton, none of which we consider true.
(Of course, this was the intention of the sameAs assertion.)
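
To make the concern concrete, here is a minimal sketch in plain Python
(no RDF library). Everything in it is an assumption for illustration: the
ex: URIs, the ex:jobTitle and ex:phone predicates, and the placeholder
values are hypothetical, and the "smush" function only imitates the
practical effect of owl:sameAs by collapsing linked URIs onto one node.

```python
SAME_AS = "owl:sameAs"

# Hypothetical data: three sites each describe the same person under
# their own URI, with placeholder values rather than real facts.
triples = [
    ("ex:w3c/person", "ex:jobTitle", "title-at-w3c"),
    ("ex:w3c/person", "ex:phone", "phone-at-w3c"),
    ("ex:mit/person", "ex:jobTitle", "title-at-mit"),
    ("ex:mit/person", "ex:phone", "phone-at-mit"),
    ("ex:soton/person", "ex:jobTitle", "title-at-soton"),
    ("ex:soton/person", "ex:phone", "phone-at-soton"),
    ("ex:w3c/person", SAME_AS, "ex:mit/person"),
    ("ex:mit/person", SAME_AS, "ex:soton/person"),
]


def smush(triples):
    """Collapse sameAs-linked subjects onto one node (simple union-find).

    Objects are left untouched, which is enough here because every
    object in the sample data is a literal.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)
    return {(find(s), p, o) for s, p, o in triples if p != SAME_AS}


def values(graph, subject, predicate):
    """All objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


merged = smush(triples)
node = next(iter(merged))[0]  # the single surviving subject
titles = values(merged, node, "ex:jobTitle")
phones = values(merged, node, "ex:phone")

# Nothing in the merged graph says which title goes with which phone,
# so every combination is equally "valid".
pairings = [(t, ph) for t in titles for ph in phones]
print(len(titles), len(phones), len(pairings))
```

After the merge there is one subject carrying three titles and three
phone numbers, and the graph itself gives no basis for choosing among
the nine possible pairings.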

I suggest that this is a bad state of affairs, and applies to any NIR, not
just people.
Two solutions come to mind:
1) Introduce a level of indirection. If each of the URIs is only connected
to one further, distinguished, node, or only the appropriate nodes, then
correspondences can be identified.
This means that the ontologies have to be much more carefully constructed
than they appear to be at present, taking cognisance of the consequences of
others making such sameAs statements, in our open world.
2) Decide that owl:sameAs is just too strong for what we mean. Therefore
(one or more) other statements need to be available. Note that seeAlso is
too weak. Others have suggested the need for another predicate in this
(bifurcating) thread. Using a separate predicate is the approach we took
for CS AKTiveSpace, and it seems to have stood us in good stead since.
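
Both options can be sketched in plain Python. This is only an
illustration under stated assumptions: the ex: URIs, the ex:holdsPost,
ex:jobTitle and ex:phone predicates, the placeholder values, and the
"bundle" of coreferent URIs are all hypothetical, and the bundle is just
one possible reading of "another predicate", not a description of any
particular system.

```python
SITES = ["w3c", "mit", "soton"]
PERSON_URIS = [f"ex:{site}/person" for site in SITES]


def objects(graph, subject, predicate):
    """All objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# Option 1: indirection. Each person URI points at its own "post" node,
# and the site-specific facts hang off that node, not off the person.
indirect = set()
for site in SITES:
    person, post = f"ex:{site}/person", f"ex:{site}/post"
    indirect |= {
        (person, "ex:holdsPost", post),
        (post, "ex:jobTitle", f"title-at-{site}"),
        (post, "ex:phone", f"phone-at-{site}"),
    }


def rename(graph, aliases, canonical):
    """Imitate the effect of sameAs by rewriting every alias to one URI."""
    def fix(term):
        return canonical if term in aliases else term
    return {(fix(s), p, fix(o)) for s, p, o in graph}


merged = rename(indirect, set(PERSON_URIS), "ex:person")
posts = objects(merged, "ex:person", "ex:holdsPost")
# Even after the merge, each post still groups its own title and phone.
records = [
    (objects(merged, post, "ex:jobTitle"), objects(merged, post, "ex:phone"))
    for post in posts
]

# Option 2: a weaker link. The facts stay on the original URIs, and the
# coreference is recorded separately as a bundle consulted at query time.
direct = set()
for site in SITES:
    person = f"ex:{site}/person"
    direct |= {
        (person, "ex:jobTitle", f"title-at-{site}"),
        (person, "ex:phone", f"phone-at-{site}"),
    }
bundle = set(PERSON_URIS)  # hypothetical "these URIs corefer" record

per_uri = {
    uri: (objects(direct, uri, "ex:jobTitle"), objects(direct, uri, "ex:phone"))
    for uri in sorted(bundle)
}

for titles, phones in records:
    print("option 1:", titles, phones)
for uri, (titles, phones) in per_uri.items():
    print("option 2:", uri, titles, phones)
```

In both cases a query can still recover which title belongs with which
phone number: in the first because the grouping node survives the merge,
in the second because the URIs are never merged in the first place.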

To summarise, we need to think carefully about how URI aliasing and, in
particular, coreference between URIs is managed. Although using sameAs may
seem like the right thing to do now, it may have unintended consequences
when the use of inference becomes more widespread. An approach that we have
used at Southampton on the ReSIST project to manage URIs is described here:
http://eprints.ecs.soton.ac.uk/14361/

Hugh Glaser
(and Afraz Jaffri)

-- 
Hugh Glaser,  Reader
              Dependable Systems & Software Engineering
              School of Electronics and Computer Science,
              University of Southampton,
              Southampton SO17 1BJ
Work: +44 (0)23 8059 3670, Fax: +44 (0)23 8059 3045
Mobile: +44 (0)78 9422 3822, Home: +44 (0)23 8061 5652
http://www.ecs.soton.ac.uk/~hg/

Received on Monday, 30 July 2007 19:24:54 UTC