Reasoning over Web Data was: Terminology Question concerning Web Architecture and Linked Data from Chris Bizer on 2007-07-31 (semantic-web@w3.org from July 2007)

From: Chris Bizer <chris@bizer.de>
Date: Tue, 31 Jul 2007 10:33:32 +0200
To: "Hugh Glaser" <hg@ecs.soton.ac.uk>, "Pat Hayes" <phayes@ihmc.us>
Cc: "Tim Berners-Lee" <timbl@w3.org>, <semantic-web@w3.org>, "Linking Open Data" <linking-open-data@simile.mit.edu>
Message-ID: <004301c7d34d$7a984310$c4e84d57@named4gc1asnuj>
Hi Hugh,

>>If you put all this in one triplestore, with the owl:sameAs assertions, 
>>then
>>it will not be possible to distinguish where facts came from, or rather
>>which facts are associated with which others.
>
> Whoa, careful. It will probably be >>possible<< to distinguish this, 
> in fact. It might be that unwanted consequences are entailed by the 
> combination of the various RDF graphs and the sameAs, but a careful 
> querying process should be able to determine which of the various triples 
> are present and even whether they are linked. One simple way is to query 
> under sub-OWL entailment, for example, which can be little more than a 
> direct syntactic matching process (see SPARQL).

Some practical backup for Pat's argumentation. Within applications like the 
DISCO Semantic Web browser or the Semantic Web Client Library, we use the
Named Graphs data model to represent RDF data that has been retrieved from 
the Web. This allows us to keep track of where information came from 
and which facts are associated with each other.
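
As a minimal sketch of the idea (not the actual DISCO or Semantic Web 
Client Library code), a named-graph store can be modelled as a list of 
quads whose fourth element names the graph a triple was retrieved from. 
The ex: properties, the example.org URLs and the phone numbers are 
invented for illustration.

```python
from collections import defaultdict

# A quad is (subject, predicate, object, graph). The graph name is the
# URL the triple was retrieved from. All values below are made up.
TIM = "http://www.w3.org/People/Berners-Lee/card#i"

quads = [
    (TIM, "ex:jobTitle", "Director", "http://example.org/w3c.rdf"),
    (TIM, "ex:phone", "+1-555-0100", "http://example.org/w3c.rdf"),
    (TIM, "ex:jobTitle", "Professor", "http://example.org/soton.rdf"),
    (TIM, "ex:phone", "+44-555-0101", "http://example.org/soton.rdf"),
]


def by_source(quads, subject):
    """Group the predicate/object pairs about `subject` by source graph."""
    grouped = defaultdict(dict)
    for s, p, o, g in quads:
        if s == subject:
            grouped[g][p] = o
    return dict(grouped)


facts = by_source(quads, TIM)
for graph, props in facts.items():
    print(graph, props)
```

Because every triple keeps its graph name, the job title and the phone 
number published by one source stay grouped together, even though 
several sources describe the same URI.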

Besides this, I think Semantic Web clients have to take two other things 
into account before they start reasoning over retrieved data: 
trustworthiness and vocabulary mappings. Think about what you do in the 
offline world when you read several political newspapers: first you 
align the different terminology used by the authors in your head to get a 
consistent model. Afterwards you decide which articles to trust and which 
to consider untrustworthy. Only after these two steps do you start to 
reason about the consequences of what you have read.

I think it would be a good idea for Semantic Web clients to do the same. 
Therefore, I think it is a bit naive to throw lots of RDF data from the Web 
straight into a single RDF model and then be surprised that reasoning over 
this data leads to unintended consequences.
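
The order of steps described above (align vocabularies, filter by trust, 
and only then merge for reasoning) could look roughly like the sketch 
below. The term mapping, the whitelist-style trust policy, the ex:tim 
identifier and all source URLs are assumptions made for illustration. A 
real client would need far richer mappings and trust heuristics.

```python
# Quads are (subject, predicate, object, source); all data is made up.
quads = [
    ("ex:tim", "foaf:name", "Tim Berners-Lee", "http://example.org/a.rdf"),
    ("ex:tim", "vcard:fn", "Tim Berners-Lee", "http://example.org/b.rdf"),
    ("ex:tim", "foaf:name", "Tom", "http://spam.example/junk.rdf"),
]

# Step 1: vocabulary mapping (assumed, hand-written term alignment).
TERM_MAP = {"vcard:fn": "foaf:name"}

# Step 2: trust policy (assumed: a plain whitelist of source graphs).
TRUSTED = {"http://example.org/a.rdf", "http://example.org/b.rdf"}


def align(quads, term_map):
    """Rewrite predicates into one target vocabulary."""
    return [(s, term_map.get(p, p), o, g) for s, p, o, g in quads]


def filter_trusted(quads, trusted):
    """Keep only quads whose source graph is trusted."""
    return [q for q in quads if q[3] in trusted]


def merge(quads):
    """Drop the source and deduplicate; this set is what a reasoner sees."""
    return {(s, p, o) for s, p, o, _ in quads}


model = merge(filter_trusted(align(quads, TERM_MAP), TRUSTED))
print(sorted(model))
```

The untrusted statement never reaches the merged model, and the two 
differently named properties collapse into a single fact before any 
reasoning takes place.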

I also think it would do no harm if OWL tutorials and best practice guides 
stated this fact more clearly, so that they do not raise false 
expectations.

In light of the current Semantic Web layer cake discussion, I have been 
wondering for years why the trust layer sits so high in the layer cake. It 
is obvious that you will only get junk if you try to reason over data from 
the Web before applying some heuristics to determine trustworthiness and 
filter out low-quality information. Therefore, I think the trust layer 
should be positioned lower in the cake. Maybe below Unifying Logic, if this 
is the point where things change from representation to reasoning?

Cheers

Chris


--
Chris Bizer
Freie Universität Berlin
+49 30 838 54057
chris@bizer.de
www.bizer.de
----- Original Message ----- 
From: "Pat Hayes" <phayes@ihmc.us>
To: "Hugh Glaser" <hg@ecs.soton.ac.uk>
Cc: "Tim Berners-Lee" <timbl@w3.org>; "Chris Bizer" <chris@bizer.de>; 
<www-tag@w3.org>; <semantic-web@w3.org>; "Linking Open Data" 
<linking-open-data@simile.mit.edu>
Sent: Monday, July 30, 2007 9:49 PM
Subject: Re: Terminology Question concerning Web Architecture and Linked 
Data


>
>>I am trying hard to keep up (I suspect like many), and was hoping someone
>>would address a concern I have; forgive me if I missed it somewhere in the
>>discussion.
>>I have hung this off this message from Tim, which seems the most relevant.
>>And congratulations on the Linked Data Tutorial - a really useful 
>>document.
>>
>>So here we go:
>>
>>On 25/7/07 14:35, "Tim Berners-Lee" <timbl@w3.org> wrote:
>>
>>>
>>>  (Going back to the original question, as it is much simpler than much
>>>  which follows!)
>>>
>>>  On 2007-07-07, at 08:43, Chris Bizer wrote:
>>>
>>>
>>>>  Question 3: Depending on the answer to question 1, is it correct to
>>>>  use owl:sameAs [6] to state that
>>>>  http://www.w3.org/People/Berners-Lee/card#i and
>>>>  http://dbpedia.org/resource/Tim_Berners-Lee refer to the same thing,
>>>>  as is done in Tim's profile?
>>>
>>>  Yes.
>>>
>>So Tim is absolutely right.
>>This is an entirely logical thing to say.
>>These two NIRs (Non-Information Resources) should be considered the same.
>
> (Aside) I wish folk would not say 'two' when there is only one. Two NIRs 
> should never be considered the same: rather, two names may refer to the 
> same, single, NIR.
>
>>But it is important to consider how this statement will be used, and worry
>>whether there may be unexpected consequences.
>>As we now know, the URIs should be resolvable, and so interesting Semantic
>>Web applications will use the URI to get the Description (or whatever we
>>call it), probably going via a 303.
>>So my SW app will get the RDF of them both, and add it to my triplestore,
>>along with all the other linked data.
>>
>>Tim, as often, is a good example.
>>Consider the places Tim works (W3C, MIT, Southampton, I guess).
>>It is likely that each will publish RDF about him, hopefully using an 
>>agreed
>>ontology (one day!).
>>Now comes the rub.
>>If you put all this in one triplestore, with the owl:sameAs assertions, 
>>then
>>it will not be possible to distinguish where facts came from, or rather
>>which facts are associated with which others.
>
> Whoa, careful. It will probably be >>possible<< to distinguish this, 
> in fact. It might be that unwanted consequences are entailed by the 
> combination of the various RDF graphs and the sameAs, but a careful 
> querying process should be able to determine which of the various triples 
> are present and even whether they are linked. One simple way is to query 
> under sub-OWL entailment, for example, which can be little more than a 
> direct syntactic matching process (see SPARQL).
>
>>Perhaps 3 job titles, 3 telephone numbers and 3 institution addresses will
>>be returned from the appropriate SPARQL queries, and there will be no
>>(legal) way of working out which corresponds to which.
>
> That would be a symptom of poor RDF/OWL usage, though. Assertions in RDF 
> are not supposed to be local-context-sensitive in the way you seem to be 
> assuming. So for example it would be a mistake to simply assert, in the 
> w3c page, that Tim's status WAS Director. It ought to say that a 
> relationship holds between him and the entity he is the Director of, i.e. 
> the W3C; so that this stays true even when it is moved somewhere else on 
> the Web. In fact, I suggest that a basic, fundamental principle of any 
> 'web logic' is that assertions in it should have the same meaning wherever 
> they occur on the Web (see for example 
> http://www.ihmc.us:16080/users/phayes/IKL/GUIDE/GUIDE.html#LogicForInt)
>
>>So I can infer that the person http://www.w3.org/People/Berners-Lee/card#i
>>is a Professor at MIT, or a Senior Research Scientist at W3C, or Director 
>>at
>>Southampton, none of which we consider true.
>>(Of course, this was the intention of the sameAs assertion.)
>>
>>I suggest that this is a bad state of affairs
>
> It would be, yes, but it should not arise if the RDF is written properly.
>
>>, and applies to any NIR, not
>>just people.
>
> It applies to any R, I or NI. It's really nothing to do with the nature of 
> the thing named.
>
> Pat Hayes
> -- 
> ---------------------------------------------------------------------
> IHMC (850)434 8903 or (650)494 3973   home
> 40 South Alcaniz St. (850)202 4416   office
> Pensacola (850)202 4440   fax
> FL 32502 (850)291 0667    cell
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>
>
Received on Tuesday, 31 July 2007 08:33:53 UTC