Re: Reasoning over Web Data was: Terminology Question concerning Web Architecture and Linked Data from Hugh Glaser on 2007-07-31 (semantic-web@w3.org from July 2007)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Tue, 31 Jul 2007 20:25:42 +0100
To: Chris Bizer <chris@bizer.de>, Pat Hayes <phayes@ihmc.us>
CC: Tim Berners-Lee <timbl@w3.org>, <semantic-web@w3.org>, Linking Open Data <linking-open-data@simile.mit.edu>
Message-ID: <C2D54B46.DDCC%hg@ecs.soton.ac.uk>
Pat, Chris,
I think we share a view that there are some issues here, at least with
ontology design, that might benefit from wider awareness, perhaps even in
the Linked Data Tutorial.

On 31/7/07 09:33, "Chris Bizer" <chris@bizer.de> wrote:

> Hi Hugh,
> 
>>> If you put all this in one triplestore, with the owl:sameAs assertions,
>>> then it will not be possible to distinguish where facts came from, or
>>> rather which facts are associated with which others.
>> 
>> Whoa, careful. It probably will be >>possible<< to distinguish this,
>> in fact. It might be that unwanted consequences are entailed by the
>> combination of the various RDF graphs and the sameAs, but a careful
>> querying process should be able to determine which of the various triples
>> are present and even whether they are linked. One simple way is to query
>> under sub-OWL entailment, for example, which can be little more than a
>> direct syntactic matching process (see SPARQL).
> 
> Some practical backup for Pat's argumentation. Within applications like the
> DISCO Semantic Web browser or the Semantic Web Client Library, we use the
> Named Graphs data model to represent RDF data that has been retrieved from
> the Web. This allows us to clearly keep track of where information came from
> and which facts are associated with each other.
Yes, it is possible to distinguish.
This raises the question: if I need to use Named Graphs for the simplest query
about Tim's three roles, effectively bypassing the sameAs inference, was
sameAs the right thing to use?
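
A minimal sketch of the Named Graphs approach, using plain Python tuples
in place of a real quad store. Only the W3C card URI comes from this
thread; the other two URIs, the ex: properties, the graph names and the
phone values are hypothetical placeholders.

```python
# Each quad is (subject, predicate, object, graph); the graph names the
# document the statement was retrieved from.
TIM_W3C = "http://www.w3.org/People/Berners-Lee/card#i"
TIM_MIT = "http://example.org/mit/timbl"      # hypothetical URI
TIM_SOTON = "http://example.org/soton/timbl"  # hypothetical URI

# Illustrative data only: ex: properties and phone values are placeholders.
QUADS = [
    (TIM_W3C, "ex:jobTitle", "Director", "graph:w3c"),
    (TIM_W3C, "ex:phone", "phone-w3c", "graph:w3c"),
    (TIM_MIT, "ex:jobTitle", "Senior Research Scientist", "graph:mit"),
    (TIM_MIT, "ex:phone", "phone-mit", "graph:mit"),
    (TIM_SOTON, "ex:jobTitle", "Professor", "graph:soton"),
    (TIM_SOTON, "ex:phone", "phone-soton", "graph:soton"),
]


def roles_by_source(quads):
    """Pair the job title and phone number asserted in the same graph."""
    by_graph = {}
    for _subject, predicate, obj, graph in quads:
        by_graph.setdefault(graph, {})[predicate] = obj
    return {
        graph: (props.get("ex:jobTitle"), props.get("ex:phone"))
        for graph, props in by_graph.items()
    }


if __name__ == "__main__":
    for graph, pair in sorted(roles_by_source(QUADS).items()):
        print(graph, pair)
```

Note that the grouping key is the graph, not the subject: the query gets
its answer without ever consulting the sameAs statements.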
> 
> Besides this, I think Semantic Web clients have to take two other things
> into account before they start reasoning over retrieved data:
> Trustworthiness and vocabulary mappings. Think about what you are doing in
> the offline world when you read some political newspapers: First you will
> try to align the different terminology used by the authors in your head to
> get a consistent model. Afterwards you will decide which articles to trust
> and which to consider untrustworthy. Only after these two steps, you will
> start to reason about the consequences of what you have read.
> 
> I think it would be a good idea for Semantic Web clients to do the same.
> Therefore, I think it is a bit naive to throw lots of RDF data from the Web
> straight into a single RDF model and then wonder that reasoning over this
> data leads to unintended consequences.
Trust is a big issue (and especially motivates Named graphs), but I don't
think it illuminates this case.
I am not describing a situation where I am throwing lots of RDF into a
triplestore. The situation is that I want to do some querying, say about
people at W3C. I find Tim's URI, and retrieve the RDF, and his associated
sameAs URIs -> RDF, and put it all into a triplestore cache, so that I can
conveniently do some work on it.
Since it all starts from Tim's page, I don't see there is much of a trust
issue here either.
This is a straightforward bit of SW business.
> 
> I also think that it would not be harmful if OWL tutorials and best practice
> guides would state this fact more clearly so that they do not raise wrong
> expectations.
That would be good.
So what is the recommended best practice?
Either on the querying side, to use the Named Graphs model all the time; or on
the representation side, as I said in my original message (which seemed to
get lost off the end of Pat's reply):
> This means that the ontologies have to be much more carefully constructed
> than they appear to be at present, taking cognisance of the consequences of
> others making such sameAs statements, in our open world.
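
A minimal sketch of the representation-side option, again in plain
Python. It merges the sameAs aliases into one name and compares a flat
modelling (title and phone hung directly off the person) with one
possible reading of Pat's suggestion (an intermediate node per post).
All identifiers, the ex: properties and the data are hypothetical.

```python
from itertools import product

ALIASES = ["tim:w3c", "tim:mit", "tim:soton"]  # hypothetical sameAs set

# Flat modelling: properties attached directly to the person.
FLAT = [
    ("tim:w3c", "ex:jobTitle", "Director"),
    ("tim:w3c", "ex:phone", "phone-w3c"),
    ("tim:mit", "ex:jobTitle", "Senior Research Scientist"),
    ("tim:mit", "ex:phone", "phone-mit"),
    ("tim:soton", "ex:jobTitle", "Professor"),
    ("tim:soton", "ex:phone", "phone-soton"),
]

# Relational modelling: the person holds a post, and the post carries
# the title and phone, so the link survives merging.
RELATIONAL = [
    ("tim:w3c", "ex:holds", "post:w3c"),
    ("post:w3c", "ex:title", "Director"),
    ("post:w3c", "ex:phone", "phone-w3c"),
    ("tim:mit", "ex:holds", "post:mit"),
    ("post:mit", "ex:title", "Senior Research Scientist"),
    ("post:mit", "ex:phone", "phone-mit"),
    ("tim:soton", "ex:holds", "post:soton"),
    ("post:soton", "ex:title", "Professor"),
    ("post:soton", "ex:phone", "phone-soton"),
]


def smush(triples, same_as):
    """Rewrite every alias to the first name in the sameAs set."""
    canon = {alias: same_as[0] for alias in same_as}
    return {(canon.get(s, s), p, canon.get(o, o)) for s, p, o in triples}


def values(triples, subject, predicate):
    """Sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def pairs_flat(triples, person):
    """Every title/phone combination the merged flat data permits."""
    titles = values(triples, person, "ex:jobTitle")
    phones = values(triples, person, "ex:phone")
    return list(product(titles, phones))


def pairs_relational(triples, person):
    """Title/phone pairs linked through the post the person holds."""
    return [
        (values(triples, post, "ex:title")[0],
         values(triples, post, "ex:phone")[0])
        for post in values(triples, person, "ex:holds")
    ]
```

After merging, the flat data allows nine title/phone combinations with
nothing to choose between them, while the relational data still yields
exactly the three intended pairs.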

Hugh

> 
> In the light of the current Semantic Web layer cake discussion, I have been
> wondering for years why the trust layer is up that far in the layer cake. It
> is obvious that you will only get junk if you try to reason over data from
> the web before applying some heuristics to determine trustworthiness and
> filter out low quality information. Therefore, I think the trust layer
> should be positioned lower in the cake. Maybe below Unifying Logic? If this
> is the point where things change from representation to reasoning.
> 
> Cheers
> 
> Chris
> 
> 
> --
> Chris Bizer
> Freie Universität Berlin
> +49 30 838 54057
> chris@bizer.de
> www.bizer.de
> ----- Original Message -----
> From: "Pat Hayes" <phayes@ihmc.us>
> To: "Hugh Glaser" <hg@ecs.soton.ac.uk>
> Cc: "Tim Berners-Lee" <timbl@w3.org>; "Chris Bizer" <chris@bizer.de>;
> <www-tag@w3.org>; <semantic-web@w3.org>; "Linking Open Data"
> <linking-open-data@simile.mit.edu>
> Sent: Monday, July 30, 2007 9:49 PM
> Subject: Re: Terminology Question concerning Web Architecture and Linked
> Data
> 
> 
>> 
>>> I am trying hard to keep up (I suspect like many), and was hoping someone
>>> would address a concern I have; forgive me if I missed it somewhere in the
>>> discussion.
>>> I have hung this off this message from Tim, which seems the most relevant.
>>> And congratulations on the Linked Data Tutorial - a really useful
>>> document.
>>> 
>>> So here we go:
>>> 
>>> On 25/7/07 14:35, "Tim Berners-Lee" <timbl@w3.org> wrote:
>>> 
>>>> 
>>>>  (Going back to the original question, as it is much simpler than much
>>>>  which follows!)
>>>> 
>>>>  On 2007-07-07, at 08:43, Chris Bizer wrote:
>>>> 
>>>> 
>>>>>  Question 3: Depending on the answer to question 1, is it correct to
>>>>>  use owl:sameAs [6] to state that
>>>>>  http://www.w3.org/People/Berners-Lee/card#i and
>>>>>  http://dbpedia.org/resource/Tim_Berners-Lee refer to the same thing
>>>>>  as it is done in Tim's profile.
>>>> 
>>>>  Yes.
>>>> 
>>> So Tim is absolutely right.
>>> This is an entirely logical thing to say.
>>> These two NIRs (Non-Information Resources) should be considered the same.
>> 
>> (Aside) I wish folk would not say 'two' when there is only one. Two NIRs
>> should never be considered the same: rather, two names may refer to the
>> same, single, NIR.
Thanks.
Sorry.
>> 
>>> But it is important to consider how this statement will be used, and worry
>>> whether there may be unexpected consequences.
>>> As we now know, the URIs should be resolvable, and so interesting Semantic
>>> Web applications will use the URI to get the Description (or whatever we
>>> call it), probably going via a 303.
>>> So my SW app will get the RDF of them both, and add it to my triplestore,
>>> along with all the other linked data.
>>> 
>>> Tim, as often, is a good example.
>>> Consider the places Tim works (W3C, MIT, Southampton, I guess).
>>> It is likely that each will publish RDF about him, hopefully using an
>>> agreed ontology (one day!).
>>> Now comes the rub.
>>> If you put all this in one triplestore, with the owl:sameAs assertions,
>>> then it will not be possible to distinguish where facts came from, or
>>> rather which facts are associated with which others.
>> 
>> Whoa, careful. It probably will be >>possible<< to distinguish this,
>> in fact. It might be that unwanted consequences are entailed by the
>> combination of the various RDF graphs and the sameAs, but a careful
>> querying process should be able to determine which of the various triples
>> are present and even whether they are linked. One simple way is to query
>> under sub-OWL entailment, for example, which can be little more than a
>> direct syntactic matching process (see SPARQL).
>> 
>>> Perhaps 3 job titles, 3 telephone numbers and 3 institution addresses will
>>> be returned from the appropriate SPARQL queries, and there will be no
>>> (legal) way of working out which corresponds to which.
>> 
>> That would be a symptom of poor RDF/OWL usage, though. Assertions in RDF
>> are not supposed to be local-context-sensitive in the way you seem to be
>> assuming. So for example it would be a mistake to simply assert, in the
>> w3c page, that Tim's status WAS Director. It ought to say that a
>> relationship holds between him and the entity he is the Director of, i.e.
>> the W3C; so that this stays true even when it is moved somewhere else on
>> the Web. In fact, I suggest that a basic, fundamental principle of any
>> 'web logic' is that assertions in it should have the same meaning wherever
>> they occur on the Web (see for example
>> http://www.ihmc.us:16080/users/phayes/IKL/GUIDE/GUIDE.html#LogicForInt)
>> 
>>> So I can infer that the person http://www.w3.org/People/Berners-Lee/card#i
>>> is a Professor at MIT, or a Senior Research Scientist at W3C, or Director
>>> at Southampton, none of which we consider true.
>>> (Of course, this was the intention of the sameAs assertion.)
>>> 
>>> I suggest that this is a bad state of affairs
>> 
>> It would be, yes, but it should not arise if the RDF is written properly.
>> 
>>> , and applies to any NIR, not
>>> just people.
>> 
>> It applies to any R, I or NI. It's really nothing to do with the nature of
>> the thing named.
>> 
>> Pat Hayes
>> -- 
>> ---------------------------------------------------------------------
>> IHMC (850)434 8903 or (650)494 3973   home
>> 40 South Alcaniz St. (850)202 4416   office
>> Pensacola (850)202 4440   fax
>> FL 32502 (850)291 0667    cell
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>> 
>> 
>
Received on Tuesday, 31 July 2007 19:27:34 UTC