Re: [Linking-open-data] Reasoning over Web Data

From: Richard Cyganiak <richard@cyganiak.de> · Date: Wed, 1 Aug 2007 15:00:33 +0200

Hugh,

On 31 Jul 2007, at 21:25, Hugh Glaser wrote:
>> Within applications like the DISCO Semantic Web browser or the
>> Semantic Web Client Library, we use the Named Graphs data model to
>> represent RDF data that has been retrieved from the Web. This allows
>> us to clearly keep track of where information came from and which
>> facts are associated with each other.
> Yes, it is possible to distinguish.
> This begs the question: if I need to use Named Graphs for the simplest
> query about Tim's three roles, effectively bypassing the sameAs
> inference, was sameAs the right thing to use?

In Disco, we use Named Graphs for *storage* of Web-retrieved  
information. This doesn't mean we have to “use them for the simplest  
query” or that we're bypassing sameAs references.

In fact, when Disco retrieves information from the Named Graph store  
for presentation to the user, it works on a merged view of all the  
Named Graphs, so essentially it sees exactly what you would see if  
you just threw everything into One Big Model.

But having Named Graphs behind this merged view means that we can ask
the store where any given statement came from, and can retrieve
additional metadata about that source (both metadata published by the
source itself, and metadata about the dereferencing process).
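
As a rough illustration of that arrangement, here is a minimal Python
sketch. It is not Disco's actual code: the NamedGraphStore class, its
method names, the example.org graph names, the ex: terms and the metadata
keys are all assumptions made up for this example.

```python
from collections import defaultdict


class NamedGraphStore:
    """Toy quad store: every triple is filed under the graph it came from."""

    def __init__(self):
        self._graphs = defaultdict(set)  # graph name -> set of (s, p, o)
        self._meta = {}                  # graph name -> retrieval metadata

    def add_graph(self, name, triples, **meta):
        """Store triples under a graph name, plus metadata about the fetch."""
        self._graphs[name].update(triples)
        self._meta[name] = dict(meta)

    def merged(self):
        """The 'One Big Model' view: union of all graphs, sources ignored."""
        return set().union(*self._graphs.values())

    def sources_of(self, triple):
        """Names of all graphs that contain the given triple."""
        return sorted(n for n, ts in self._graphs.items() if triple in ts)

    def metadata(self, name):
        """Metadata recorded for one source graph."""
        return self._meta[name]


# Hypothetical data: two documents that both say something about ex:tim.
GRAPH_A = "http://example.org/a.rdf"
GRAPH_B = "http://example.org/b.rdf"
NAME = ("ex:tim", "ex:name", "Tim")
WORKS = ("ex:tim", "ex:worksAt", "ex:w3c")

store = NamedGraphStore()
store.add_graph(GRAPH_A, {NAME}, http_status=200)
store.add_graph(GRAPH_B, {NAME, WORKS}, http_status=200)

view = store.merged()              # what the presentation layer queries
origins = store.sources_of(NAME)   # provenance is still available
```

The point of the design is that merging happens at read time, so the
merged view costs nothing in provenance: the per-graph bookkeeping stays
in the store.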

> Trust is a big issue (and especially motivates Named graphs), but I
> don't think it illuminates this case.
> I am not describing a situation where I am throwing lots of RDF into a
> triplestore. The situation is that I want to do some querying, say
> about people at W3C. I find Tim's URI, and retrieve the RDF, and his
> associated sameAs URIs -> RDF, and put it all into a triplestore cache,
> so that I can conveniently do some work on it.
> Since it all starts from Tim's page, I don't see there is much of a
> trust issue here either.
> This is a straightforward bit of SW business.

Named Graphs help not just with trust but also with provenance, which  
is highly relevant in the case you describe.
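
To make that concrete for the kind of cache you describe, here is a
hedged sketch in plain Python, with no real triplestore involved. The
quads, the a:/b:/ex: names, the titles, the phone numbers and the
example.org graph names are invented placeholders; only the idea of
keying facts by their source graph comes from the Named Graphs model.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

# Hypothetical cache contents: (graph, subject, predicate, object).
QUADS = [
    ("http://example.org/site-a/tim.rdf", "a:tim", SAME_AS, "b:tim"),
    ("http://example.org/site-a/tim.rdf", "a:tim", "ex:jobTitle", "Title at A"),
    ("http://example.org/site-a/tim.rdf", "a:tim", "ex:phone", "555-0100"),
    ("http://example.org/site-b/tim.rdf", "b:tim", "ex:jobTitle", "Title at B"),
    ("http://example.org/site-b/tim.rdf", "b:tim", "ex:phone", "555-0101"),
]


def make_canonicalizer(quads):
    """Return a function mapping each URI to one representative per
    owl:sameAs cluster (a small union-find)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path compression
            uri = parent[uri]
        return uri

    for _graph, s, p, o in quads:
        if p == SAME_AS:
            keep, drop = sorted((find(s), find(o)))
            parent[drop] = keep
    return find


def merged_values(quads, subject, predicate):
    """Values seen through the sameAs-merged view, sources ignored."""
    find = make_canonicalizer(quads)
    target = find(subject)
    return sorted(o for _g, s, p, o in quads
                  if p == predicate and find(s) == target)


def facts_by_source(quads, subject, predicates):
    """The same facts, grouped by the graph that asserted them.
    Simplification: keeps one value per predicate per graph."""
    find = make_canonicalizer(quads)
    target = find(subject)
    grouped = defaultdict(dict)
    for g, s, p, o in quads:
        if p in predicates and find(s) == target:
            grouped[g][p] = o
    return dict(grouped)


titles = merged_values(QUADS, "b:tim", "ex:jobTitle")
by_source = facts_by_source(QUADS, "b:tim", {"ex:jobTitle", "ex:phone"})
```

In the merged view, `titles` holds both job titles with nothing to say
which phone number goes with which. Grouping by graph name, as in
`by_source`, recovers the pairing without giving up the sameAs merge.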

>> I also think that it would not be harmful if OWL tutorials and best
>> practice guides would state this fact more clearly so that they do not
>> raise wrong expectations.
> That would be good.
> So what is the recommended best practice?
> Either on the querying side, to use the Named Graphs model all the
> time; or on the representation side, as I said in my original message
> (which seemed to get lost off the end of Pat's reply):
>> This means that the ontologies have to be much more carefully
>> constructed than they appear to be at present, taking cognisance of
>> the consequences of others making such sameAs statements, in our open
>> world.

That's certainly good advice, but not very actionable. Any specific  
ideas on what people should do when constructing ontologies?

Cheers,
Richard

>
> Hugh
>
>>
>> In the light of the current Semantic Web layer cake discussion, I have
>> been wondering for years why the trust layer is up that far in the
>> layer cake. It is obvious that you will only get junk if you try to
>> reason over data from the web before applying some heuristics to
>> determine trustworthiness and filter out low quality information.
>> Therefore, I think the trust layer should be positioned lower in the
>> cake. Maybe below Unifying Logic, if this is the point where things
>> change from representation to reasoning.
>>
>> Cheers
>>
>> Chris
>>
>>
>> --
>> Chris Bizer
>> Freie Universität Berlin
>> +49 30 838 54057
>> chris@bizer.de
>> www.bizer.de
>> ----- Original Message -----
>> From: "Pat Hayes" <phayes@ihmc.us>
>> To: "Hugh Glaser" <hg@ecs.soton.ac.uk>
>> Cc: "Tim Berners-Lee" <timbl@w3.org>; "Chris Bizer" <chris@bizer.de>;
>> <www-tag@w3.org>; <semantic-web@w3.org>; "Linking Open Data"
>> <linking-open-data@simile.mit.edu>
>> Sent: Monday, July 30, 2007 9:49 PM
>> Subject: Re: Terminology Question concerning Web Architecture and Linked Data
>>
>>
>>>
>>>> I am trying hard to keep up (I suspect like many), and was hoping
>>>> someone would address a concern I have; forgive me if I missed it
>>>> somewhere in the discussion.
>>>> I have hung this off this message from Tim, which seems the most
>>>> relevant.
>>>> And congratulations on the Linked Data Tutorial - a really useful
>>>> document.
>>>>
>>>> So here we go:
>>>>
>>>> On 25/7/07 14:35, "Tim Berners-Lee" <timbl@w3.org> wrote:
>>>>
>>>>>
>>>>>  (Going back to the original question, as it is much simpler than
>>>>>  much which follows!)
>>>>>
>>>>>  On 2007-07-07, at 08:43, Chris Bizer wrote:
>>>>>
>>>>>
>>>>>>  Question 3: Depending on the answer to question 1, is it correct
>>>>>>  to use owl:sameAs [6] to state that
>>>>>>  http://www.w3.org/People/Berners-Lee/card#i and
>>>>>>  http://dbpedia.org/resource/Tim_Berners-Lee refer to the same
>>>>>>  thing as it is done in Tim's profile?
>>>>>
>>>>>  Yes.
>>>>>
>>>> So Tim is absolutely right.
>>>> This is an entirely logical thing to say.
>>>> These two NIRs (Non-Information Resources) should be considered the
>>>> same.
>>>
>>> (Aside) I wish folk would not say 'two' when there is only one. Two
>>> NIRs should never be considered the same: rather, two names may refer
>>> to the same, single, NIR.
> Thanks.
> Sorry.
>>>
>>>> But it is important to consider how this statement will be used, and
>>>> worry whether there may be unexpected consequences.
>>>> As we now know, the URIs should be resolvable, and so interesting
>>>> Semantic Web applications will use the URI to get the Description (or
>>>> whatever we call it), probably going via a 303.
>>>> So my SW app will get the RDF of them both, and add it to my
>>>> triplestore, along with all the other linked data.
>>>>
>>>> Tim, as often, is a good example.
>>>> Consider the places Tim works (W3C, MIT, Southampton, I guess).
>>>> It is likely that each will publish RDF about him, hopefully using an
>>>> agreed ontology (one day!).
>>>> Now comes the rub.
>>>> If you put all this in one triplestore, with the owl:sameAs
>>>> assertions, then it will not be possible to distinguish where facts
>>>> came from, or rather which facts are associated with which others.
>>>
>>> Whoa, careful. It probably will be >>possible<< to distinguish this,
>>> in fact. It might be that unwanted consequences are entailed by the
>>> combination of the various RDF graphs and the sameAs, but a careful
>>> querying process should be able to determine which of the various
>>> triples are present and even whether they are linked. One simple way
>>> is to query under sub-OWL entailment, for example, which can be little
>>> more than a direct syntactic matching process (see SPARQL).
>>>
>>>> Perhaps 3 job titles, 3 telephone numbers and 3 institution addresses
>>>> will be returned from the appropriate SPARQL queries, and there will
>>>> be no (legal) way of working out which corresponds to which.
>>>
>>> That would be a symptom of poor RDF/OWL usage, though. Assertions in
>>> RDF are not supposed to be local-context-sensitive in the way you seem
>>> to be assuming. So for example it would be a mistake to simply assert,
>>> in the W3C page, that Tim's status WAS Director. It ought to say that
>>> a relationship holds between him and the entity he is the Director of,
>>> i.e. the W3C; so that this stays true even when it is moved somewhere
>>> else on the Web. In fact, I suggest that a basic, fundamental
>>> principle of any 'web logic' is that assertions in it should have the
>>> same meaning wherever they occur on the Web (see for example
>>> http://www.ihmc.us:16080/users/phayes/IKL/GUIDE/GUIDE.html#LogicForInt)
>>>
>>>> So I can infer that the person
>>>> http://www.w3.org/People/Berners-Lee/card#i is a Professor at MIT, or
>>>> a Senior Research Scientist at W3C, or Director at Southampton, none
>>>> of which we consider true.
>>>> (Of course, this was the intention of the sameAs assertion.)
>>>>
>>>> I suggest that this is a bad state of affairs
>>>
>>> It would be, yes, but it should not arise if the RDF is written
>>> properly.
>>>
>>>> , and applies to any NIR, not
>>>> just people.
>>>
>>> It applies to any R, I or NI. It's really nothing to do with the
>>> nature of the thing named.
>>>
>>> Pat Hayes
>>> -- 
>>> ---------------------------------------------------------------------
>>> IHMC (850)434 8903 or (650)494 3973   home
>>> 40 South Alcaniz St. (850)202 4416   office
>>> Pensacola (850)202 4440   fax
>>> FL 32502 (850)291 0667    cell
>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>>
>>>
>>
>
>