Re: [Linking-open-data] Reasoning over Web Data

From: Hugh Glaser <hg@ecs.soton.ac.uk> · Date: Fri, 03 Aug 2007 11:56:54 +0100

Thanks Richard,
On 1/8/07 14:00, "Richard Cyganiak" <richard@cyganiak.de> wrote:
> 
> Hugh,
> 
> On 31 Jul 2007, at 21:25, Hugh Glaser wrote:
>>> Within applications like the
>>> DISCO Semantic Web browser or the Semantic Web Client Library, we
>>> use the
>>> Named Graphs data model to represent RDF data that has been
>>> retrieved from
>>> the Web. This allows us to clearly keep track where information
>>> came from
>>> and which facts are associated with each other.
>> Yes, it is possible to distinguish.
>> This begs the question: if I need to use Named Graphs for the
>> simplest query
>> about Tim's three roles, effectively bypassing the sameAs
>> inference, was
>> sameAs the right thing to use?
> 
> In Disco, we use Named Graphs for *storage* of Web-retrieved
> information. This doesn't mean we have to "use them for the simplest
> query" or that we're bypassing sameAs references.
> 
> In fact, when Disco retrieves information from the Named Graph store
> for presentation to the user, it works on a merged view of all the
> Named Graphs, so essentially it sees exactly what you would see if
> you just threw everything into One Big Model.
> 
> But having Named Graphs behind this merged view means that we can ask
> the store where the statement came from, and can retrieve additional
> metadata about the source (both metadata published by the source
> itself, and metadata about the dereferencing process).
Absolutely. A way to process really important things such as trust,
provenance, clever apps, and probably a bunch of things we haven't thought
of yet.
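
A minimal sketch of the arrangement Richard describes, assuming a toy
in-memory store rather than Disco's actual implementation. The class
name, the graph names and the ex: terms are all hypothetical; the point
is only that queries can run over a merged view while the store can
still report which source graph a statement came from.

```python
from collections import defaultdict


class NamedGraphStore:
    """Toy store: one set of (subject, predicate, object) triples per graph name."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph_name, triple):
        """Record a triple under the graph (source document) it was retrieved from."""
        self._graphs[graph_name].add(triple)

    def merged(self):
        """Return the union of all graphs, i.e. the 'One Big Model' view."""
        out = set()
        for triples in self._graphs.values():
            out |= triples
        return out

    def sources(self, triple):
        """Return the sorted names of every graph that contains the triple."""
        return sorted(name for name, triples in self._graphs.items()
                      if triple in triples)


# Illustrative data only; the graph names are made up.
store = NamedGraphStore()
store.add("http://example.org/w3c.rdf", ("ex:tim", "ex:title", "Director"))
store.add("http://example.org/soton.rdf", ("ex:tim", "ex:title", "Professor"))
```

Here store.merged() is what a presentation layer would query, and
store.sources(...) is the provenance question asked afterwards.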
> 
>> Trust is a big issue (and especially motivates Named graphs), but I
>> don't
>> think it illuminates this case.
>> I am not describing a situation where I am throwing lots of RDF into a
>> triplestore. The situation is that I want to do some querying, say
>> about
>> people at W3C. I find Tim's URI, and retrieve the RDF, and his
>> associated
>> sameAs URIs -> RDF, and put it all into a triplestore cache, so
>> that I can
>> conveniently do some work on it.
>> Since it all starts from Tim's page, I don't see there is much of a
>> trust
>> issue here either.
>> This is a straightforward bit of SW business.
> 
> Named Graphs help not just with trust but also with provenance, which
> is highly relevant in the case you describe.
> 
>>> I also think that it would not be harmful if OWL tutorials and
>>> best practice
>>> guides would state this fact more clearly so that they do not
>>> raise wrong
>>> expectations.
>> That would be good.
>> So what is the recommended best practice?
>> Either on the querying side, to use Named Graphs model all the
>> time; or on
>> the representation side, as I said in my original message (which
>> seemed to
>> get lost off the end of Pat's reply):
>>> This means that the ontologies have to be much more carefully
>>> constructed
>>> than they appear to be at present, taking cognisance of the
>>> consequences of
>>> others making such sameAs statements, in our open world.
> 
> That's certainly good advice, but not very actionable. Any specific
> ideas on what people should do when constructing ontologies?
Ay, there's the rub. To be the sameAs, or not to be.

Let me go back a bit and try to motivate a bit more.
The example I used was Tim, and the possible problem that his job titles
might be confused with the places he works.
I could suggest lots of similar examples about people; if my RDF at
Southampton lists me as teaching a unit, and I spend some time visiting
another institution, will a KB allow a correct inference that I teach the
unit at the other institution if I sameAs the URIs from each? If someone
looks me up in the KB at the visiting institution, will they realise that
the telephone number the KB gives for me is the one at my home institution
(especially if that is the only one it has)?
The same problems can be found for most types of things.
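
To make the worry concrete, here is a small sketch, assuming flat
ex:title and ex:worksAt properties hung directly on the person (these
property names and the ex: URIs are hypothetical, and the data is
illustrative). Once the two URIs are merged under sameAs, a query that
asks for titles and workplaces can no longer tell which goes with which.

```python
def smush(triples, same_as):
    """Rewrite each subject and object to one canonical URI per sameAs group."""
    canon = {}

    def find(x):
        # Follow the canonical-URI chain until it stops changing.
        while canon.get(x, x) != x:
            x = canon[x]
        return x

    for a, b in same_as:
        ra, rb = find(a), find(b)
        if ra != rb:
            canon[rb] = ra
    return {(find(s), p, find(o)) for s, p, o in triples}


def title_place_pairs(triples, person):
    """Naive query: every title of the person paired with every workplace."""
    titles = sorted(o for s, p, o in triples if s == person and p == "ex:title")
    places = sorted(o for s, p, o in triples if s == person and p == "ex:worksAt")
    return [(t, pl) for t in titles for pl in places]


# Two hypothetical descriptions of the same person from two institutions.
triples = {
    ("ex:tim-w3c", "ex:title", "Director"),
    ("ex:tim-w3c", "ex:worksAt", "W3C"),
    ("ex:tim-soton", "ex:title", "Professor"),
    ("ex:tim-soton", "ex:worksAt", "Southampton"),
}
merged = smush(triples, [("ex:tim-w3c", "ex:tim-soton")])
pairs = title_place_pairs(merged, "ex:tim-w3c")
```

Before the merge each URI yields exactly one title and one place; after
it, pairs contains all four combinations, including the unintended
("Director", "Southampton").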
Pat rightly pointed out that Director should be director of something, which
is true, although ordinary job titles such as Professor and Senior Research
Scientist are usually just associated with a person. So perhaps the RDF
for Tim (I am using his name for the NIR) should have a role, and then
the confusion is avoided because the title, address, etc. are associated
with the role.
As I said originally, we could introduce indirection.
(Nick Gibbins is exploring this stuff at his id.ecs.soton.ac.uk site.)
But this means introducing the complexity of lots of roles everywhere, and
in the end, I hazard a guess that most people writing queries will just
consider them as transparent, and resurrect the problem by ignoring such
indirections.
It would be nice to think that RDF will be that carefully constructed, and
queries just as carefully, but outsiders already consider it pretty
complicated (for some strange reason), and making it less intuitive is not
really desirable.
In fact, we already have a significant legacy of ontologies, and associated
RDF, that would need rebuilding.
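
A rough sketch of the role indirection discussed above, assuming a
hypothetical ex:hasRole property and made-up role URIs (this is not
taken from any published ontology). Title and organisation hang off the
role node, so they stay paired even when everything about the person is
merged under one URI.

```python
# Illustrative data: the person URIs are already merged into ex:tim.
role_triples = {
    ("ex:tim", "ex:hasRole", "ex:role-w3c"),
    ("ex:role-w3c", "ex:title", "Director"),
    ("ex:role-w3c", "ex:organisation", "W3C"),
    ("ex:tim", "ex:hasRole", "ex:role-soton"),
    ("ex:role-soton", "ex:title", "Professor"),
    ("ex:role-soton", "ex:organisation", "Southampton"),
}


def objects(triples, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def role_pairs(triples, person):
    """Walk person -> role, then pair title and organisation within each role."""
    result = []
    for role in objects(triples, person, "ex:hasRole"):
        for title in objects(triples, role, "ex:title"):
            for org in objects(triples, role, "ex:organisation"):
                result.append((title, org))
    return result


pairs_by_role = role_pairs(role_triples, "ex:tim")
```

The cost is the one noted above: every query has to go through the role
node, and a query that skips it and collects titles and organisations
independently brings the confusion straight back.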

It should also be of some concern that if we are discussing the difficulty
of understanding this problem, then the wider community might well have
difficulty understanding and using guidelines. The nice thing about the
Linked Data Tutorial is that I can give it to most people and they can just
follow it, and it makes sense.

So having got rather more controversial than I intended, I'll go the whole
unit of distance.

The proposal to use sameAs so widely is the problem. This is because it is
such a strong assertion, and without stepping outside the standard
framework, I am not allowed to distinguish URIs to the same NIR.
sameAs is very important: if I get a chemical sample on my bench and get
some data about it, I need a label for it. I will eventually work out what it
is, at which point I may want to sameAs. If someone refers to something
about a paper's (unknown) author, it would be good to assert sameAs when you
eventually find out another URI for the author that includes the name, etc.

In the Tim case, it is probably good to say
sameAs between
"http://www4.wiwiss.fu-berlin.de/bookmashup/persons/Tim+Berners-Lee"
and
"http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007"
But I am not at all sure that the other uses are helpful.

We really need another way of making the statements Tim wants to.
References to a single NIR (thanks Pat) can be identified, but can still be
distinguished without stepping outside the framework - and the writers of
queries are allowed the flexibility to use the information if they want.

So to go back to your original question: we no longer need further ideas on
ontology construction to avoid this; we keep sameAs for the strongest
equivalence, and use something else for weaker cases.
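
One way this could look from the query writer's side, as a sketch only:
ex:coreferentWith is a hypothetical name for the weaker link (no such
property is proposed in this thread), and the ex: URIs are placeholders.
Strong links are always followed; weak links are followed only when the
query asks for them.

```python
STRONG = "owl:sameAs"
WEAK = "ex:coreferentWith"  # hypothetical weaker-equivalence property


def aliases(uri, links, follow_weak=False):
    """Return every URI reachable from uri over the permitted equivalence links."""
    allowed = {STRONG, WEAK} if follow_weak else {STRONG}
    seen = {uri}
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for a, p, b in links:
            if p not in allowed:
                continue
            # Equivalence links are treated as symmetric.
            if a == current:
                other = b
            elif b == current:
                other = a
            else:
                continue
            if other not in seen:
                seen.add(other)
                frontier.append(other)
    return seen


# Illustrative links only.
links = [
    ("ex:tim-bookmashup", STRONG, "ex:tim-dblp"),
    ("ex:tim-dblp", WEAK, "ex:tim-w3c"),
]
strict = aliases("ex:tim-bookmashup", links)
loose = aliases("ex:tim-bookmashup", links, follow_weak=True)
```

With these links, strict holds only the two strongly equated URIs, while
loose also includes ex:tim-w3c. The choice stays with whoever writes the
query.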

I have a sense of lighting the blue touch paper and retiring (this is the
instruction on English fireworks to set them off), as I am off on holiday
now.
Sorry.
Hugh

-- 
Hugh Glaser,  Reader
              Dependable Systems & Software Engineering
              School of Electronics and Computer Science,
              University of Southampton,
              Southampton SO17 1BJ
Work: +44 (0)23 8059 3670, Fax: +44 (0)23 8059 3045
Mobile: +44 (0)78 9422 3822, Home: +44 (0)23 8061 5652
http://www.ecs.soton.ac.uk/~hg/

> 
> Cheers,
> Richard
> 
> 
> 
>> 
>> Hugh
>> 
>>> 
>>> In the light of the current Semantic Web layer cake discussion, I
>>> have been
>>> wondering for years why the trust layer is up that far in the
>>> layer cake. It
>>> is obvious that you will only get junk if you try to reason over
>>> data from
>>> the web before applying some heuristics to determine
>>> trustworthiness and
>>> filter out low quality information. Therefore, I think the trust
>>> layer
>>> should be positioned lower in the cake. Maybe below Unifying
>>> Logic? If this
>>> is the point where things change from representation to reasoning.
>>> 
>>> Cheers
>>> 
>>> Chris
>>> 
>>> 
>>> --
>>> Chris Bizer
>>> Freie Universität Berlin
>>> +49 30 838 54057
>>> chris@bizer.de
>>> www.bizer.de
>>> ----- Original Message -----
>>> From: "Pat Hayes" <phayes@ihmc.us>
>>> To: "Hugh Glaser" <hg@ecs.soton.ac.uk>
>>> Cc: "Tim Berners-Lee" <timbl@w3.org>; "Chris Bizer" <chris@bizer.de>;
>>> <www-tag@w3.org>; <semantic-web@w3.org>; "Linking Open Data"
>>> <linking-open-data@simile.mit.edu>
>>> Sent: Monday, July 30, 2007 9:49 PM
>>> Subject: Re: Terminology Question concerning Web Architecture and
>>> Linked
>>> Data
>>> 
>>> 
>>>> 
>>>>> I am trying hard to keep up (I suspect like many), and was
>>>>> hoping someone
>>>>> would address a concern I have; forgive me if I missed it
>>>>> somewhere in the
>>>>> discussion.
>>>>> I have hung this off this message from Tim, which seems the most
>>>>> relevant.
>>>>> And congratulations on the Linked Data Tutorial - a really useful
>>>>> document.
>>>>> 
>>>>> So here we go:
>>>>> 
>>>>> On 25/7/07 14:35, "Tim Berners-Lee" <timbl@w3.org> wrote:
>>>>> 
>>>>>> 
>>>>>>  (Going back to the original question, as it is much simpler
>>>>>> than much
>>>>>>  which follows!)
>>>>>> 
>>>>>>  On 2007-07-07, at 08:43, Chris Bizer wrote:
>>>>>> 
>>>>>> 
>>>>>>>  Question 3: Depending on the answer to question 1, is it
>>>>>>>  correct to use owl:sameAs [6] to state that
>>>>>>>  http://www.w3.org/People/Berners-Lee/card#i and
>>>>>>>  http://dbpedia.org/resource/Tim_Berners-Lee refer to
>>>>>>>  the same thing as it is done in Tim's profile.
>>>>>> 
>>>>>>  Yes.
>>>>>> 
>>>>> So Tim is absolutely right.
>>>>> This is an entirely logical thing to say.
>>>>> These two NIRs (Non-Information Resources) should be considered
>>>>> the same.
>>>> 
>>>> (Aside) I wish folk would not say 'two' when there is only one.
>>>> Two NIRs
>>>> should never be considered the same: rather, two names may refer
>>>> to the
>>>> same, single, NIR.
>> Thanks.
>> Sorry.
>>>> 
>>>>> But it is important to consider how this statement will be used,
>>>>> and worry
>>>>> whether there may be unexpected consequences.
>>>>> As we now know, the URIs should be resolvable, and so
>>>>> interesting Semantic
>>>>> Web applications will use the URI to get the Description (or
>>>>> whatever we
>>>>> call it), probably going via a 303.
>>>>> So my SW app will get the RDF of them both, and add it to my
>>>>> triplestore,
>>>>> along with all the other linked data.
>>>>> 
>>>>> Tim, as often, is a good example.
>>>>> Consider the places Tim works (W3C, MIT, Southampton, I guess).
>>>>> It is likely that each will publish RDF about him, hopefully
>>>>> using an
>>>>> agreed
>>>>> ontology (one day!).
>>>>> Now comes the rub.
>>>>> If you put all this in one triplestore, with the owl:sameAs
>>>>> assertions,
>>>>> then
>>>>> it will not be possible to distinguish where facts came from, or
>>>>> rather
>>>>> which facts are associated with which others.
>>>> 
>>>> Whoa, careful. It probably will be >>possible<< to
>>>> distinguish this,
>>>> in fact. It might be that unwanted consequences are entailed by the
>>>> combination of the various RDF graphs and the sameAs, but a careful
>>>> querying process should be able to determine which of the various
>>>> triples
>>>> are present and even whether they are linked. One simple way is
>>>> to query
>>>> under sub-OWL entailment, for example, which can be little more
>>>> than a
>>>> direct syntactic matching process (see SPARQL).
>>>> 
>>>>> Perhaps 3 job titles, 3 telephone numbers and 3 institution
>>>>> addresses will
>>>>> be returned from the appropriate SPARQL queries, and there will
>>>>> be no
>>>>> (legal) way of working out which corresponds to which.
>>>> 
>>>> That would be a symptom of poor RDF/OWL usage, though. Assertions
>>>> in RDF
>>>> are not supposed to be local-context-sensitive in the way you
>>>> seem to be
>>>> assuming. So for example it would be a mistake to simply assert,
>>>> in the
>>>> w3c page, that Tim's status WAS Director. It ought to say that a
>>>> relationship holds between him and the entity he is the Director
>>>> of, i.e.
>>>> the W3C; so that this stays true even when it is moved somewhere
>>>> else on
>>>> the Web. In fact, I suggest that as a basic, fundamental
>>>> principle of any
>>>> 'web logic' is that assertions in it should have the same meaning
>>>> wherever
>>>> they occur on the Web (see for example
>>>> http://www.ihmc.us:16080/users/phayes/IKL/GUIDE/GUIDE.html#LogicForInt)
>>>> 
>>>>> So I can infer that the person
>>>>> http://www.w3.org/People/Berners-Lee/card#i
>>>>> is a Professor at MIT, or a Senior Research Scientist at W3C, or
>>>>> Director
>>>>> at
>>>>> Southampton, none of which we consider true.
>>>>> (Of course, this was the intention of the sameAs assertion.)
>>>>> 
>>>>> I suggest that this is a bad state of affairs
>>>> 
>>>> It would be, yes, but it should not arise if the RDF is written
>>>> properly.
>>>> 
>>>>> , and applies to any NIR, not
>>>>> just people.
>>>> 
>>>> It applies to any R, I or NI. It's really nothing to do with the
>>>> nature of
>>>> the thing named.
>>>> 
>>>> Pat Hayes
>>>> -- 
>>>> --------------------------------------------------------------------
>>>> -
>>>> IHMC (850)434 8903 or (650)494 3973   home
>>>> 40 South Alcaniz St. (850)202 4416   office
>>>> Pensacola (850)202 4440   fax
>>>> FL 32502 (850)291 0667    cell
>>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>>> 
>>>> 
>>> 
>> 
>> 
>> _______________________________________________
>> Linking-open-data mailing list
>> Linking-open-data@simile.mit.edu
>> http://simile.mit.edu/mailman/listinfo/linking-open-data
>> 
> 
>