Re: Ed's / Ian's View

On Tue, 2011-04-05 at 22:20 +0100, Nathan wrote:
> David Booth wrote:
> > You have almost touched on a key element: that there is an implicit
> > assumption that necessarily U1(x) != U2(x).  But this is a fallacy.
> > *Some* applications need to distinguish U1(x) from U2(x), but others
> > don't.  I.e., applications that have no need to distinguish U1(x) from
> > U2(x) will produce correct output even when U1(x) is assumed to be
> > owl:sameAs U2(x), just as applications that only need to distinguish
> > chocolate from vanilla may not care if different varieties of vanilla
> > are conflated.
> 
> x = http://en.wikipedia.org/wiki/Frankenstein
> q1: who is the author of x?
> q2: when was x published?
> U₁(x) = what you see when you deref x in a browser (document, page, ir)
> U₂(x) = the book, also known as "The Modern Prometheus"
> 
> Show me that you can answer those q1 and q2 about both U₁(x) and U₂(x) 
> assuming that all the statements needed are published using the dc 
> vocab, and associated with x as the subject.

That's an application that *needs* to distinguish between U1(x) and
U2(x).  But an application that cares only about a :lastHttpAccessTime
property is unlikely to care about extraneous dc:author properties or
other properties that "conflate" the web page with the novel.  Note that
you and I are very biased, in that we commonly wish to differentiate
between them, so to us, an entity that has properties of both a web page
and a novel may seem nonsensical.  But there is nothing essential about
this bias.

The point is that the need to distinguish between U1(x) and U2(x), or
between an IR and a toucan, or between different brands of vanilla
beans, or between anything and anything else, is *not* universal.  It is
application dependent.  Some applications need the distinction, others
do not.  

> 
> The whole reason we even need to say U₁(x) and U₂(x) in this 
> conversation is because x is being used to refer to two distinct things, 
> and I can assure you that U₁(x) != U₂(x) is a truth, not a fallacy, 
> regardless of whether that truth matters to some application or not. 
> (note: some, not all).

Sure, you and I may believe that U1(x) != U2(x), but what we believe is
*irrelevant*.  What matters is whether it makes an *observable*
difference to an application.  And that depends on the application.  For
some applications, there *is* no discernible difference between U1(x)
and U2(x), regardless of how much we may believe there should be.  For
them, the following RDF statements are about the *same* resource <x>:

 <x> dc:author "Mary Shelley" .
 <x> :accessibleVia "x" .

We need to focus on the things that make an *observable* difference
to an application -- not on our own beliefs about absolute "truth".
See myth #4: "Truth is absolute":
http://dbooth.org/2010/ambiguity/paper.html#myth4 
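
To make "observable difference" concrete, here is a minimal sketch in
Python.  It is only an illustration under stated assumptions: the graphs
are plain sets of (subject, predicate, object) tuples rather than a real
RDF store, the second URI for the novel and the timestamp are made up,
and the two "applications" are hypothetical stand-ins.  The predicate
names follow the examples in this message.

```python
# Hypothetical data: triples as plain (subject, predicate, object) tuples.
X = "http://en.wikipedia.org/wiki/Frankenstein"
NOVEL = "http://example.org/novel/Frankenstein"  # made-up URI for the book

# One URI carries properties of both the web page and the novel.
conflated = {
    (X, "dc:author", "Mary Shelley"),
    (X, ":accessibleVia", X),
    (X, ":lastHttpAccessTime", "2011-04-05T22:20:00Z"),  # invented value
}

# The same facts, with the novel split out under its own URI.
separated = {
    (NOVEL, "dc:author", "Mary Shelley"),
    (X, ":accessibleVia", X),
    (X, ":lastHttpAccessTime", "2011-04-05T22:20:00Z"),
}


def values(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def access_time_app(graph):
    """An application that only ever looks at :lastHttpAccessTime of x."""
    return values(graph, X, ":lastHttpAccessTime")


def author_app(graph):
    """An application that asks who the author of x is."""
    return values(graph, X, "dc:author")


# No observable difference for the first application...
print(access_time_app(conflated) == access_time_app(separated))
# ...but an observable difference for the second.
print(author_app(conflated) == author_app(separated))
```

Under these assumptions the first application produces the same output
from both graphs, so for it the conflation is harmless.  The second
application does not, which is exactly the sense in which the need for
the distinction is application dependent.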

> 
> The day you can say the above for *all* applications (given this 
> scenario of some name referring to two things) rather than some, is the 
> day when it'll be an acceptable solution, until then it, like most of 
> the other proposals, only addresses *some* situations. We need a for all 
> here.

Agreed.  Once a pool has been contaminated, it's very hard to clean it
up.  This is why, in the draft of section 5.5 that I sent, I included
other scenarios that discuss how to *prevent* that contamination, even
when you do want to merge graphs from different sources.



-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.

Received on Wednesday, 6 April 2011 13:09:45 UTC