Re: comments on the uri note from Peter Ansell on 2007-11-05 (public-semweb-lifesci@w3.org from November 2007)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Tue, 6 Nov 2007 08:27:43 +1000
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
Cc: public-semweb-lifesci@w3.org, Michel_Dumontier <Michel_Dumontier@carleton.ca>, naty.vr@gmail.com, p.roe@qut.edu.au, j.hogan@qut.edu.au
Message-ID: <a1be7e0e0711051427ueb249c7s242a50662c5042d6@mail.gmail.com>
On 06/11/2007, Booth, David (HP Software - Boston) <dbooth@hp.com> wrote:
> Peter,
>
> > From: Peter Ansell
> > [ . .. . ]
> > Suppose two people come up with slightly different, but mutually
> > useful, definitions at the same time and, before an authority has
> > declared them to be the same, want to use both of the definitions in
> > queries, and advertise them so they can be used by other users. In my
> > view, they would simply add an owl:sameAs statement to their
> > definitions to give users the impression that they were in fact agreed
> > to be the same subject resource.
>
> That's a good example to examine.  To make this more concrete, let's assume these two people mint different URIs U1 and U2 as names for resources R1 and R2, and assume that even though R1 and R2 are very similar, they are slightly different, i.e., there exists a predicate P such that P(R1) is true but P(R2) is false.

Okay, but it was not my intention to be able to know, or need to know,
what predicate P actually is. As long as you don't know what it is, you
can't be accountable for its truth or otherwise. Logic only gets you
places if you know everything is true and consistent beforehand, and
science doesn't have this luxury. It is forced to deal with things
that it doesn't completely know how to describe, and as such I don't
think there are many scientific issues which will come down to as
specific an issue as this. In other areas such as mathematics, where
the truth is all in the statements by definition, one may find such a
predicate, but mathematicians are then also free to manipulate the
definitional basis to fit their purpose. If scientists did this they
might get nice results which have no meaning in the real world.

Hence, if a scientist wants to define a concept as exactly the same as
another to their knowledge, they should be free to do so, keeping in
mind that they are free to retract their statement in the future
because they control the namespace that they are publishing on. There
is not a huge difference between that and stating that two animals are
part of the same species.

> For many applications, the difference between R1 and R2 may not matter.  Thus, for many applications it may be perfectly fine and very useful to assert U1 owl:sameAs U2, even though it is not entirely true.  So the problem is that if the assertion "U1 owl:sameAs U2" is made a part of the declaration of U1, then U1 can never be used in situations where the difference between R1 and R2 *does* matter.  Thus, it limits the reusability of U1.

I disagree that it limits the reusability of U1 in any special way.
What is the difference between using sameAs and, for instance, using a
relationship which defines a protein as being produced from a gene? If
you are not completely sure that it is always going to be produced
from that gene, should you omit the statement just in case it causes
someone's reasoning to fail in a specific situation? If you are unsure
about so many science issues anyway, why should you be any more sure
about anything you say about them using information systems
terminology?
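
As a rough illustration of the quoted concern (P true of R1, false of
R2, plus an asserted U1 owl:sameAs U2), here is a minimal Python sketch.
It is not taken from any RDF library: the "ex:" names, the helper
functions and the idea of passing in a set of single-valued predicates
are all assumptions made for the example.

```python
SAME_AS = "owl:sameAs"

def canonical_finder(triples):
    """Return a function mapping each term to one representative of its
    owl:sameAs equivalence class (a small union-find)."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for s, p, o in triples:
        if p == SAME_AS:
            rs, ro = find(s), find(o)
            if rs != ro:
                parent[max(rs, ro)] = min(rs, ro)
    return find

def conflicts(triples, single_valued):
    """Report (subject, predicate) pairs that end up with more than one
    value once owl:sameAs identities are merged. Only predicates listed
    in single_valued (an assumption of this sketch) are checked."""
    find = canonical_finder(triples)
    values = {}
    for s, p, o in triples:
        if p in single_valued:
            values.setdefault((find(s), p), set()).add(o)
    return {key: vals for key, vals in values.items() if len(vals) > 1}

# Hypothetical data: P holds for U1 but not for U2.
separate = [("ex:U1", "ex:P", "true"), ("ex:U2", "ex:P", "false")]
merged = separate + [("ex:U1", SAME_AS, "ex:U2")]

print(conflicts(separate, {"ex:P"}))
print(conflicts(merged, {"ex:P"}))
```

A clash only appears for a consumer that actually holds statements
about P. A consumer that never sees P gets no conflict from the sameAs
link.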

> What can the owner of U1 do instead?   The declaration of U1 can provide a seeAlso to a separate document -- not a part of the URI declaration for U1 -- that either asserts U1 owl:sameAs U2 (and perhaps explains that this isn't strictly true), or indicates more specifically the relationship between R1 and R2, such as something like U1 skos:broader U2.

How do you explain that U1 and U2 are not "strictly" equal if you do
not know where they are going to be used? And does reality fit into
nice schemes where you can decide that one specification is "broader"
than another, if they are not actually nice subsets of each other in
some way? Broader to me implies that if you can use U2 in a statement
you can also use U1, as U2 is a simple subset of U1.

Personally I don't believe that you can break down scientific concepts
into their parts without losing information that only held true when
you were considering the original whole concept.

> >
> > The concept of one-uri-to-rule-them-all portrays the actual situation
> > as being very simple, and relies on a single authority for each
> > subject area who reviews definitions before allowing them to be used.
>
> I'm not entirely sure I know what you mean by "one-uri-to-rule-them-all", but if you are talking about the idea that (a) URIs should be reused, and (b) usage should be consistent with the URI's declaration, then this does *not* rely on a single authority for each subject area.  It relies on a single "authority" for each URI, but that "authority" is only the authority to say what resource the URI denotes.  It has nothing to do with whether someone is an authority in a subject area.  See the WebArch discussion of URI ownership:
> http://www.w3.org/TR/webarch/#uri-ownership
> and my document about URI declaration:
> http://dbooth.org/2007/uri-decl/
>
> > If one wishes to use this approach then it should be okay, but the
> > note should allow for the fact that people will want to change
> > definitions as they see fit, even feeling strongly enough about their
> > changes or extensions to refer to them as being of the level of an
> > authority. It's a dangerous world when people can do anything without
> > an authority reviewing everything.
> >
> > > A seeAlso does not constitute an assertion that you should believe
> > > what's at the other end. That would be reserved for something
> > > stronger like owl:imports (if I understand it correctly).
> >
> > Why would you implicitly not want to believe what is on the other end
> > of a seeAlso statement? If the author has thought to include it, then
> > they think there is some value in it. According to the plaintext
> > comment for seeAlso, it declares that "Further information about the
> > subject resource." can be found at the target resource. It really
> > depends on whether you are interested in further information, or
> > whether you want to restrict yourself to the original "defining" set
> > of statements.
>
> The reason you might not want to believe what is on the other end of a seeAlso statement is because it may conflict with some other statements that you wish to use.  Suppose the declaration of URI U says to seeAlso documents A, B and C.  Each of these may have very useful information, and may not (individually) conflict with the declaration of U.  But if A and B are used together, there may be a contradiction.  And even though C does not currently conflict with anything, at some later point a new document D might be published such that if C and D were used together there would be a contradiction.
>
> The point is that even though these documents may contain useful information, they should be separable from the declaration of U in order to maximize the reuse potential of U.
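
The scenario in the quoted paragraph (documents A, B and C, then a
later D) can be sketched in a few lines of Python. This is only a toy
model: it treats every property as single-valued, which real RDF does
not do, and the "ex:" names, property values and function names are
invented for the illustration.

```python
from itertools import combinations

def clashes(statements):
    """Return the (subject, property) pairs that are given two or more
    different values. Treating every property as single-valued is a
    simplifying assumption of this sketch."""
    values = {}
    for subject, prop, value in statements:
        values.setdefault((subject, prop), set()).add(value)
    return {key for key, vals in values.items() if len(vals) > 1}

def incompatible_pairs(declaration, documents):
    """List the pairs of document names that clash when both are merged
    with the declaration."""
    bad = []
    for (name1, doc1), (name2, doc2) in combinations(sorted(documents.items()), 2):
        if clashes(declaration + doc1 + doc2):
            bad.append((name1, name2))
    return bad

# Hypothetical declaration of U and the documents it points to.
declaration = [("ex:U", "ex:kind", "protein")]
documents = {
    "A": [("ex:U", "ex:locatedIn", "nucleus")],
    "B": [("ex:U", "ex:locatedIn", "cytoplasm")],
    "C": [("ex:U", "ex:mass", "42")],
}
print(incompatible_pairs(declaration, documents))

# A later document D conflicts with C, which was fine until now.
documents["D"] = [("ex:U", "ex:mass", "43")]
print(incompatible_pairs(declaration, documents))
```

Because the declaration is held apart from A to D, a consumer can still
pick whichever compatible subset it wants to load alongside it.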

My problem is that it is not realistic to take the concept out of its
context just because you are trying to fix it down to a basis. How
many actual publishers publish bare declarations about the concepts
they are representing without including in exactly the same place
information about their relationships to other concepts? They do not
distinguish between the basis set and other information statements, so
why should they require anyone who utilises their URIs to represent
these things to make that distinction? It is nice that
someone is trying to develop a terminology to describe the "things" on
the "internets", but in trying to develop an overall statement, they
may have assumptions which don't fit both with an external reality and
with the way the "internets" actually work currently.

If you really have an issue with slight differences between statements
then it would be best for you not to use the identifier, and resolve
the issue manually using your own namespace. The beauty of RDF is that
you can do this without confusing people.

Do you have any examples of where concepts are defined with so many
contextual statements that an unintelligent computer reasoning program
has an issue with them when they are all included, and a scientific
rationale for why a publisher would see the issue the same way you do
and change their publishing methods to discriminate between the
statements for you?

Personally, I do not distinguish between "what you should believe" out
of an official statement, and "what is published". But I guess the
seeAlso/sameAs/Imports/isDefinedBy are all creations of information
systems researchers who are not likely to have considered how they
would be applied to realistic scientific applications with
non-"information resources", where understanding about the real world
is constantly evolving in response to new research and what is known
at one instant in time is not likely to be correct at the next instant.

It is unrealistic to expect both that you can reason broadly over RDF
statements and that the results of that reasoning will be realistic.
The best I think you can hope for is to find areas to investigate
further, at least until seeAlso/sameAs/Imports/isDefinedBy have
clearer meaning for scientists.

Personally I am planning to use sameAs to relate locally published
URIs to world database items, and I guess it is up to others to
consider my definition and accept or reject it. I will not spend hours
worrying about the smallest set of statements that is likely to fit a
given scenario for another researcher at some point now or in the
future. The point I think is to build an information web, not a set of
isolated nodes. Universal exploration is much less sensitive to the
truth or otherwise of individual statements than universal reasoning.
This is possibly the reason I have not used many SPARQL queries so
far.
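
To give a concrete picture of "exploration" as opposed to reasoning,
here is a minimal Python sketch that just follows sameAs and seeAlso
links outward from a local URI. The URIs, the "ex:encodes" property and
the function name are all hypothetical, and the triples are a plain
in-memory list rather than anything fetched from a real database.

```python
from collections import deque

def explore(triples, start, link_predicates=("owl:sameAs", "rdfs:seeAlso")):
    """Breadth-first walk over link statements, returning every URI
    reachable from start in the order it was first seen."""
    neighbours = {}
    for s, p, o in triples:
        if p in link_predicates:
            neighbours.setdefault(s, set()).add(o)
            if p == "owl:sameAs":
                # sameAs is symmetric, so it is followed in both directions.
                neighbours.setdefault(o, set()).add(s)

    visited = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in visited:
                visited.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

# Hypothetical local URIs linked to hypothetical world database items.
triples = [
    ("local:gene1", "owl:sameAs", "db:gene/1"),
    ("db:gene/1", "rdfs:seeAlso", "db:doc/1"),
    ("db:gene/1", "ex:encodes", "db:protein/9"),
    ("other:g1", "owl:sameAs", "db:gene/1"),
]
print(explore(triples, "local:gene1"))
```

Nothing here checks whether the linked statements agree with each
other. A doubtful sameAs link only adds one more place to look, which
is why exploration tolerates imperfect statements better than
inference does.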

I have designed the majority of the scientific bases for my
explorations using my own algorithms rather than utilise and trust
published ontologies to be consistent with actual statements. :)

Peter
Received on Monday, 5 November 2007 22:27:53 UTC