Re: comments on the uri note

On 05/11/2007, Jonathan Rees <jar@creativecommons.org> wrote:
>
> Dear HCLS list - I haven't put the draft out for general review, but
> am planning to do so tomorrow.
> My reason for  waiting is past experience of "draft fatigue" - most
> people will only read one draft of something.  If you're one of those
> people please wait a while for the dust to settle.
>
> On Nov 3, 2007, at 3:02 PM, Michel_Dumontier wrote:
>
> > Hi all,
> >   I read the latest URI note [1], and here are some comments:
> >
> > [1] http://sw.neurocommons.org/2007/uri-note/
>
> Future reviewers - always please include the draft number (35, 36,
> etc.).
>
> Thanks for taking the time to look at this.
>
> >  "A usage spec for a name is simply a graph that is designated as one
> > that specifies when the name should and shouldn't be used"
> >
> > Given that RDF semantics are open world, and RDF lacks the formal
> > vocabulary for negation or universal quantifiers, I don't see how one
> > can constrain usage, as no inconsistency can result from the
> > addition of
> > new knowledge. The example that is provided is not constraining, but
> > rather states what we know about that particular entity, at a certain
> > time, presumably from a certain location on the internet.
>
> I thought it was clear that I was not talking only logic.  You can
> specify all sorts of unmodeled things in natural language, and this
> language can go in an rdfs:comment. I'll try to make this more clear.

Are there any concrete examples of usage specs out there? Within the
bio2rdf framework I take my general usage spec to be an rdf:type
triple on the graph which contains all the other information about the
item. True, the resource may be defined in an ontology somewhere,
which may be more informative. But will everyone want to make up an
ontology for their items, a process which is not easy to understand
until you know pretty much everything about how OWL works?
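
To make that concrete, here is a rough sketch of what I mean by a
minimal usage spec, written as plain Python tuples rather than with any
RDF library. The example.org URIs, the graph contents and the
usage_spec helper are all made up for illustration; only the type and
comment predicate URIs are real.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

# Hypothetical graph describing one item; every URI here is a placeholder.
ITEM = "http://example.org/geneid:45000"
graph = {
    (ITEM, RDF_TYPE, "http://example.org/ns#Gene"),
    (ITEM, RDFS_COMMENT, "Use this name only for the gene record itself."),
    (ITEM, "http://example.org/ns#symbol", "ABC1"),
}


def usage_spec(graph, subject):
    """Return the triples about `subject` that act as its minimal usage spec.

    Assumption: the type triple plus any natural-language comment are
    treated as the spec; everything else is ordinary data about the item.
    """
    spec_predicates = {RDF_TYPE, RDFS_COMMENT}
    return {t for t in graph if t[0] == subject and t[1] in spec_predicates}


for triple in sorted(usage_spec(graph, ITEM)):
    print(triple)
```

The point is only that the spec travels on the same graph as the data,
so nobody has to build an ontology first.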

> > A major concern I have with the note is that it essentially says that
> > only the naming authority can make "defining" statements about some
> > URI.
> > Such an approach would severely hinder people from reusing URIs, as
> > they
> > may wish to make additional statements that are undoubtedly not
> > covered
> > by the authority's definition. Such advocacy would simply lead
> > people to
> > mint their own URIs, leading to heavy fragmentation of the semantic
> > web,
> > in which only our knowledge about something might be limited due to
> > "see
> > also" links between instances.
>
> "Additional statements" are not defining and are not meant to impose
> usage constraints on ALL users of the name, even if they could. They
> are merely about, just as Austen's novel Persuasion is about
> persuasion and uses the word (name) 'persuasion', but doesn't define
> 'persuasion'. That doesn't mean the additional statements aren't
> believed; they may or may not be incorporated into any given theory
> (logical or otherwise).

Utilising the information contained within the book could provide
useful examples of use which are generally included in definitions.
Dictionaries do this all the time. Of course, the author still selects
the examples for use, which is not the case in this scenario as the
users are the ones selecting what they wish to include in their
"definition".

I can see why people would want to have set down queries which can be
run again and again with the same results. But in reality, things are
going to change, and they are rarely going to be completely consistent
with any one ontology (at least until people in general become
skillful at formal ontology specification). Combining those two
facts about what is actually happening with the formal logic approach
must inevitably reduce the recall rate, although it will hopefully
increase the accuracy rate.

The concept that definitions are merely commented on, rather than
transformed, is a philosophical one, but it does not translate to what
the majority of humans naturally do. The majority of people do not pick up
a name or concept and then just attach observations to it. They use
the observations to change the actual definition to both suit their
needs and to fit what they actually see compared to what is said in
the dictionary about the word. Dictionaries are by definition out of
date, as they reflect usage rather than create it.

> You can always say what you like about any defined term; doing so
> does not change the definition.

In reality you are talking about the same thing. Just because the
original defining author does not include your information doesn't
mean that people do not want to know about your extension to the
definition, or that they will restrict their queries to the original
"just because". The idea of the semantic web is to integrate different
sources of information and transform definitions based on external
ideas about what the item really is. If the idea of the semantic web
in the note is just how to interact with multiple solitary agents on
the web and use their information verbatim it should take a leaf out
of the B2B (Business-2-Business) books. They have been doing things as
simple as that for years.

> Perhaps we don't agree about what is meant by "defining" (although
> I'm puzzled that you use that word because I don't use it in the
> draft), and I need to be clearer about that.
>
> > "The property rdfs:seeAlso specifies a resource that _might_ provide
> > additional information about the subject resource" [2]
> >
> > [2] http://www.w3.org/TR/2000/CR-rdf-schema-20000327/#s2.3.4
> >
> > Unless there is a stronger link between differently named resources,
> > such as owl:sameAs, it certainly can't be interpreted that they are
> > the
> > same, thus the statements will not be merged. However, if the resource
> > points to another document making statements about the URI, or
> > makes use
> > of owl:sameAs,  this will lead to the merging of statements that might
> > go beyond the original "definitions" of any one authority.
>
> I wish I could understand what you're saying. Can you give a concrete
> example?

Suppose two people come up with slightly different, but mutually
useful, definitions at the same time and, before an authority has
declared them to be the same, want to use both of the definitions in
queries, and advertise them so they can be used by other users. In my
view, they would simply add an owl:sameAs statement to their
definitions to give users the impression that they were in fact agreed
to be the same subject resource.
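
A rough sketch of the merging I have in mind, in plain Python with
triples as tuples. The example URIs and the merge_same_as helper are
invented for illustration, and a real OWL reasoner does a good deal
more than this (for one thing, this only pools statements by subject).

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def merge_same_as(triples):
    """Group URIs linked by owl:sameAs and pool the statements per group.

    Returns {representative_uri: {(predicate, object), ...}}.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # First pass: join every pair of names declared to be the same.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)

    # Second pass: attach each ordinary statement to its group.
    merged = {}
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged.setdefault(find(s), set()).add((p, o))
    return merged


# Two hypothetical providers describing what they take to be one thing.
triples = [
    ("http://a.example/protein", "http://a.example/ns#mass", "52 kDa"),
    ("http://b.example/Protein", "http://b.example/ns#foundIn", "liver"),
    ("http://a.example/protein", OWL_SAME_AS, "http://b.example/Protein"),
]
merged = merge_same_as(triples)
print(merged)
```

Without the sameAs triple the two sets of statements stay apart; with
it, a query against either name sees both.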

The concept of one-uri-to-rule-them-all portrays the actual situation
as being very simple, and relies on a single authority for each
subject area who reviews definitions before allowing them to be used.
If one wishes to use this approach then it should be okay, but the
note should allow for the fact that people will want to change
definitions as they see fit, even feeling strongly enough about their
changes or extensions to refer to them as being of the level of an
authority. It's a dangerous world when people can do anything without
an authority reviewing everything.

> A seeAlso does not constitute an assertion that you should believe
> what's at the other end. That would be reserved for something
> stronger like owl:imports (if I understand it correctly).

Why would you implicitly not want to believe what is on the other end
of a seeAlso statement? If the author has thought to include it, then
they think there is some value in it. According to the plaintext
comment for seeAlso, it declares that "Further information about the
subject resource." can be found at the target resource. It really
depends on whether you are interested in further information, or
whether you want to restrict yourself to the original "defining" set
of statements.

> > I don't believe that the statement "The declaration should be specific
> > enough to rule out incorrect usage, but not so specific that it
> > overcommits and fosters inconsistency or discourages reuse." is
> > possible
> > to adhere to.
> >
> > Here are things that I consider:
> > A Universal Resource Identifier is a string of characters that denotes
> > the name of some resource.
> Sorry, I am saying that the URI doesn't *denote* the name, it *is*
> the name. And it can name anything, not necessarily a resource.

Technically, if we are using an RDF perspective, that isn't correct. A
URI is a resource if it is declared to be a resource, as opposed to a
Literal. And resources are implied to be named objects which can have
properties. Technically, in real life usage of course, one will not
refer to something as "http://ncbi..../geneid:45000". They will refer
to its label, or its abbreviation, or possibly the numerical id. The
resource however is the same thing and it will confuse people to
enforce a policy where URIs do not have any real meaning past that.

In the perfect system people would never see URIs relating to
resources; they would only ever see URIs which appear in Literals.
Really, people are interested in the labels that are declared to
correspond to the resource URIs.

> > 1 - create a URI that is consistent with the corresponding
> > protocol. For
> > instance, HTTP URIs can only be composed of a certain set of
> > characters
> > defined by [some url], and LSID URIs have their own specification
> > [another url], etc, etc...
> I can add language to this effect if you think it's important.  For
> now I just say you have to follow URI syntax. I had thought that
> would imply protocol-specific syntax
>
> > 2 - reuse a URI if you believe that your use of that resource is
> > expected to be consistent with the original intent. In the absence of
> > expressive logics with negation, it will not be possible to
> > computationally check if the meaning is consistent.
> Correct. My proposal is to take the usage spec as defining, in
> natural language if necessary, the original intent.

Of course, this allows for a certain degree of interpretation by the
user of the information as to what the natural language definition
means, but that is all part of life.

> > 3 - you might consider minting a URI that is identical in intent, but
> > you like to track your contributions (provenance). In this case, you
> > make statements to your URI, and should consider using owl:sameAs to
> > indicate that the two resources should be considered equivalent.
> It sounds like we have very different ways of dealing with provenance
> and trust, and it would probably be profitable to talk about these
> differences. As you aren't the first to mention this approach, it's
> probably important that we try to figure out the differences are in
> underlying assumptions, since otherwise we'll just talk past each other.

My opinion on provenance is that if you are that worried about being
able to replicate your original work, you should cache all of the
relevant information yourself. It is not someone else's responsibility
to cache information just in case someone may have used it. If the
information changes in the future you cannot be charged retroactively
with anything that you didn't know about (legal principle uno,
although it is up to the government to decide when they wish to use
retroactive action). Although that is slightly off track from the
sameAs definition, it is in reality quite similar. If you want to
demonstrate to an observer your individual method for reaching a
conclusion, then it is not okay to simply tell the world to stop while
you perform a demonstration. If you have taken two graphs to be
identical for the purposes of the demonstration, then you are allowed
to. It is an academic right to do so, within certain critical
boundaries.
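
As a sketch of the kind of local caching I mean (the function names and
the shape of the cache are my own assumptions, not anything bio2rdf or
the note prescribes):

```python
import hashlib
from datetime import datetime, timezone


def snapshot(cache, uri, content):
    """Record what `uri` returned at retrieval time so work can be replayed."""
    entry = {
        "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "retrieved": datetime.now(timezone.utc).isoformat(),
        "content": content,
    }
    # Keep every snapshot; earlier copies are the provenance trail.
    cache.setdefault(uri, []).append(entry)
    return entry


def changed_since_snapshot(cache, uri, current_content):
    """True if the source no longer matches the most recent cached copy."""
    latest = cache[uri][-1]
    digest = hashlib.sha256(current_content.encode("utf-8")).hexdigest()
    return digest != latest["sha256"]


cache = {}
snapshot(cache, "http://example.org/record/1", "<rdf>original</rdf>")
print(changed_since_snapshot(cache, "http://example.org/record/1",
                             "<rdf>revised</rdf>"))
```

If the provider later changes the record, you still hold the copy your
conclusion was actually based on.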

If your audience does not trust one or both of your providers then it
is up to them to prove you wrong; it is not up to you to prejudge their
response.

> > Since a name isn't sufficient for understanding its meaning, we
> > suggest
> > that you augment every RDF/OWL resource with:
> > 1 - a concise human readable label using rdfs:label in the language of
> > choice
> yes. i am reluctant to impose even more requirements/advice, but
> label is probably warranted.

This kind of goes with the idea of not having bnodes in an easy to
understand subset of RDF. If something does not have a label, then how
are you reasonably expected to understand it? Understanding what you
are dealing with is completely necessary. Therefore I agree with some
inclusion of rdfs:label and/or dc:title in any recommendation.
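
A check along these lines would be easy to run over any graph. This is
only a sketch with triples as plain tuples; unlabelled_subjects is a
made-up helper and the example.org data is a placeholder.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def unlabelled_subjects(triples):
    """Return every subject that has neither an rdfs:label nor a dc:title."""
    subjects = {s for s, _, _ in triples}
    labelled = {s for s, p, _ in triples if p in (RDFS_LABEL, DC_TITLE)}
    return subjects - labelled


# Hypothetical data: one labelled resource, one without any label.
triples = [
    ("http://example.org/a", RDFS_LABEL, "Resource A"),
    ("http://example.org/a", "http://example.org/ns#size", "12"),
    ("http://example.org/b", "http://example.org/ns#size", "7"),
]
print(unlabelled_subjects(triples))
```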

> > 2 - a precise human readable definition using rdfs:comment in the
> > language of choice.
> yes, assuming there's any doubt that the logical definition doesn't
> capture all important information about usage (e.g. I think the
> pathology example doesn't need prose)
> > 3 - RDF statements that you believe to be universally true about that
> > resource
> yes. I would have to define "universally": I think it means (from a
> formal viewpoint) that every theory that uses the name is asked to
> take the usage spec as axiomatic.
> > 4 - or point to documents that make statements about that resource
> > using
> > rdfs:isDefinedBy.
> Fine, but if the statements are to be universally true why point to
> them? And what document-reference connective is strong enough to mean
> that the statements in the other document are to be axioms?
> owl:imports I guess.
>
> I don't like isDefinedBy because it relates the thing to the defining
> document. But the thing just is; it doesn't need defining. It is the
> name that needs to be defined. E.g. you could have two names for the
> same thing, defined in different ways. Definition, IMO, is
> extralogical since it quantifies over *all* conforming theories.
> Definition is closer to a kind of modularity (axiom sets that get
> incorporated into many theories) and is therefore meta-meta.

It is a tight philosophical boundary as to whether we are "defining" an
object when we assign properties to it or whether we are just defining
our name for it. If you are just trying to construct a body of
knowledge, then the name is your basic element, there is no allowance
for an actual object behind the name. However, if you are actually
trying to define properties about the real world, where multiple
people have come across the same real world object and "defined" it
using different names, then you should be allowed to reference the two
in a knowledge representation. If name is your basic element then you
aren't really allowing for a realist scientific perspective.

> > As an example, I built a prototype HTTP URI resolver for the entities
> > defined in my most current OWL ontologies:
> >
> > http://134.117.55.46:8181/Protein ,
> >
> > where 134.117.55.46:8181 will eventually be ontology.dumontierlab.com
> >
> > In this way, a human can see the implied meaning, and an agent can
> > follow other documents to determine what has been said about it (at
> > least within my own knowledge base).
>
> Tell me why you don't use the "eventual" term in the first place? If
> it's a resolution issue then something like resolution rules or a web
> proxy could help out here. If it's really a logical or provenance
> issue then I don't understand you (and I want to).
>
> When I want to make provisional assertions so that they can be tried
> out (for consistency, linking, application behavior, etc. I isolate
> them from other assertions by setting up a separate theory (graph,
> scope, ...). Entailment while I play around comes only from the
> provisional graph, and entailment in some other, better established,
> maybe shared theory is not threatened by the statements I make in the
> flakier place. It sounds like you're doing the same kind of thing
> but  within a single graph/theory and modulating names to simulate
> multiple graphs (you would say I resort to multiple graphs to
> simulate renaming). I would like to hear how these approaches are
> substantially different (not just syntactically or organizationally).
> If choice between these two approaches bleeds through into a naming
> discussion then that's a threat to sharing and we need to talk about it.
> Tell me I'm wrong...

If your provisional graph gives you information that you don't get
from the authoritative graph then why should you be limited by the
original graph? And why should you not personally publish your
research so others can both review and utilise it?

> > What remains lacking is a method by
> > which we can discover what other people have said about this resource.
> Excellent, I'd love to see a protocol for this. Currently I use google.

This would be nice: a simple crawler which just parses seeAlso,
sameAs, isDefinedBy etc. from RDF and provides this as a "meta" search
engine.
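
Something like the following is what I have in mind, as a sketch only.
To keep it self-contained it walks an in-memory dictionary standing in
for the web; the fetch callable, the example URIs and the limit
parameter are all assumptions, and a real crawler would need an RDF
parser and HTTP handling.

```python
from collections import deque

SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"
LINK_PREDICATES = {SEE_ALSO, SAME_AS, IS_DEFINED_BY}


def crawl(start, fetch, limit=100):
    """Breadth-first walk over link predicates.

    `fetch(uri)` must return an iterable of (s, p, o) triples for that URI.
    Returns {subject: {(predicate, target), ...}} for link triples only.
    """
    index = {}
    seen = {start}
    queue = deque([start])
    fetched = 0
    while queue and fetched < limit:
        doc = queue.popleft()
        fetched += 1
        for s, p, o in fetch(doc):
            if p in LINK_PREDICATES:
                index.setdefault(s, set()).add((p, o))
                if o not in seen:  # follow each target once
                    seen.add(o)
                    queue.append(o)
    return index


# Hypothetical two-document "web".
web = {
    "http://a.example/doc": [
        ("http://a.example/x", SEE_ALSO, "http://b.example/doc"),
    ],
    "http://b.example/doc": [
        ("http://b.example/y", SAME_AS, "http://a.example/x"),
        ("http://b.example/y", "http://b.example/ns#note", "not a link"),
    ],
}
index = crawl("http://a.example/doc", lambda uri: web.get(uri, []))
print(index)
```

The resulting index is what the "meta" search engine would serve:
given a name, which other names and documents claim a connection to it.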

> > That's why I'm fond of the http://lsrn.org (centralized) solution in
> > which multiple data providers can register as a resolver a given base
> > URI, so that people and agents can find out more about it (via HTML/
> > RDF
> > documents)[3].
> > Moreover, it allows third party data providers to
> > register a public identifier, and resolve it (in RDF documents)
> > prior to
> > the authority having to do so! Analogously, in the LSID protocol
> > (distributed), resolvers can register with the authority itself and
> > provide different information.
> Sounds a little like wikipedia. Where is the LSRN protocol
> documented? How do I add to the registry? Where is this LSID multi-
> resolver protocol documented? Where does the LSID resolver list come
> from, and how do I add to it?
> >
> > [3] http://lsrn.org/CAS:58-08-2
> >
> > Thus, I dislike anything that is "authoritative" or "monopolizing"
> > if it
> > handicaps URI reuse and precludes the discovery of additional
> > information about that resource.
> Absolutely. But I don't see anything in the apparatus of OWL and/or
> the semantic web that precludes you from looking anywhere for any
> kind of information. E.g. I could keep track of a set of my favorite
> SPARQL endpoints to consult when I have questions.  I think what
> you're saying is that you like having a nonauthoritative but central
> point of contact, which to me sounds like a service analogous to
> google (uncurated), DMOZ (curated), or wikipedia (contribution-
> based). If this is so I need to learn more about LSRN.
>
> The alternative to the "authority" of first published
> dictionary definition (usage spec) is consensus process inducing what
> a term means. Although natural language is mostly defined by use, not
> prescription, I don't think this is viable for computational purposes
> (not necessarily limited to deduction) and it doesn't agree with my
> reading of how the scientific literature best develops. Yes, even
> when there are explicit definitions, there is overloading (others
> define the term differently), but good scholarship couples use of a
> technical term with a reference to the source that made the
> definition you're assuming. The URI in effect provides reference (to
> concept) and reference (to publication) in one module. The
> "authoritative" aspect here is just priority of publication, coupled
> with the uninteresting detail of collision avoidance through domain
> ownership. In a more recent version of the draft I've further
> downplayed the idea of "authority" - I now call it "minting
> authority" and explain that you lose control once you publish - your
> publication has to stand on its own unless a special contract is
> otherwise established that gives you some special ongoing
> relationship with the name. I hope we agree in this.

The publication may have to stand on its own, but publications are not
necessarily authorities. The current concept of authority is either a
curated central knowledge base, like uniprot, or a central uncurated
knowledge base which still has some standards for inclusion. They
won't go away, and they won't lose prestige because the semantic web
is moving faster than their processes (and physical resources)
possibly allow. Publications as a second level of authority, due to
peer review, will also not lose their authority. They will still be
authoritative in their own right as a much larger knowledge base than
the centralised, criteria based inclusion, knowledge bases will ever
get to. But past that is the possibility for having real-time research
disseminated to other scientists in order to extend their perspective
on issues. Normally this is just done within laboratories or tight
research groups, but if the semantic web really picks up, then it will
be possible without personal interaction with the user, unless some
clarification is needed. I see the registry as allowing for this level
of interaction, where people augment or relate to
higher-level-authoritative-definitions using their own data. Calling
this pseudo-science because each piece of data has not gone through a
peer-review may be a valid claim, but at least in this case it is
still verifiable who the data came from, and you will likely have a
local cache of the data if you are utilising it at this level, so you
can refer to it for provenance reasons.

I don't by the way agree that you "lose control" of something when you
publish it. It is still attributed to you, and you can still produce
corrections to it, particularly on the semantic web, where it is as
simple as a supercededBy statement to declare that the information is
out of date. Minting authority is still an authority in the sense that
databases will still be databases even if they publish their internal
identifiers.
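
For instance, a client could follow such statements to the current
version. This is a sketch under assumptions: the predicate URI below is
a placeholder (I am not pointing at any particular vocabulary), and
latest_version is a made-up helper.

```python
# Hypothetical predicate; substitute whatever vocabulary is actually used.
SUPERCEDED_BY = "http://example.org/ns#supercededBy"


def latest_version(triples, uri):
    """Follow supercededBy links from `uri` until no newer version is declared."""
    successor = {s: o for s, p, o in triples if p == SUPERCEDED_BY}
    seen = {uri}
    while uri in successor:
        uri = successor[uri]
        if uri in seen:
            # Guard against publishers accidentally declaring a loop.
            raise ValueError("cycle in supercededBy chain")
        seen.add(uri)
    return uri


triples = [
    ("http://example.org/item/v1", SUPERCEDED_BY, "http://example.org/item/v2"),
    ("http://example.org/item/v2", SUPERCEDED_BY, "http://example.org/item/v3"),
]
print(latest_version(triples, "http://example.org/item/v1"))
```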

If the URI note is only focused on people who wish to be authorities
at the curated database level then it is missing the point, I think.

> Should one expect a referral to a registry from this document? I
> would have said this is out of scope.
> >
> > Just my two cents,
>
> Your cents matter to us.
>
>
>

Received on Sunday, 4 November 2007 23:24:54 UTC