Re: comments on the uri note from Jonathan Rees on 2007-11-04 (public-semweb-lifesci@w3.org from November 2007)

From: Jonathan Rees <jar@creativecommons.org>
Date: Sun, 4 Nov 2007 11:58:49 -0500
To: Michel_Dumontier <Michel_Dumontier@carleton.ca>
Cc: naty.vr@gmail.com, public-semweb-lifesci@w3.org
Message-Id: <C584B78E-18A7-4743-A55D-2F3B5DB3A9A5@creativecommons.org>
Dear HCLS list - I haven't put the draft out for general review, but  
am planning to do so tomorrow.
My reason for waiting is past experience of "draft fatigue" - most
people will only read one draft of something.  If you're one of those  
people please wait a while for the dust to settle.

On Nov 3, 2007, at 3:02 PM, Michel_Dumontier wrote:

> Hi all,
>   I read the latest URI note [1], and here are some comments:
>
> [1] http://sw.neurocommons.org/2007/uri-note/

Future reviewers - please always include the draft number (35, 36,
etc.).

Thanks for taking the time to look at this.

>  "A usage spec for a name is simply a graph that is designated as one
> that specifies when the name should and shouldn't be used"
>
> Given that RDF semantics are open world, and RDF lacks the formal
> vocabulary for negation or universal quantifiers, I don't see how one
> can constrain usage, as no inconsistency can result from the addition of
> new knowledge. The example that is provided is not constraining, but
> rather states what we know about that particular entity, at a certain
> time, presumably from a certain location on the internet.

I thought it was clear that I was not talking only about logic.  You can
specify all sorts of unmodeled things in natural language, and this
language can go in an rdfs:comment. I'll try to make this clearer.

> A major concern I have with the note is that it essentially says that
> only the naming authority can make "defining" statements about some URI.
> Such an approach would severely hinder people from reusing URIs, as they
> may wish to make additional statements that are undoubtedly not covered
> by the authority's definition. Such advocacy would simply lead people to
> mint their own URIs, leading to heavy fragmentation of the semantic web,
> in which only our knowledge about something might be limited due to "see
> also" links between instances.

"Additional statements" are not defining and are not meant to impose  
usage constraints on ALL users of the name, even if they could. They  
are merely about, just as Austen's novel Persuasion is about  
persuasion and uses the word (name) 'persuasion', but doesn't define  
'persuasion'. That doesn't mean the additional statements aren't  
believed; they may or may not be incorporated into any given theory  
(logical or otherwise).

You can always say what you like about any defined term; doing so  
does not change the definition.

Perhaps we don't agree about what is meant by "defining" (although  
I'm puzzled that you use that word because I don't use it in the  
draft), and I need to be clearer about that.

> "The property rdfs:seeAlso specifies a resource that _might_ provide
> additional information about the subject resource" [2]
>
> [2] http://www.w3.org/TR/2000/CR-rdf-schema-20000327/#s2.3.4
>
> Unless there is a stronger link between differently named resources,
> such as owl:sameAs, it certainly can't be interpreted that they are the
> same, thus the statements will not be merged. However, if the resource
> points to another document making statements about the URI, or makes use
> of owl:sameAs, this will lead to the merging of statements that might
> go beyond the original "definitions" of any one authority.

I wish I could understand what you're saying. Can you give a concrete  
example?

A seeAlso does not constitute an assertion that you should believe  
what's at the other end. That would be reserved for something  
stronger like owl:imports (if I understand it correctly).
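
As a rough illustration of that distinction (not something from the draft), here is a minimal Python sketch. It assumes the reading given above: statements reached through owl:imports are taken on board, while rdfs:seeAlso is only a hint. The example.org documents, the ex: names, and the helper asserted() are all hypothetical.

```python
# Hypothetical sketch: triples are plain (subject, predicate, object) tuples,
# and DOCS maps a document URI to the set of triples that document contains.
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"

A, B, C = "http://example.org/a", "http://example.org/b", "http://example.org/c"
DOCS = {
    A: {(A, OWL_IMPORTS, B), (A, RDFS_SEE_ALSO, C)},
    B: {("ex:x", "ex:p", "ex:fromB")},
    C: {("ex:x", "ex:p", "ex:fromC")},
}

def asserted(start, docs):
    """Collect triples from `start` and from every document it imports,
    directly or transitively. rdfs:seeAlso links are deliberately ignored."""
    seen, todo, triples = set(), [start], set()
    while todo:
        doc = todo.pop()
        if doc in seen:
            continue
        seen.add(doc)
        body = docs.get(doc, set())
        triples |= body
        # Only owl:imports pulls another document's statements in.
        todo.extend(o for (s, p, o) in body if p == OWL_IMPORTS)
    return triples

believed = asserted(A, DOCS)
print(("ex:x", "ex:p", "ex:fromB") in believed)  # True: reached via owl:imports
print(("ex:x", "ex:p", "ex:fromC") in believed)  # False: only a seeAlso hint
```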

> I don't believe that the statement "The declaration should be specific
> enough to rule out incorrect usage, but not so specific that it
> overcommits and fosters inconsistency or discourages reuse." is possible
> to adhere to.
>
> Here are things that I consider:
> A Universal Resource Identifier is a string of characters that denotes
> the name of some resource.
Sorry, I am saying that the URI doesn't *denote* the name, it *is*  
the name. And it can name anything, not necessarily a resource.

> 1 - create a URI that is consistent with the corresponding protocol. For
> instance, HTTP URIs can only be composed of a certain set of characters
> defined by [some url], and LSID URIs have their own specification
> [another url], etc, etc...
I can add language to this effect if you think it's important.  For
now I just say you have to follow URI syntax. I had thought that
would imply protocol-specific syntax.

> 2 - reuse a URI if you believe that your use of that resource is
> expected to be consistent with the original intent. In the absence of
> expressive logics with negation, it will not be possible to
> computationally check if the meaning is consistent.
Correct. My proposal is to take the usage spec as defining, in  
natural language if necessary, the original intent.

> 3 - you might consider minting a URI that is identical in intent, but
> you like to track your contributions (provenance). In this case, you
> make statements to your URI, and should consider using owl:sameAs to
> indicate that the two resources should be considered equivalent.
It sounds like we have very different ways of dealing with provenance  
and trust, and it would probably be profitable to talk about these  
differences. As you aren't the first to mention this approach, it's
probably important that we try to figure out what the differences are in
underlying assumptions, since otherwise we'll just talk past each other.

> Since a name isn't sufficient for understanding its meaning, we suggest
> that you augment every RDF/OWL resource with:
> 1 - a concise human readable label using rdfs:label in the language of
> choice
Yes. I am reluctant to impose even more requirements/advice, but
label is probably warranted.
> 2 - a precise human readable definition using rdfs:comment in the
> language of choice.
yes, assuming there's any doubt that the logical definition
captures all important information about usage (e.g. I think the
pathology example doesn't need prose)
> 3 - RDF statements that you believe to be universally true about that
> resource
yes. I would have to define "universally": I think it means (from a  
formal viewpoint) that every theory that uses the name is asked to  
take the usage spec as axiomatic.
> 4 - or point to documents that make statements about that resource using
> rdfs:isDefinedBy.
Fine, but if the statements are to be universally true why point to  
them? And what document-reference connective is strong enough to mean  
that the statements in the other document are to be axioms?  
owl:imports I guess.

I don't like isDefinedBy because it relates the thing to the defining  
document. But the thing just is; it doesn't need defining. It is the  
name that needs to be defined. E.g. you could have two names for the  
same thing, defined in different ways. Definition, IMO, is  
extralogical since it quantifies over *all* conforming theories.  
Definition is closer to a kind of modularity (axiom sets that get  
incorporated into many theories) and is therefore meta-meta.
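
For what it's worth, the label and comment advice in points 1 and 2 is easy to check mechanically. The following is a minimal Python sketch under stated assumptions: triples are plain tuples, the example.org resource is made up, and missing_annotations() is a hypothetical helper, not part of any RDF library.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

def missing_annotations(resource, triples):
    """Return which of rdfs:label / rdfs:comment the resource lacks."""
    present = {p for (s, p, o) in triples if s == resource}
    return [name for name in ("label", "comment") if RDFS + name not in present]

# Hypothetical data: a resource with a label but no prose definition.
triples = {
    ("http://example.org/Protein", RDFS + "label", "protein"),
}
print(missing_annotations("http://example.org/Protein", triples))  # ['comment']
```

A check like this only says whether the annotations exist. Whether the comment is precise enough to guide usage still needs a human reader.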
>
> As an example, I built a prototype HTTP URI resolver for the entities
> defined in my most current OWL ontologies:
>
> http://134.117.55.46:8181/Protein ,
>
> where 134.117.55.46:8181 will eventually be ontology.dumontierlab.com
>
> In this way, a human can see the implied meaning, and an agent can
> follow other documents to determine what has been said about it (at
> least within my own knowledge base).

Tell me why you don't use the "eventual" term in the first place? If  
it's a resolution issue then something like resolution rules or a web  
proxy could help out here. If it's really a logical or provenance  
issue then I don't understand you (and I want to).

When I want to make provisional assertions so that they can be tried
out (for consistency, linking, application behavior, etc.), I isolate
them from other assertions by setting up a separate theory (graph,
scope, ...). Entailment while I play around comes only from the
provisional graph, and entailment in some other, better established,
maybe shared theory is not threatened by the statements I make in the
flakier place. It sounds like you're doing the same kind of thing
but within a single graph/theory and modulating names to simulate
multiple graphs (you would say I resort to multiple graphs to  
simulate renaming). I would like to hear how these approaches are  
substantially different (not just syntactically or organizationally).  
If choice between these two approaches bleeds through into a naming  
discussion then that's a threat to sharing and we need to talk about it.
Tell me I'm wrong...
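
To make the separate-graph approach concrete, here is a minimal Python sketch. It is only an illustration under assumptions: graphs are modeled as named sets of triples, the ex: terms are invented, and theory() is a hypothetical helper. Note that the same name (ex:Protein) is used in both graphs; nothing is renamed.

```python
# Hypothetical named graphs: graph name -> set of (subject, predicate, object).
GRAPHS = {
    "established": {
        ("ex:Protein", "rdfs:subClassOf", "ex:Molecule"),
    },
    "provisional": {
        ("ex:Protein", "ex:hasPart", "ex:AminoAcidResidue"),
    },
}

def theory(graphs, *names):
    """Return the union of the triples in the named graphs, and nothing else."""
    selected = set()
    for name in names:
        selected |= graphs[name]
    return selected

shared = theory(GRAPHS, "established")   # what other people rely on
sandbox = theory(GRAPHS, "provisional")  # where the trial statement lives

trial = ("ex:Protein", "ex:hasPart", "ex:AminoAcidResidue")
print(trial in shared)   # False: the shared theory is untouched
print(trial in sandbox)  # True
```

Promoting a provisional statement would then mean moving the triple between graphs, with no change to the name it uses.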

> What remains lacking is a method by
> which we can discover what other people have said about this resource.
Excellent, I'd love to see a protocol for this. Currently I use Google.
> That's why I'm fond of the http://lsrn.org (centralized) solution in
> which multiple data providers can register as a resolver a given base
> URI, so that people and agents can find out more about it (via HTML/RDF
> documents)[3].
> Moreover, it allows third party data providers to
> register a public identifier, and resolve it (in RDF documents) prior to
> the authority having to do so! Analogously, in the LSID protocol
> (distributed), resolvers can register with the authority itself and
> provide different information.
Sounds a little like Wikipedia. Where is the LSRN protocol
documented? How do I add to the registry? Where is this LSID
multi-resolver protocol documented? Where does the LSID resolver list
come from, and how do I add to it?
>
> [3] http://lsrn.org/CAS:58-08-2
>
> Thus, I dislike anything that is "authoritative" or "monopolizing" if it
> handicaps URI reuse and precludes the discovery of additional
> information about that resource.
Absolutely. But I don't see anything in the apparatus of OWL and/or  
the semantic web that precludes you from looking anywhere for any  
kind of information. E.g. I could keep track of a set of my favorite  
SPARQL endpoints to consult when I have questions.  I think what
you're saying is that you like having a nonauthoritative but central
point of contact, which to me sounds like a service analogous to
Google (uncurated), DMOZ (curated), or Wikipedia (contribution-based).
If this is so I need to learn more about LSRN.
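
The "favorite SPARQL endpoints" idea could look roughly like the Python sketch below. The endpoint URLs are placeholders, the helper lookup_urls() is hypothetical, and the sketch assumes each endpoint accepts the query as a `query` parameter on a GET request. It only builds the request URLs; it does not contact anything.

```python
from urllib.parse import urlencode

# Hypothetical list of endpoints someone might keep on hand.
ENDPOINTS = ["http://example.org/sparql", "http://example.net/sparql"]

def lookup_urls(uri, endpoints):
    """Build one GET URL per endpoint asking what is said about `uri`.

    Sketch only: `uri` is pasted into the query unescaped, so it is
    assumed to be a well-formed URI with no '>' or whitespace in it.
    """
    query = "SELECT ?p ?o WHERE { <%s> ?p ?o }" % uri
    return [endpoint + "?" + urlencode({"query": query}) for endpoint in endpoints]

for url in lookup_urls("http://lsrn.org/CAS:58-08-2", ENDPOINTS):
    print(url)
```

This gives each consumer a private, uncurated list of places to ask, which is different from a shared registry that third parties can add themselves to.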

The alternative to the "authority" of the first published
dictionary definition (usage spec) is a consensus process inducing what
a term means. Although natural language is mostly defined by use, not  
prescription, I don't think this is viable for computational purposes  
(not necessarily limited to deduction) and it doesn't agree with my  
reading of how the scientific literature best develops. Yes, even  
when there are explicit definitions, there is overloading (others  
define the term differently), but good scholarship couples use of a  
technical term with a reference to the source that made the  
definition you're assuming. The URI in effect provides reference (to  
concept) and reference (to publication) in one module. The  
"authoritative" aspect here is just priority of publication, coupled  
with the uninteresting detail of collision avoidance through domain  
ownership. In a more recent version of the draft I've further  
downplayed the idea of "authority" - I now call it "minting  
authority" and explain that you lose control once you publish - your  
publication has to stand on its own unless a special contract is  
otherwise established that gives you some special ongoing  
relationship with the name. I hope we agree on this.

Should one expect a referral to a registry from this document? I  
would have said this is out of scope.
>
> Just my two cents,

Your cents matter to us.
Received on Sunday, 4 November 2007 16:59:15 UTC