Re: What if an URI also is a URL from John Black on 2007-06-12 (semantic-web@w3.org from June 2007)

From: John Black <JohnBlack@kashori.com>
Date: Tue, 12 Jun 2007 16:53:05 -0400
To: "Tim Berners-Lee" <timbl@w3.org>
Cc: "M.David Peterson" <m.david@xmlhacker.com>, "r.j.koppes" <rikkert@rikkertkoppes.com>, "Yuzhong Qu" <yzqu@seu.edu.cn>, "Sandro Hawke" <sandro@w3.org>, <semantic-web@w3.org>, <swick@w3.org>, <phayes@ihmc.us>
Message-ID: <1a6701c7ad33$add4a4c0$6601a8c0@KASHORI001>
Tim Berners-Lee wrote
> On 2007-06-11, at 13:53, John Black wrote:
>> Tim Berners-Lee wrote
>>> On 2007-06-09, at 21:22, M. David Peterson wrote:
>>>> On Sat, 09 Jun 2007 07:13:52 -0600, Tim Berners-Lee <timbl@w3.org>
>>>> wrote:
>>>>
>>>>> No. It cannot identify both a document and a person.
>>>>
>>>> Tim: With all due respect... WTF?
>>>
>>>
>>> I am using the 'identify' in the strict sense of 'denote'.
>>> The semantic web is like a logic language in which URIs are symbols.
>>
>> Do you believe that by claiming to use the strict, logical sense of the
>> word 'denote' you thereby cause or require such denotations to be
>> absolute and unambiguous? Where do you think denotations (or
>> identifications) come from?
>
> The architecture is that each URI is owned. With HTTP URIs, this
> happens through the domain name system and often delegation within a
> domain. Unlike a word, a URI has an owner. The owner attempts to make
> enough information available that the URI can be used by others without
> ambiguity in practical situations.
>

But what about dbpedia.org? Who owns those URIs? And that is one of the most
exciting sites around. If "owned" at all, it is owned by a community that
cooperatively decides the denotations of the URIs.

> For example, W3C owns http://www.w3.org/People/Berners-Lee/card#i and has
> delegated to me the right to say what that URI stands for. To use it for
> something else is an error.
>
>> In my opinion to denote (or to identify) is a verb, something that is
>> done by the users of a symbol. After all, symbols (URI) are not agents,
>> they don't wake up and choose to denote this or that.
>
> They have owners which create them for a specific purpose.
>
>> Nor do I think denotation is an attribute or property of a symbol,
>> somehow built in or attached when the symbol is first conceived. It is
>> more like a dance. I use a symbol to denote something expecting you to
>> interpret it to denote the same thing. And this coordination, this
>> synchrony of interpretation by both sender and receiver, is not always
>> easy. It requires real effort to sustain it. The minter of a URI cannot
>> make it happen by declaration, nor can a research group or a standards
>> body just decree it so.
>
> In many cases, the URI is defined by connection to already well-defined
> sets of things. In other cases, such as the terms in the OWL ontology,
> there was a huge amount of effort and discussion involved, and the
> current term is supported by a lot of ongoing tutorials and so on. No
> one said it was easy. But it is a different architecture from the dance
> associated with natural language words.
>
> It is different by design. The semantic web is an engineered system, not
> an observation of nature.
>
>> The reason this matters is that since it requires this effort to create
>> a denotation/identification in the first place, it is far more sensible,
>> to me at least, to expect that the final disambiguation of a symbol be
>> accomplished in the same way, by coordinated effort of the parties using
>> the symbol, not by declaration of the W3C specifications that all URIs
>> be absolutely unambiguous.
>> This seems to me to be, as my grandfather used to say, a vain task.
>
> Your grandfather would perhaps have suggested that an attempt to define
> the meaning of common words, as the Académie Française is set up to do,
> were a 'vain task'. Many would agree. But given that his water came to
> him through pipes connected, possibly, by half-inch British Standard
> pipe-thread connections, and he rode on rails set a certain distance
> apart by some committee, and his TV came for better or worse in 525 or
> 625 lines as decided by other committees, he may have respected that the
> creation of standards is a very valuable function, and essential to
> progress.
>
> When people meet to define W3C specifications they are not doing it out
> of vanity. They are performing coordinated effort of the parties who
> would like to be able to use the symbol. They are, in general, users and
> representatives of users of the symbol. They come together to allow
> those who follow them to use it. They often work long hours, receiving
> inadequate recognition for either products shipped or papers published,
> the conventional metrics of performance, so I would not call it vanity.
>
> Note also that W3C (IETF, etc) specs have achieved a lot, made a lot of
> interoperable systems, and formed with each layer a foundation for
> building new layers. So I would not say that the work has been in vain
> either.
>

Of course. I agree these organizations have achieved a lot, the people in
them have worked hard, for little compensation, and specifications are good
and necessary. What I meant to express was my belief that it is virtually
impossible for anyone to prevent ambiguity in the URIs created by the wide
open public once these specifications have been released. And furthermore,
that the real problem is that of creating a denotation for a URI at all.
Once that problem is solved, I think such issues as http-range-14, 303
redirects, etc. are unnecessary. As such, I think they are a distraction.

Here are a few examples of the kind of ambiguity that I am referring to. I 
assume for this purpose that it is in fact possible to create a denotation 
of a URI by publishing information about it at the location that will be 
returned through HTTP.

A scenario: I make a statement at one point in time using your URI while the
"information" about the URI says one thing. For simplicity, let's say I copy
and cache this denoting/identifying information with my statements using it.
Later you change the information at that URI and I make another statement 
using the same URI. If the denotation of a URI is given by the information 
retrievable from that URI, then it has become ambiguous. It denotes one 
thing as used in statements before the change, and another thing in 
statements used after the change. This would be ambiguity due to 
denotation-drift over time.
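
A minimal sketch of this first scenario, in Python, may make the drift
concrete. Everything in it is hypothetical: ToyWeb stands in for HTTP
dereferencing, the example.org URI and the two descriptions are invented,
and "denotation" is reduced to a plain string cached with each statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject_uri: str
    text: str
    # Snapshot of what was published at the URI when the statement was made.
    cached_description: str


class ToyWeb:
    """Hypothetical stand-in for HTTP: URI -> description currently published."""

    def __init__(self):
        self._published = {}

    def publish(self, uri, description):
        self._published[uri] = description

    def dereference(self, uri):
        return self._published[uri]


def assert_about(web, uri, text):
    """Make a statement and cache the URI's current description with it."""
    return Statement(uri, text, web.dereference(uri))


def drifted(a, b):
    """True if two statements use the same URI under different descriptions."""
    return (a.subject_uri == b.subject_uri
            and a.cached_description != b.cached_description)


web = ToyWeb()
uri = "http://example.org/thing#x"  # invented URI for illustration

web.publish(uri, "a person")
s1 = assert_about(web, uri, "x lives in Boston")

# The owner later changes the information published at the URI.
web.publish(uri, "a document about a person")
s2 = assert_about(web, uri, "x was last edited in June")

print(drifted(s1, s2))  # True
```

The two statements share one symbol but were made under different
published descriptions, which is the ambiguity over time described above.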

Another scenario: A scientist discovers a new molecule. He names it with a 
URI that he owns. It is denoted by a descriptive account of some of the 
measurable properties that the scientist has discovered. Now another
scientist becomes interested in it and discovers what he believes are
additional properties. He wants to publish his results about the molecule.
But the first scientist owns the URI, which has now become commonly used in 
the field, and doesn't want to add the new properties. Perhaps he thinks he 
has discovered properties which are incompatible and publishes those. So the 
second scientist publishes additional data about the molecule on his own web 
site using the original URI that everyone is using. Factions develop around 
each differing denotation. It is now ambiguous because the factions cannot 
agree on its denotation.

Another scenario: If the denotation of a URI is given by the information 
that can be referenced when it is used in HTTP, then the URIs found there 
must be further denoted by the statements that can be referenced by those
URIs and so on. Now this process would likely be very computationally
expensive in many cases and would likely be cut short in practical cases. If
clients differed in the depth to which they computed the closure of a URI it would
create ambiguity. One client has a reasoner that computes the ontological 
closure of a URI to level 100 and another client has one that computes the 
closure to level 200. Ambiguity arises here because the 
denotation/identification of one client's view of the URI is more detailed 
than the other.
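
As a rough illustration of this scenario, the Python sketch below treats
each URI's description as a list of further URIs and follows them
breadth-first up to a depth limit. The ex: names, the REFERENCES table and
the closure function are all invented for the example; a real reasoner
would do far more than follow links.

```python
from collections import deque

# Hypothetical toy data: each URI maps to the URIs mentioned in the
# description published at it.
REFERENCES = {
    "ex:a": ["ex:b"],
    "ex:b": ["ex:c"],
    "ex:c": ["ex:d"],
    "ex:d": [],
}


def closure(start, max_depth, references=REFERENCES):
    """Return the URIs reachable from start within max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        uri, depth = queue.popleft()
        if depth == max_depth:
            continue  # this client stops following references here
        for ref in references.get(uri, []):
            if ref not in seen:
                seen.add(ref)
                queue.append((ref, depth + 1))
    return seen


shallow = closure("ex:a", 1)  # one client's view
deep = closure("ex:a", 3)     # another client's more detailed view

print(sorted(deep - shallow))  # ['ex:c', 'ex:d']
```

The two clients start from the same URI but end up with different sets of
supporting information, so their views of what it denotes can differ.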

Another scenario: If a URI has embedded in it one or more natural language 
words, then the denotations of those words may affect the overall denotation 
of the URI. In the worst case, those natural language words may be ambiguous
themselves. To prevent this you could forbid the use of natural language
words in URIs, but this would remove much of the actual semantics that exists
on the semantic web. For as far as I can tell, we are so far mostly riding
on the back of natural language to supply the denotations of URIs. This is
ambiguity due to the symbolic theft of natural language terms used in URIs.
See my post, http://kashori.com/2006/11/semantic-web-and-symbolic-theft.html 
for more on this idea.




> Tim
Received on Tuesday, 12 June 2007 20:53:49 UTC