RE: Semantic web - a fractal ongoing struggle toward greater consistency. from John Black on 2003-07-25 (www-rdf-interest@w3.org from July 2003)

From: John Black <JohnBlack@deltek.com>
Date: Fri, 25 Jul 2003 14:14:58 -0400
To: "Tim Berners-Lee" <timbl@w3.org>, "pat hayes" <phayes@ihmc.us>
Cc: <www-rdf-interest@w3.org>
Message-ID: <D3C8F903E7CC024C9DA6D900A60725D9025F33F8@DLTKVMX1.ads.deltek.com>
But I thought having two meanings for a URI was a good thing in RDF applied 
to the web.

Consider the following 3 individual markers:

 _003758FBD8D74BC9986772C568D4484C
 _0282C07E5EED4639B3CABAF87D3CAA91
 _139C77A9036C47948B1D1E88DD3BF0E3

and let's state them as a triple:

_003758FBD8D74BC9986772C568D4484C,
        _0282C07E5EED4639B3CABAF87D3CAA91,
                _139C77A9036C47948B1D1E88DD3BF0E3
				
Now let's say that in a WWW-oriented interpretation,
        these markers denote HTTP URIs:

let _003758FBD8D74BC9986772C568D4484C denote http://person.com/JohnBlack
let _0282C07E5EED4639B3CABAF87D3CAA91 denote http://properties.com/ssn
let _139C77A9036C47948B1D1E88DD3BF0E3 denote http://ssnvalues.com/123456789

In another, IRS-oriented interpretation,
        let these markers denote objects of the real world:

let _003758FBD8D74BC9986772C568D4484C denote John Black
let _0282C07E5EED4639B3CABAF87D3CAA91 denote US gov social security account #
let _139C77A9036C47948B1D1E88DD3BF0E3 denote 123456789

Now it seems to me that the expected power of applied RDF is just this 
metaphorical relationship.  The UUID has two separate denotations under two
different interpretations.  And it is these different but parallel meanings that 
give relevance to the web interpretation.  What we want is for all our 
statements about _003758FBD8D74BC9986772C568D4484C to somehow be true under 
both interpretations.
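
As a rough sketch of what I mean (Python, purely illustrative: the names
`www`, `irs`, `www_facts`, `irs_facts`, `denote` and `holds` are my own
assumptions, not anything defined by RDF), the same three markers can be
mapped by two interpretations, and the triple checked under each:

```python
# Illustrative sketch only; all names and the "facts" sets are assumptions.
S = "_003758FBD8D74BC9986772C568D4484C"
P = "_0282C07E5EED4639B3CABAF87D3CAA91"
O = "_139C77A9036C47948B1D1E88DD3BF0E3"
triple = (S, P, O)

# WWW-oriented interpretation: markers denote HTTP URIs.
www = {
    S: "http://person.com/JohnBlack",
    P: "http://properties.com/ssn",
    O: "http://ssnvalues.com/123456789",
}

# IRS-oriented interpretation: markers denote things in the real world.
irs = {
    S: "John Black",
    P: "US gov social security account #",
    O: "123456789",
}

# What each "world" takes to be the case (assumed for the example).
www_facts = {(www[S], www[P], www[O])}
irs_facts = {(irs[S], irs[P], irs[O])}


def denote(interpretation, t):
    """Map each marker in the triple to what it denotes."""
    return tuple(interpretation[marker] for marker in t)


def holds(interpretation, facts, t):
    """The triple is true if its denoted form is among the facts."""
    return denote(interpretation, t) in facts


# True under both interpretations, although the denotations differ.
both_true = holds(www, www_facts, triple) and holds(irs, irs_facts, triple)
```

The point of the sketch is only that `denote(www, triple)` and
`denote(irs, triple)` are different tuples, while the one triple holds
in both.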

Of course, in RDF HTTP URIs are used as UUIDs.  But while they are used that 
way, they are no longer HTTP URIs, but just unique, opaque identity markers.  I still
think that in that case, we are using these HTTP URIs as UUIDs to denote the HTTP URIs
used as HTTP URIs.  I mean in a WWW oriented interpretation:

let "http://person.com/JohnBlack" denote http://person.com/JohnBlack
let "http://properties.com/ssn" denote http://properties.com/ssn
let "http://ssnvalues.com/123456789" denote http://ssnvalues.com/123456789

and in the IRS interpretation:

let "http://person.com/JohnBlack" denote John Black
let "http://properties.com/ssn" denote US gov social security account #
let "http://ssnvalues.com/123456789" denote 123456789

So even in this case, it seems that the triple with the UUID URI gets its relevance
in the WWW world, by being true in parallel with the interpretation at IRS, not because 
they mean the same thing.  The UUIDs denote two different things: a set of HTTP URIs 
in the web world and a Person, id #, and digit string value in the IRS world.

So having two meanings for a URI is a good thing in RDF applied to the web, 
isn't it?

-----Original Message-----
From: Tim Berners-Lee [mailto:timbl@w3.org]
Sent: Friday, July 25, 2003 11:11 AM
To: pat hayes
Cc: www-rdf-interest@w3.org
Subject: Semantic web - a fractal ongoing struggle toward greater consistency.





On Tuesday, Jul 22, 2003, at 12:31 US/Eastern, pat hayes wrote: 
in http://lists.w3.org/Archives/Public/www-tag/2003Jul/0293.html 
(in www-tag to which please do NOT crosspost) 
[...] 


My other, second, point - and this is where I was chuntering about human communication and lexical ambiguity of "bank" and so on - is less to do with the TAG and more of a design debate within the SWeb. It is relevant, however, as I suspect that your position on this matter is partly what makes you so anxious to insist on single interpretations. The question is, is our current design assumption - that all URIs always have the same meaning for all agents, so that RDF can be freely swapped around from ontology to ontology and processed by simple inference engines without any kind of checking for consonance of meaning - really realistic? You seem to think that it is, that the SWeb will be able to evolve into a global system of clear and unambiguous concepts, each assigned uniquely to a URI. 


Actually, I expect the web to have different order at different scales. A fractal system has similar amounts of organization showing up in a similar way at different scales. I think the semantic web will -- must in fact, to be useful -- evolve in this way. 
I have written about it in general in 
http://www.w3.org/DesignIssues/Fractal but here let me go over it from the semantic web point of view. 


There is a process we have been discussing in which two agents using a set of symbols exchange information, and in that way narrow down the set of interpretations they are dealing with such that the interpretations each agent uses are consistent with all the data exchanged. Every time a message sent by one is inconsistent with some interpretation the other had been considering the second agent throws the interpretation away. So we end up with a concept of "means the same thing". 
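
A toy sketch of that narrowing process, in Python and under loud assumptions: the `WORLD` table, the one-place predicates, and the two candidate readings of "bank" are invented for illustration and are not part of any spec.

```python
# Toy model: an agent keeps a list of candidate interpretations and drops
# every candidate under which a received message would be false.

# Assumed background knowledge: which things satisfy which predicates.
WORLD = {
    "is_a_place": {"financial institution", "river edge"},
    "holds_money": {"financial institution"},
    "has_water": {"river edge"},
}


def consistent(interpretation, message):
    """A message (symbol, predicate) is consistent with an interpretation
    if the thing the symbol denotes satisfies the predicate."""
    symbol, predicate = message
    return interpretation[symbol] in WORLD[predicate]


def narrow(candidates, message):
    """Throw away every interpretation the message is inconsistent with."""
    return [i for i in candidates if consistent(i, message)]


candidates = [
    {"bank": "financial institution"},
    {"bank": "river edge"},
]

# The first message rules nothing out; the second leaves a single reading.
after_first = narrow(candidates, ("bank", "is_a_place"))
after_second = narrow(after_first, ("bank", "holds_money"))
```

Each message can only shrink the candidate list, which is the sense in which the two agents converge on "means the same thing".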


When more than two people do it, then we call it a community, or a movement, or whatever. 
Now, this process is lengthy and expensive. It is difficult to do and even more difficult to maintain. The larger the community which does it the longer it takes. Witness, the time it takes to get web architecture agreed. This is the work of W3C so we know, we feel the pain. Therefore, there will be relatively few symbols where everyone in the world (or every bit of software in the world) means the same thing. 
Candidates are rdf:type and dt:Integer. There will be many more symbols where that applies to a sub-community. Things like "atomic number", "EAN barcode number", and so on, you can imagine being shared by medium-sized communities. And some concepts, like skolem constants, are dreamed up locally in the course of a chat between two people, and never shared wider. There can be billions and billions of those as they are so cheap. 


So this is how it works. Each agent finds itself operating as a member of several communities of different sizes. It requires consistency within the data on which it operates. This means it has to do a finite amount of work. It has to maintain a bunch of local symbols which are cheap, and just a few global symbols which are expensive but where its own share of the participation will be small. And some at several scales in between. The operations of this agent will help consistency between those groups. 


If every agent were in fact successful, then at close of business the world would indeed be a totally consistent huge system. This, of course, is not reality. There is no close of business. 
Just as new understandings are found, and inconsistencies are cleared up, so new people join communities, or people try to exchange more forms of data between existing communities, and the agenda for the work increases. There will never be global consistency. But in mature areas (think Online Financial Exchange, Calendars (ever?! ;-), weights and measures, etc) the terms will be well understood, and interoperability tests will have established that machines use them appropriately. 


What there will be lots of is the use of many different URIs to mean things which are either the same or very close. This is where different communities name the same concept independently. This causes no breakage: no inconsistencies. It does form an opportunity for more standardization work in the future. 


[...] 


So, ironically, the issue about agreeing to a common meaning, that you keep making an inappropriate fuss about, is actually a non-issue: our current design handles it perfectly, and you don't even need to mention "resources"; but I am saying that the current design is in fact broken, for just this reason. It is predicated on a falsehood that you wish to be made into a principle. You don't need to make it into a principle; and making it into a principle isn't going to make it any truer than it is, or make the knowledge integration problem go away, in any case, as practical SW users will rapidly discover; in some cases, are already discovering, eg see http://smi-web.stanford.edu/si2003/ 



I think we are talking about different levels of "principle". 
I am not saying it's not going to happen. 
I am saying don't do it deliberately. 


The specs are protocols. A protocol design says "if one acts the following way, then the following things become true". They say what people should and should not do, and what certain things mean. They say that rdf:type is a property associated with the binary relation of class membership. What do you take that as? A fact which is true because the owner of the URI said so? a fundamental Principle? a declaration? An assertion others are free to question? Part of a convention? 


In the spec it is written up as though a fact. If everyone reads and follows the spec, they can communicate using rdf:type. 


What do you call it when someone says {rdf:type rdf:type animals:Cat .} ? 
(where Cat means what we expect it to mean, the animal). 
A statement which may be true in certain interpretations? 
An alternative and quite reasonable view of life? 
From the RDF spec's point of view, it is what spec writers would call an "error". 


So I'm not proposing to declare that people never make errors. 
But I do want to make sure that things which will stop the protocol working are called errors so that people are encouraged not to do them. 


There are people who would like to say 


<http://www.w3.org/> a soc:Consortium. 
<http://www.w3.org/> a doc:Work. 


because they say it should be clear from the context what one means. 
So, while I'm not declaring that such documents don't exist, 
I am saying that we should say that it is an error to use the same URI to mean two different things. 


Then I would like the spec to assert that HTTP URIs (without hashes) denote information resources. This allows anyone who sees the URI of one of those things to use that URI 


<http://www.example.w3.org/> rdf:type xx:currentlyInaccessable. 


<http://www.example.com/> xx:textContents """You have reached this web page by typing "example.com", "example.net", or "example.org" into your web browser. 
These domain names are reserved for use in documentation and are not available for registration. See RFC 2606 , Section 3. """. 


without having to find some documentation as to whether in fact that URI was designed by the domain name owner to be the identifier of some dog. 


I have had a huge push-back on this from various people for various quite different reasons, 
http://www.w3.org/DesignIssues/HTTP-URI.html being a summary of some of the arguments. 


It follows that anything with "http:" and no "#" is an information resource, and so that means it would be an error to use that URI for an RDF Property. 
It is good to tell people that sort of thing, so that they don't just follow the spec and then come up with an inconsistency later on through no fault of their own. 
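
A minimal sketch of how a checker might flag this, assuming the proposed rule were adopted (it is a proposal here, not something the RDF spec states). The function names and the `http://properties.com/terms#ssn` URI are hypothetical.

```python
from urllib.parse import urlsplit


def is_information_resource(uri):
    """Proposed rule: an 'http:' URI with no '#' denotes an information
    resource."""
    return urlsplit(uri).scheme == "http" and "#" not in uri


def property_errors(triples):
    """Return the predicate URIs that, under the proposed rule, denote
    information resources and so should not be used as properties."""
    return [p for (_s, p, _o) in triples if is_information_resource(p)]


triples = [
    # Hashless predicate: flagged under the proposed rule.
    ("http://person.com/JohnBlack", "http://properties.com/ssn", "123456789"),
    # Hypothetical hash URI for the same property: not flagged.
    ("http://person.com/JohnBlack", "http://properties.com/terms#ssn", "123456789"),
]

flagged = property_errors(triples)
```

Reporting the problem at the point of use is the design choice being argued for: the author learns of the clash immediately rather than finding an inconsistency later.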


So I don't know whether that is some wording different from "error" for this sort of circumstance. 


Pat 
-- 
--------------------------------------------------------------------- 
IHMC    (850)434 8903 or (650)494 3973 home 
40 South Alcaniz St.    (850)202 4416 office 
Pensacola                       (850)202 4440 fax 
FL 32501                        (850)291 0667 cell 
phayes@ihmc.us http://www.ihmc.us/users/phayes
Received on Friday, 25 July 2003 14:20:13 UTC