Re: Semantic web - a fractal ongoing struggle toward greater consistency. from Sherman Monroe on 2003-08-04 (www-rdf-interest@w3.org from August 2003)

From: Sherman Monroe <shermanmonroe@yahoo.com>
Date: Mon, 4 Aug 2003 03:42:34 -0700 (PDT)
To: Tim Berners-Lee <timbl@w3.org>, pat hayes <phayes@ihmc.us>
Cc: www-rdf-interest@w3.org
Message-ID: <20030804104234.89773.qmail@web14711.mail.yahoo.com>
It has always been helpful for me to think of the S-Web as a space separate from the WWW. The WWW is simply a document space. The S-Web, on the other hand, is a virtual space, even more so than the WWW is, because its URIs reference resources that cannot be retrieved from the S-Web space (which I'll henceforth refer to as the semantic space). So in a strict sense, the semantic space has no content; it's void, aside from the URIs themselves. The meaning of those URIs is stored in the procedural code and RDF data stores of the agents that use them, in documentation (both human and machine consumable) located in the WWW document space (which is, again, separate from the semantic space), and in the brains of the people that create them. Under a not-so-strict interpretation, the agents themselves are a part of the semantic space, and thus it has content, but that content resides in semweb agents, not documents.
 
The question is, what use is there in using the HTTP protocol (or even the "http://" prefix) in the semantic space? Some argue that a URI must have a representation (in the WWW space), and suggest the URI could be used to "look up" the URI in the WWW space. While I would agree that there should exist some mechanism for retrieving a representation of a URI, and the models in which it is involved, I'd advise against using the WWW space as an approach to this, simply to avoid cross-contamination of these two separate spaces.
 
The best approach to me would be to let the content of the semantic space (which among other things would include the meaning of a URI, and the models which involve it) exist in and be accessed via semweb agents and inference engines. So instead of submitting a URI to the WWW space to retrieve the meaning of a URI, one would submit the URI to an agent whose sole purpose is to index and reason about URIs in the semantic space. The agent would then return the appropriate RDF models, or generate an HTML document, or whatever else documents the URI.
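
As a rough illustration of the kind of agent I mean, here is a minimal sketch that only indexes statements and hands back the ones involving a given URI. The class name UriIndexAgent, its methods, and the rdfp://example.org URIs are all made up for this example; a real agent would also reason over the statements rather than just look them up.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, predicate, object)


class UriIndexAgent:
    """Hypothetical agent that indexes statements by the URIs they mention."""

    def __init__(self) -> None:
        self._by_uri: dict[str, list[Triple]] = defaultdict(list)

    def add(self, triple: Triple) -> None:
        # Index the statement under every distinct URI that appears in it.
        for term in set(triple):
            self._by_uri[term].append(triple)

    def describe(self, uri: str) -> list[Triple]:
        # Return every known statement involving the URI (empty if unknown).
        return list(self._by_uri.get(uri, []))


agent = UriIndexAgent()
agent.add(("rdfp://example.org/cat", "rdfp://example.org/isA", "rdfp://example.org/Animal"))
agent.add(("rdfp://example.org/felix", "rdfp://example.org/isA", "rdfp://example.org/cat"))

# Ask the agent, not the WWW, what is known about the URI.
cat_model = agent.describe("rdfp://example.org/cat")
```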
 
I'm for creating a new protocol explicitly for the semantic space. When I look into a source file, I would like to know that a URI beginning with "http://" references a web doc, and one beginning with "rdfp://" (or something) references a URI in the semantic space.
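
A small sketch of what that distinction could look like to a tool reading a source file. The "rdfp" scheme is only my placeholder from above, not a registered scheme, and the function name space_of is hypothetical.

```python
from urllib.parse import urlsplit


def space_of(uri: str) -> str:
    """Guess which space a URI belongs to from its scheme alone."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme == "http":
        return "document"   # a web document in the WWW space
    if scheme == "rdfp":
        return "semantic"   # hypothetical scheme for the semantic space
    return "unknown"


web_doc = space_of("http://www.w3.org/DesignIssues/Fractal")
concept = space_of("rdfp://example.org/cat")
```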
 
-sherman


Tim Berners-Lee <timbl@w3.org> wrote:

On Tuesday, Jul 22, 2003, at 12:31 US/Eastern, pat hayes wrote:
in http://lists.w3.org/Archives/Public/www-tag/2003Jul/0293.html
(in www-tag to which please do NOT crosspost)
> [...]

> My other, second, point - and this is where I was chuntering about 
> human communication and lexical ambiguity of "bank" and so on - is 
> less to do with the TAG and more of a design debate within the SWeb. 
> It is relevant, however, as I suspect that your position on this 
> matter is partly what makes you so anxious to insist on single 
> interpretations. The question is, is our current design assumption - 
> that all URIs always have the same meaning for all agents, so that RDF 
> can be freely swapped around from ontology to ontology and processed 
> by simple inference engines without any kind of checking for 
> consonance of meaning - really realistic? You seem to think that it 
> is, that the SWeb will be able to evolve into a global system of clear 
> and unambiguous concepts, each assigned uniquely to a URI.

Actually, I expect the web to have different order at different scales. 
A fractal system has similar amounts of organization showing up in a 
similar way at different scales. I think the semantic web will -- must 
in fact, to be useful -- evolve in this way.
I have written about it in general in
http://www.w3.org/DesignIssues/Fractal but here let me go over it from 
the semantic web point of view.

There is a process we have been discussing in which two agents using a 
set of symbols exchange information, and in that way narrow down the 
set of interpretations they are dealing with such that the 
interpretations each agent uses are consistent with all the data 
exchanged. Every time a message sent by one is inconsistent with some 
interpretation the other had been considering the second agent throws 
the interpretation away. So we end up with a concept of "means the 
same thing".
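
A toy sketch of that narrowing process, assuming an interpretation is just a mapping from symbol to referent and a message is a (symbol, property) pair. The FACTS table, the referent names, and the function names are invented for illustration; "bank" is borrowed from Pat's example.

```python
# Toy world knowledge: which properties hold for which referents.
FACTS = {
    ("river_bank", "is_muddy"),
    ("money_bank", "has_tellers"),
    ("money_bank", "holds_deposits"),
}


def consistent(interpretation, message):
    # A message fits an interpretation if the asserted property really
    # holds for whatever the interpretation says the symbol denotes.
    symbol, prop = message
    return (interpretation[symbol], prop) in FACTS


def narrow(candidates, message):
    # Throw away every interpretation the message contradicts.
    return [i for i in candidates if consistent(i, message)]


# The receiving agent starts out considering both readings of "bank".
candidates = [{"bank": "river_bank"}, {"bank": "money_bank"}]
for message in [("bank", "has_tellers"), ("bank", "holds_deposits")]:
    candidates = narrow(candidates, message)
```

After the two messages only the {"bank": "money_bank"} reading survives, which is the sense in which the two agents now "mean the same thing".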

When more than two people do it, then we call it a community, or a 
movement, or whatever.
Now, this process is lengthy and expensive. It is difficult to do and 
even more difficult to maintain. The larger the community which does it 
the longer it takes. Witness, the time it takes to get web architecture 
agreed. This is the work of W3C so we know, we feel the pain. 
Therefore, there will be relatively few symbols where everyone in the 
world (or every bit of software in the world) means the same thing.
Candidates are rdf:type and dt:Integer. There will be many more 
symbols where that applies to a sub-community. Things like "atomic 
number", "EAN barcode number", and so on, you can imagine being shared 
by medium-sized communities. And some concepts, like skolem constants, 
are dreamed up locally in the course of a chat between two people, and 
never shared wider. There can be billions and billions of those as 
they are so cheap.

So this is how it works. Each agent finds itself operating as a member 
of several communities of different sizes. It requires consistency 
within the data on which it operates. This means it has to do a finite 
amount of work. It has to maintain a bunch of local symbols which are 
cheap, and just a few global symbols which are expensive but where its 
own share of the participation will be small. And some at several 
scales in between. The operations of this agent will help consistency 
between those groups.

If every agent were in fact successful, then at close of business the 
world would indeed be a totally consistent huge system. This of 
course is not reality. There is no close of business.
Just as new understandings are found, and inconsistencies are cleared 
up, so new people join communities, or people try to exchange more 
forms of data between existing communities, and the agenda for the work 
increases. There will never be global consistency. But in mature areas 
(think Online Financial Exchange, Calendars (ever?! ;-), weights and 
measures, etc) the terms will be well understood, and interoperability 
tests will have established that machines use them appropriately.

What there will be lots of is the use of many different URIs to mean 
things which are either the same or very close. This is where 
different communities name the same concept independently. This causes 
no breakage: no inconsistencies. It does form an opportunity for more 
standardization work in the future.

> [...]

> So, ironically, the issue about agreeing to a common meaning, that you 
> keep making an inappropriate fuss about, is actually a non-issue: our 
> current design handles it perfectly, and you don't even need to 
> mention "resources"; but I am saying that the current design is in 
> fact broken, for just this reason. It is predicated on a falsehood 
> that you wish to be made into a principle. You don't need to make it 
> into a principle; and making it into a principle isn't going to make 
> it any truer than it is, or make the knowledge integration problem go 
> away, in any case, as practical SW users will rapidly discover; in 
> some cases, are already discovering, eg see 
> http://smi-web.stanford.edu/si2003/
>

I think we are talking about different levels of "principle".
I am not saying it's not going to happen.
I am saying don't do it deliberately.

The specs are protocols. A protocol design says "if one acts the 
following way, then the following things become true". They say what 
people should and should not do, and what certain things mean. They 
say that rdf:type is a property associated with the binary relation of 
class membership. What do you take that as? A fact which is true 
because the owner of the URI said so? a fundamental Principle? a 
declaration? An assertion others are free to question? Part of a 
convention?

In the spec it is written up as though a fact. If everyone reads and 
follows the spec, they can communicate using rdf:type.

What do you call it when someone says {rdf:type rdf:type animals:Cat 
.} ?
(where Cat means what we expect it to mean, the animal).
A statement which may be true in certain interpretations?
An alternative and quite reasonable view of life?
From the RDF spec's point of view, it is what spec writers would call 
an "error".

So I'm not proposing to declare that people never make errors.
But I do want to make sure that things which will stop the protocol 
working are called errors so that people are encouraged not to do them.

There are people who would like to say

a soc:Consortium.
a doc:Work.

because they say it should be clear from the context what one means.
So, while I'm not declaring that such documents don't exist,
I am saying that we should say that it is an error to use the same URI to 
mean two different things.
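
A sketch of how a checker might flag that kind of error, under the assumption that soc:Consortium and doc:Work are disjoint classes (nothing is both). The DISJOINT table, the function name, and the http://example.org subject URIs are hypothetical.

```python
from collections import defaultdict

# Assumption: these two classes are declared disjoint.
DISJOINT = {frozenset({"soc:Consortium", "doc:Work"})}


def find_conflicts(type_statements):
    """Return {uri: sorted classes} for URIs typed with two disjoint classes."""
    classes_by_uri = defaultdict(set)
    for uri, cls in type_statements:
        classes_by_uri[uri].add(cls)
    conflicts = {}
    for uri, classes in classes_by_uri.items():
        # The URI is in error if it carries both members of a disjoint pair.
        if any(pair <= classes for pair in DISJOINT):
            conflicts[uri] = sorted(classes)
    return conflicts


statements = [
    ("http://example.org/thing", "soc:Consortium"),
    ("http://example.org/thing", "doc:Work"),
    ("http://example.org/other", "doc:Work"),
]
errors = find_conflicts(statements)
```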

Then I would like the spec to assert that HTTP URIs (without hashes) 
denote information resources. This allows anyone who sees the URI of 
one of those things to use that URI

rdf:type xx:currentlyInaccessable.

xx:textContents """You have reached this web 
page by typing "example.com", "example.net", or "example.org" into 
your web browser.
These domain names are reserved for use in documentation and are not 
available for registration. See RFC 2606 , Section 3. """.

without having to find some documentation as to whether in fact that 
URI was designed by the domain name owner to be the identifier of some 
dog.

I have had a huge push-back on this from various people for various 
quite different reasons,
http://www.w3.org/DesignIssues/HTTP-URI.html being a summary of some of 
the arguments.

It follows that anything with "http:" and no "#" is an information 
resource, and so that means it would be an error to use that URI for a 
rdf Property.
It is good to tell people that sort of thing, so that they don't just 
follow the spec and then come up with an inconsistency later on through 
no fault of their own.
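
A minimal sketch of the check this rule would allow, assuming the proposed wording were adopted. The function names and the http://example.org URIs are made up; the only test applied is "starts with http: and contains no #".

```python
def is_information_resource(uri: str) -> bool:
    # Proposed rule: an HTTP URI without a hash denotes an information resource.
    return uri.lower().startswith("http:") and "#" not in uri


def misused_as_property(triples):
    # Collect predicates whose URI, under the rule, names a document
    # and so could not also be an RDF property.
    return [p for (_s, p, _o) in triples if is_information_resource(p)]


triples = [
    ("http://example.org/a#me", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://example.org/ns#Person"),
    ("http://example.org/a#me", "http://example.org/knows", "http://example.org/b#you"),
]
bad = misused_as_property(triples)
```

Here only http://example.org/knows is flagged, since it has no "#" and is used in the predicate position.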

So I don't know whether there is some wording different from "error" 
for this sort of circumstance.

> Pat
> -- 
> ---------------------------------------------------------------------
> IHMC (850)434 8903 or (650)494 3973 home
> 40 South Alcaniz St. (850)202 4416 office
> Pensacola (850)202 4440 fax
> FL 32501 (850)291 0667 cell
> phayes@ihmc.us http://www.ihmc.us/users/phayes
>



Received on Monday, 4 August 2003 06:42:35 UTC