- From: Williams, Stuart <skw@hp.com>
- Date: Fri, 2 Apr 2004 13:50:29 +0100
- To: Tim Berners-Lee <timbl@w3.org>, Pat Hayes <phayes@ihmc.us>
- Cc: public-webarch-comments@w3.org
- Message-ID: <E864E95CB35C1C46B72FEA0626A2E808028A272D@0-mail-br1.hpl.hp.com>
Tim, Pat,... others on public-webarch-comments,
If this thread heads off into substantive technical discussion, rather than
clarification of what it is that is at issue, I'd like to encourage it to
move over onto www-tag. I realise that that also risks something of a
melt-down if a lot of folks join in (again)... but TAG charter does
stipulate that TAG conduct its work in public.
I had some techy things I wanted to say in response a couple of points that
Tim makes... but I feel constrainted about enlarging the technical
discussion here rather than www-tag.
OTOH this is at least a publically visible archive... interested parties can
at least go hunting.
Cheers
Stuart
--
_____
From: public-webarch-comments-request@w3.org
[mailto:public-webarch-comments-request@w3.org] On Behalf Of Tim Berners-Lee
Sent: 1 April 2004 23:33
To: Pat Hayes
Cc: public-webarch-comments@w3.org
Subject: Re: comments on Web Architecture First Edition
In
http://lists.w3.org/Archives/Public/public-webarch-comments/2004JanMar/1057.
html
Pat, you wrote, in commenting on
http://www.w3.org/TR/2003/WD-webarch-20031209/
a lot of text to which I will endeavour to respond. I think the gist of your
comments are carried in the first few pages, which I will respond to inline.
In brief, you have a very good point. Yes, the document uses terminology in
a confusing way. We had as a group shied away from trying to clear it up as
it had been connected with issue httpRange-14 which had been postponed until
after version 1.0. I think your comment makes it clear how important it is
for that issue to be resolved.
I willl give my take on the resolution of it below. Please excuse typos.
1. General comment about vocabulary
The vocabulary used throughout this document can be understood in two
rather different ways, which conflict with one another. Exactly what
is being said is therefore not always clear, and in some cases may be
understood by some readers to have different meanings than those
intended. It would be helpful if the terminology used could be, if
not defined, at least have its intended meanings clarified somewhat.
Yes. An ontology may help. I have sort of run one off on the fly below.
Basically, we are using this term "Resource" for two concepts, which you
identify below in (C) and (D).
I realize that this kind of request conflicts with the requirements
of ease of reading and general literary style, but it is nevertheless
important; it could be done with a glossary, for example. However, in
order to be useful, a glossary should not merely repeat sentences
from the text using the same terminology. Thus
"Resource: An item of interest in the information space known as the
World Wide Web" is completely uninformative since the definition
repeats the words used in the text and hence does not resolve their
ambiguity or provide any other way to grasp their intended meaning.
The specific ambiguity revolves around a group of terms (semantic,
represent, identify, refer, about, meaning, resource) which can be
understood in two rather different ways, which I will refer to as (C)
and (D).
(C) as in a programming language, where an identifier serves to
uniquely locate (relative to the current computational state of a
virtual machine or a network) some piece of data. Approximate
synonyms for 'identifier' in this sense include 'link', 'address' and
'pointer'; and ideas like hash coding and database key are also
connected with this sense of 'identify'.
The corresponding usage of 'representation' is where one speaks of a
representation of data, or of the state of a computational entity.
The corresponding usage of 'resource' is something that is, or can in
principle be, identified in this sense: a computational entity (or
the state of it) which is accessible via some network link or
transfer protocol.
The corresponding usage of 'semantic' language is closely analogous
to the way this terminology is typically used in describing the
semantics of computational systems.
I think in section C your concept of "resource" is something like a program
or a hypertext file. I will call these Information Resources. These are
things which can have representations.
representation
rdfs:domain InformationResource;
rdfs:range Representation.
(D) as in a descriptive language, such as English or formal logical
languages, where "identifier" is synonymous with 'name', and to
identify means simply to refer to, name or denote.
The corresponding sense of 'representation' is 'description' (or
possibly 'formal description'), in the sense used in KR work, AI and
formal linguistics.
The corresponding sense of 'resource' would be simply 'entity' or
'thing', ie the word used in this way has no special Web or
Internet-related meaning and is simply a synonym for 'entity' in the
philosophical sense: anything that can be referred to, ie anything.
The corresponding usage of the 'semantic' language is more analogous
to the way that this terminology is used in linguistics, philosophy,
logical semantics and AI/KR work.
This is the concept of "Resource" as "Thing", Top, "Whatever", etc.
In general, URIs can identify anything, so there are no range constraints to
"identifies". One can use owl:Thing for this class.
Some URIs, however identify InformationResources.
{ ?u a InformationResourceIdentifier.
identifies ?x } => { ?x a InformationResource }.
For example, HTTP makes a web -- "information space" --- of information
resources.
{ ?u string:matches "http:[^#]" } => { ?u a InformationResourceIdentifier }.
There is a lot of architecture which is specifically about
InformationResources,
but as the distinction isn't made in the document it leaves one confused.
Specifically, the hash is a major building block which is defined
for InformationResources. If you take the identifier of an
informationresource
and add a hash and a local identifier of something mentioned within that
information resource, you forge a global identifier for whatever the local
identifier
was for.
{ (?u "#" ?v) string:concatenation ?w } <=>
{ ?w irid ?u; localid ?v. }.
irid rdfs:domainApplicationIdentifier;
rdfs:range InfromationResourceIdentifier.
ApplicationIdentifier rdfs:isSubclassOf URI.
# The first WWW application was global hypertext.
# In that application, the applicatoin identifiers identify Anchors.
# (start/end of link in the dexter model of hypertext)
{ ?w identifies ?x; irid ?u; frag ?v.
?u identifies ?y.
?y representation [ internetContentType "text/html"; data ?d ].
} => {
?x a ht:Anchor; ht:anchorId ?v; ht:pageSource ?d.
}
{ ?w identifies ?x; irid ?u.
?u identifies ?y.
?y representation [ internetContentType "application/rdf+xml" ].
} => {
# As RDF documents are general and the symbols identify
# arbitrary things, nothing to be deduced directly about the type of ?x
}
# However one has local rules of web browsing such as
{ ?w identifies ?x; irid ?u.
?u identifies ?y.
?y representation [ internetContentType "application/rdf+xml" ; data ?d];
a ReliableSource.
?d log:parsedAsRDFXML [ log:includes { ?s ?p ?o }]
} => {
?s ?p ?o.
}
The fact that there is a social input here ("ReliableSource") on the
inference is what stops
the whole semantic web blowing up and having to be consistent of course.
This is the "web" bit, if you like.
Although these two readings are obviously closely related, and in
some circumstances can be conflated, for example when discussing the
formal semantics of a programming language, they are not the same. It
is important to keep them distinct, especially when discussing
referring formalisms (such as RDF and OWL) based on (D) ideas but
deployed on a computationally defined network normally described
using (C) terminology, it is necessary to carefully distinguish them.
The architecture of the semantic web is in a way that the RDF and OWl use in
sense D is coupled with the existing processing ability of sense D in the
above architecture.
This is the architecture of the [semantic] web.
(cf: "It is important not to confuse the "refrigerator" as a cooling device
with the "refrigerator" which you seem to treat as a container. The concepts
of cooling and containment are separate." Well, no, the point is the fridge
is a cooler which contains things, and that is what make it useful)
In particular, in sense (C), but not in sense (D), there is a
presumption of a computable or effective process which can be applied
to the identifier to provide access to the entity identified; an
assumption (which follows from the previous) that the identification
must be unique; and an understanding that this process might depend
on the state of some computational system. None of these is
assumptions is generally plausible for sense (D).
You see there are three layers here. A URI in general is a wide class which
doesn't tell you anything, and so in all generality can "identify" anything.
There is nothjing in general oine can do processingwise.
The InformationResourceIdentifier is a subclass of URI with a restriction on
the values of "identifies" toInformationResources and which computational
things can be done.
The ApplicationIdentifier is a (distinct) subclass of URI, which is a
compount of an informatioResourceidentifier and a local identifier. This has
NO restriction in general on what it can identify, although different
applications -- identified by content type if and when a retrieval is done
-- make restrictions.
On the other hand, formal analyses of sense (D) generally understand
reference to be relative to an interpretation, and discuss meaning in
terms of constraints on, and relationships between, interpretations.
This style of analysis, and the terminology associated with it, has
been a standard in formal semantics - logical semantics, formal
linguistics and formal philosophy - for over half a century. The
notion of interpretation involved has no particular connection with
the sense of computational state underlying sense (C). Even when
uniqueness of reference is required within an interpretation,
guaranteeing uniqueness across all possible interpretations is
usually meaningless or provably impossible.
This is a separate discussion we have had before, If I have time I'll write
it up.
The document often seems to slip between these two senses, in ways
that suggest inappropriate conclusions. Several of the principles
stated seem appropriate for sense (C) but are inappropriate, and in
some cases positively harmful, if understood in sense (D). (Details
below.) I would therefore ask that the authors clarify their intended
meaning before publication.
My personal belief if that the distinction between "Resource" in the sense
of "Whatever" and "resource" in the sense of "Information thing in the web
of information which may be linked to other things" needs to be made
clearly. I tried to get this change into the document some time ago, and
failed, as we ratholed. My apologies. We have formally decided to leave that
issue for a future version.
(Meta-comment: In making similar comments on similar documents in
the past, I have found that any attempt to ask for clarification on
this point is met with resistance on the grounds that the intended
meaning is obvious, and have been advised to consult an English
dictionary.
Note the case here.
Leaving aside the potentially insulting nature of such a
response, the key point is that the terminology used here is being
used in technical senses, rather than the informal English senses;
and moreover, much of this terminology already has technical senses
which are well-established in disciplines which some readers of this
document work within, and which are relevant to emerging Web
technology. If words are being used here in ways which conflict with
these established technical usages, therefore, it is important to
make at least these aspects of the intent clear. For example, the
semi-technical use of the term "resource" is unknown in the English
language generally and even, as far as I am aware, in the general
technical computer-science literature. It seems to be a usage special
to the internet community.)
---------
2. Hunting down what is meant by "resource".
The rest of your comment (of which I have read the next 5 pages or so in
10pt) is a discussion of the occurrences of and confusion between uses of
the term "resource".
I have red scribbled notes on it but I think the level of detail is much
lower than the preceding, so if necessary I will put it in a different post.
Tim BL
Received on Friday, 2 April 2004 12:08:32 UTC