RE: comments on Web Architecture First Edition from Williams, Stuart on 2004-04-02 (public-webarch-comments@w3.org from April 2004)

From: Williams, Stuart <skw@hp.com>
Date: Fri, 2 Apr 2004 13:50:29 +0100
To: Tim Berners-Lee <timbl@w3.org>, Pat Hayes <phayes@ihmc.us>
Cc: public-webarch-comments@w3.org
Message-ID: <E864E95CB35C1C46B72FEA0626A2E808028A272D@0-mail-br1.hpl.hp.com>
Tim, Pat,... others on public-webarch-comments,
 
If this thread heads off into substantive technical discussion, rather than
clarification of what it is that is at issue, I'd like to encourage it to
move over onto www-tag. I realise that that also risks something of a
melt-down if a lot of folks join in (again)... but TAG charter does
stipulate that TAG conduct its work in public.
 
I had some techy things I wanted to say in response a couple of points that
Tim makes... but I feel constrainted about enlarging the technical
discussion here rather than www-tag.
 
OTOH this is at least a publically visible archive... interested parties can
at least go hunting.
 
Cheers
 
Stuart
--
 


  _____  

From: public-webarch-comments-request@w3.org
[mailto:public-webarch-comments-request@w3.org] On Behalf Of Tim Berners-Lee
Sent: 1 April 2004 23:33
To: Pat Hayes
Cc: public-webarch-comments@w3.org
Subject: Re: comments on Web Architecture First Edition



In
http://lists.w3.org/Archives/Public/public-webarch-comments/2004JanMar/1057.
html 

Pat, you wrote, in commenting on 


http://www.w3.org/TR/2003/WD-webarch-20031209/ 



a lot of text to which I will endeavour to respond. I think the gist of your
comments are carried in the first few pages, which I will respond to inline.
In brief, you have a very good point. Yes, the document uses terminology in
a confusing way. We had as a group shied away from trying to clear it up as
it had been connected with issue httpRange-14 which had been postponed until
after version 1.0. I think your comment makes it clear how important it is
for that issue to be resolved. 


I willl give my take on the resolution of it below. Please excuse typos. 


1. General comment about vocabulary 


The vocabulary used throughout this document can be understood in two 

rather different ways, which conflict with one another. Exactly what 

is being said is therefore not always clear, and in some cases may be 

understood by some readers to have different meanings than those 

intended. It would be helpful if the terminology used could be, if 

not defined, at least have its intended meanings clarified somewhat. 


Yes. An ontology may help. I have sort of run one off on the fly below.
Basically, we are using this term "Resource" for two concepts, which you
identify below in (C) and (D). 


I realize that this kind of request conflicts with the requirements 

of ease of reading and general literary style, but it is nevertheless 

important; it could be done with a glossary, for example. However, in 

order to be useful, a glossary should not merely repeat sentences 

from the text using the same terminology. Thus 

"Resource: An item of interest in the information space known as the 

World Wide Web" is completely uninformative since the definition 

repeats the words used in the text and hence does not resolve their 

ambiguity or provide any other way to grasp their intended meaning. 


The specific ambiguity revolves around a group of terms (semantic, 

represent, identify, refer, about, meaning, resource) which can be 

understood in two rather different ways, which I will refer to as (C) 

and (D). 


(C) as in a programming language, where an identifier serves to 

uniquely locate (relative to the current computational state of a 

virtual machine or a network) some piece of data. Approximate 

synonyms for 'identifier' in this sense include 'link', 'address' and 

'pointer'; and ideas like hash coding and database key are also 

connected with this sense of 'identify'. 

The corresponding usage of 'representation' is where one speaks of a 

representation of data, or of the state of a computational entity. 

The corresponding usage of 'resource' is something that is, or can in 

principle be, identified in this sense: a computational entity (or 

the state of it) which is accessible via some network link or 

transfer protocol. 

The corresponding usage of 'semantic' language is closely analogous 

to the way this terminology is typically used in describing the 

semantics of computational systems. 


I think in section C your concept of "resource" is something like a program
or a hypertext file. I will call these Information Resources. These are
things which can have representations. 


representation 

        rdfs:domain InformationResource; 

        rdfs:range Representation. 



(D) as in a descriptive language, such as English or formal logical 

languages, where "identifier" is synonymous with 'name', and to 

identify means simply to refer to, name or denote. 

The corresponding sense of 'representation' is 'description' (or 

possibly 'formal description'), in the sense used in KR work, AI and 

formal linguistics. 

The corresponding sense of 'resource' would be simply 'entity' or 

'thing', ie the word used in this way has no special Web or 

Internet-related meaning and is simply a synonym for 'entity' in the 

philosophical sense: anything that can be referred to, ie anything. 

The corresponding usage of the 'semantic' language is more analogous 

to the way that this terminology is used in linguistics, philosophy, 

logical semantics and AI/KR work. 


This is the concept of "Resource" as "Thing", Top, "Whatever", etc. 

In general, URIs can identify anything, so there are no range constraints to


"identifies". One can use owl:Thing for this class. 


Some URIs, however identify InformationResources. 


{ ?u a InformationResourceIdentifier. 

        identifies ?x } => { ?x a InformationResource }. 


For example, HTTP makes a web -- "information space" --- of information
resources. 


{ ?u string:matches "http:[^#]" } => { ?u a InformationResourceIdentifier }.




There is a lot of architecture which is specifically about
InformationResources, 

but as the distinction isn't made in the document it leaves one confused. 


Specifically, the hash is a major building block which is defined 

for InformationResources. If you take the identifier of an
informationresource 

and add a hash and a local identifier of something mentioned within that 

information resource, you forge a global identifier for whatever the local
identifier 

was for. 


{ (?u "#" ?v) string:concatenation ?w } <=> 

{ ?w irid ?u; localid ?v. }. 


irid rdfs:domainApplicationIdentifier; 

        rdfs:range InfromationResourceIdentifier. 


ApplicationIdentifier rdfs:isSubclassOf URI. 



# The first WWW application was global hypertext. 

# In that application, the applicatoin identifiers identify Anchors. 

# (start/end of link in the dexter model of hypertext) 


{ ?w identifies ?x; irid ?u; frag ?v. 

?u identifies ?y. 

?y representation [ internetContentType "text/html"; data ?d ]. 

} => { 

        ?x a ht:Anchor; ht:anchorId ?v; ht:pageSource ?d. 

} 



{ ?w identifies ?x; irid ?u. 

?u identifies ?y. 

?y representation [ internetContentType "application/rdf+xml" ]. 

} => { 

        # As RDF documents are general and the symbols identify 

# arbitrary things, nothing to be deduced directly about the type of ?x 

} 


# However one has local rules of web browsing such as 


{ ?w identifies ?x; irid ?u. 

?u identifies ?y. 

?y representation [ internetContentType "application/rdf+xml" ; data ?d]; 

      a ReliableSource. 

?d log:parsedAsRDFXML [ log:includes { ?s ?p ?o }] 

} => { 

        ?s ?p ?o. 

} 


The fact that there is a social input here ("ReliableSource") on the
inference is what stops 

the whole semantic web blowing up and having to be consistent of course. 

This is the "web" bit, if you like. 


Although these two readings are obviously closely related, and in 

some circumstances can be conflated, for example when discussing the 

formal semantics of a programming language, they are not the same. It 

is important to keep them distinct, especially when discussing 

referring formalisms (such as RDF and OWL) based on (D) ideas but 

deployed on a computationally defined network normally described 

using (C) terminology, it is necessary to carefully distinguish them. 


The architecture of the semantic web is in a way that the RDF and OWl use in
sense D is coupled with the existing processing ability of sense D in the
above architecture. 

This is the architecture of the [semantic] web. 


(cf: "It is important not to confuse the "refrigerator" as a cooling device
with the "refrigerator" which you seem to treat as a container. The concepts
of cooling and containment are separate." Well, no, the point is the fridge
is a cooler which contains things, and that is what make it useful) 


In particular, in sense (C), but not in sense (D), there is a 

presumption of a computable or effective process which can be applied 

to the identifier to provide access to the entity identified; an 

assumption (which follows from the previous) that the identification 

must be unique; and an understanding that this process might depend 

on the state of some computational system. None of these is 

assumptions is generally plausible for sense (D). 


You see there are three layers here. A URI in general is a wide class which
doesn't tell you anything, and so in all generality can "identify" anything.
There is nothjing in general oine can do processingwise. 


The InformationResourceIdentifier is a subclass of URI with a restriction on
the values of "identifies" toInformationResources and which computational
things can be done. 


The ApplicationIdentifier is a (distinct) subclass of URI, which is a
compount of an informatioResourceidentifier and a local identifier. This has
NO restriction in general on what it can identify, although different
applications -- identified by content type if and when a retrieval is done
-- make restrictions. 


On the other hand, formal analyses of sense (D) generally understand 

reference to be relative to an interpretation, and discuss meaning in 

terms of constraints on, and relationships between, interpretations. 

This style of analysis, and the terminology associated with it, has 

been a standard in formal semantics - logical semantics, formal 

linguistics and formal philosophy - for over half a century. The 

notion of interpretation involved has no particular connection with 

the sense of computational state underlying sense (C). Even when 

uniqueness of reference is required within an interpretation, 

guaranteeing uniqueness across all possible interpretations is 

usually meaningless or provably impossible. 


This is a separate discussion we have had before, If I have time I'll write
it up. 


The document often seems to slip between these two senses, in ways 

that suggest inappropriate conclusions. Several of the principles 

stated seem appropriate for sense (C) but are inappropriate, and in 

some cases positively harmful, if understood in sense (D). (Details 

below.) I would therefore ask that the authors clarify their intended 

meaning before publication. 


My personal belief if that the distinction between "Resource" in the sense
of "Whatever" and "resource" in the sense of "Information thing in the web
of information which may be linked to other things" needs to be made
clearly. I tried to get this change into the document some time ago, and
failed, as we ratholed. My apologies. We have formally decided to leave that
issue for a future version. 



(Meta-comment: In making similar comments on similar documents in 

the past, I have found that any attempt to ask for clarification on 

this point is met with resistance on the grounds that the intended 

meaning is obvious, and have been advised to consult an English 

dictionary. 


Note the case here. 


Leaving aside the potentially insulting nature of such a 

response, the key point is that the terminology used here is being 

used in technical senses, rather than the informal English senses; 

and moreover, much of this terminology already has technical senses 

which are well-established in disciplines which some readers of this 

document work within, and which are relevant to emerging Web 

technology. If words are being used here in ways which conflict with 

these established technical usages, therefore, it is important to 

make at least these aspects of the intent clear. For example, the 

semi-technical use of the term "resource" is unknown in the English 

language generally and even, as far as I am aware, in the general 

technical computer-science literature. It seems to be a usage special 

to the internet community.) 


--------- 


2. Hunting down what is meant by "resource". 



The rest of your comment (of which I have read the next 5 pages or so in
10pt) is a discussion of the occurrences of and confusion between uses of
the term "resource". 


I have red scribbled notes on it but I think the level of detail is much
lower than the preceding, so if necessary I will put it in a different post.



Tim BL
Received on Friday, 2 April 2004 12:08:32 UTC