- From: Tim Berners-Lee <timbl@w3.org>
- Date: Thu, 1 Apr 2004 17:32:31 -0500
- To: Pat Hayes <phayes@ihmc.us>
- Cc: public-webarch-comments@w3.org
- Message-Id: <7709C979-842C-11D8-A0E3-000A9580D8C0@w3.org>
In
http://lists.w3.org/Archives/Public/public-webarch-comments/2004JanMar/1057.html
Pat, you wrote, in commenting on

> http://www.w3.org/TR/2003/WD-webarch-20031209/

a lot of text to which I will endeavour to respond. I think the gist
of your comments is carried in the first few pages, which I will
respond to inline.

In brief, you have a very good point. Yes, the document uses
terminology in a confusing way. We had as a group shied away from
trying to clear it up as it had been connected with issue
httpRange-14, which had been postponed until after version 1.0. I
think your comment makes it clear how important it is for that issue
to be resolved. I will give my take on the resolution of it below.
Please excuse typos.

> 1. General comment about vocabulary
>
> The vocabulary used throughout this document can be understood in two
> rather different ways, which conflict with one another. Exactly what
> is being said is therefore not always clear, and in some cases may be
> understood by some readers to have different meanings than those
> intended. It would be helpful if the terminology used could be, if
> not defined, at least have its intended meanings clarified somewhat.

Yes. An ontology may help. I have sort of run one off on the fly
below. Basically, we are using this term "Resource" for two concepts,
which you identify below in (C) and (D).

> I realize that this kind of request conflicts with the requirements
> of ease of reading and general literary style, but it is nevertheless
> important; it could be done with a glossary, for example. However, in
> order to be useful, a glossary should not merely repeat sentences
> from the text using the same terminology. Thus
> "Resource: An item of interest in the information space known as the
> World Wide Web" is completely uninformative since the definition
> repeats the words used in the text and hence does not resolve their
> ambiguity or provide any other way to grasp their intended meaning.
>
> The specific ambiguity revolves around a group of terms (semantic,
> represent, identify, refer, about, meaning, resource) which can be
> understood in two rather different ways, which I will refer to as (C)
> and (D).
>
> (C) as in a programming language, where an identifier serves to
> uniquely locate (relative to the current computational state of a
> virtual machine or a network) some piece of data. Approximate
> synonyms for 'identifier' in this sense include 'link', 'address' and
> 'pointer'; and ideas like hash coding and database key are also
> connected with this sense of 'identify'.
> The corresponding usage of 'representation' is where one speaks of a
> representation of data, or of the state of a computational entity.
> The corresponding usage of 'resource' is something that is, or can in
> principle be, identified in this sense: a computational entity (or
> the state of it) which is accessible via some network link or
> transfer protocol.
> The corresponding usage of 'semantic' language is closely analogous
> to the way this terminology is typically used in describing the
> semantics of computational systems.

I think in section (C) your concept of "resource" is something like a
program or a hypertext file. I will call these Information Resources.
These are things which can have representations.

    representation rdfs:domain InformationResource;
                   rdfs:range Representation.

> (D) as in a descriptive language, such as English or formal logical
> languages, where "identifier" is synonymous with 'name', and to
> identify means simply to refer to, name or denote.
> The corresponding sense of 'representation' is 'description' (or
> possibly 'formal description'), in the sense used in KR work, AI and
> formal linguistics.
> The corresponding sense of 'resource' would be simply 'entity' or
> 'thing', ie the word used in this way has no special Web or
> Internet-related meaning and is simply a synonym for 'entity' in the
> philosophical sense: anything that can be referred to, ie anything.
> The corresponding usage of the 'semantic' language is more analogous
> to the way that this terminology is used in linguistics, philosophy,
> logical semantics and AI/KR work.

This is the concept of "Resource" as "Thing", Top, "Whatever", etc.
In general, URIs can identify anything, so there are no range
constraints on "identifies". One can use owl:Thing for this class.
Some URIs, however, identify InformationResources.

    { ?u a InformationResourceIdentifier; identifies ?x }
        => { ?x a InformationResource }.

For example, HTTP makes a web -- an "information space" -- of
information resources.

    { ?u string:matches "http:[^#]" }
        => { ?u a InformationResourceIdentifier }.

There is a lot of architecture which is specifically about
InformationResources, but as the distinction isn't made in the
document it leaves one confused. Specifically, the hash is a major
building block which is defined for InformationResources. If you take
the identifier of an information resource and add a hash and a local
identifier of something mentioned within that information resource,
you forge a global identifier for whatever the local identifier was
for.

    { (?u "#" ?v) string:concatenation ?w } <=> { ?w irid ?u; localid ?v. }.

    irid rdfs:domain ApplicationIdentifier;
         rdfs:range InformationResourceIdentifier.

    ApplicationIdentifier rdfs:subClassOf URI.

    # The first WWW application was global hypertext.
    # In that application, the application identifiers identify Anchors
    # (start/end of link in the Dexter model of hypertext).

    { ?w identifies ?x; irid ?u; frag ?v.
      ?u identifies ?y.
      ?y representation [ internetContentType "text/html"; data ?d ].
    } => { ?x a ht:Anchor; ht:anchorId ?v; ht:pageSource ?d. }

    { ?w identifies ?x; irid ?u.
      ?u identifies ?y.
      ?y representation [ internetContentType "application/rdf+xml" ].
    } => {
      # As RDF documents are general and the symbols identify
      # arbitrary things, nothing to be deduced directly about the type of ?x
    }

    # However one has local rules of web browsing such as

    { ?w identifies ?x; irid ?u.
      ?u identifies ?y.
      ?y representation [ internetContentType "application/rdf+xml"; data ?d ];
         a ReliableSource.
      ?d log:parsedAsRDFXML [ log:includes { ?s ?p ?o } ]
    } => { ?s ?p ?o. }

The fact that there is a social input here ("ReliableSource") on the
inference is, of course, what stops the whole semantic web from
blowing up and having to be consistent. This is the "web" bit, if you
like.

> Although these two readings are obviously closely related, and in
> some circumstances can be conflated, for example when discussing the
> formal semantics of a programming language, they are not the same. It
> is important to keep them distinct, especially when discussing
> referring formalisms (such as RDF and OWL) based on (D) ideas but
> deployed on a computationally defined network normally described
> using (C) terminology, it is necessary to carefully distinguish them.

The architecture of the semantic web is, in a way, that the use of RDF
and OWL in sense (D) is coupled with the existing processing ability
of sense (C) in the above architecture. This is the architecture of
the [semantic] web.

(cf: "It is important not to confuse the "refrigerator" as a cooling
device with the "refrigerator" which you seem to treat as a container.
The concepts of cooling and containment are separate."
Well, no, the point is the fridge is a cooler which contains things,
and that is what makes it useful.)

> In particular, in sense (C), but not in sense (D), there is a
> presumption of a computable or effective process which can be applied
> to the identifier to provide access to the entity identified; an
> assumption (which follows from the previous) that the identification
> must be unique; and an understanding that this process might depend
> on the state of some computational system. None of these is
> assumptions is generally plausible for sense (D).

You see there are three layers here.

A URI in general is a wide class which doesn't tell you anything, and
so in all generality can "identify" anything. There is nothing in
general one can do processing-wise.

The InformationResourceIdentifier is a subclass of URI with a
restriction on the values of "identifies" to InformationResources, and
for which computational things can be done.

The ApplicationIdentifier is a (distinct) subclass of URI, which is a
compound of an InformationResourceIdentifier and a local identifier.
This has NO restriction in general on what it can identify, although
different applications -- identified by content type if and when a
retrieval is done -- make restrictions.

> On the other hand, formal analyses of sense (D) generally understand
> reference to be relative to an interpretation, and discuss meaning in
> terms of constraints on, and relationships between, interpretations.
> This style of analysis, and the terminology associated with it, has
> been a standard in formal semantics - logical semantics, formal
> linguistics and formal philosophy - for over half a century. The
> notion of interpretation involved has no particular connection with
> the sense of computational state underlying sense (C). Even when
> uniqueness of reference is required within an interpretation,
> guaranteeing uniqueness across all possible interpretations is
> usually meaningless or provably impossible.
This is a separate discussion we have had before. If I have time I'll
write it up.

> The document often seems to slip between these two senses, in ways
> that suggest inappropriate conclusions. Several of the principles
> stated seem appropriate for sense (C) but are inappropriate, and in
> some cases positively harmful, if understood in sense (D). (Details
> below.) I would therefore ask that the authors clarify their intended
> meaning before publication.

My personal belief is that the distinction between "Resource" in the
sense of "Whatever" and "resource" in the sense of "Information thing
in the web of information which may be linked to other things" needs
to be made clearly. I tried to get this change into the document some
time ago, and failed, as we ratholed. My apologies. We have formally
decided to leave that issue for a future version.

> (Meta-comment: In making similar comments on similar documents in
> the past, I have found that any attempt to ask for clarification on
> this point is met with resistance on the grounds that the intended
> meaning is obvious, and have been advised to consult an English
> dictionary.

Note the case here.

> Leaving aside the potentially insulting nature of such a
> response, the key point is that the terminology used here is being
> used in technical senses, rather than the informal English senses;
> and moreover, much of this terminology already has technical senses
> which are well-established in disciplines which some readers of this
> document work within, and which are relevant to emerging Web
> technology. If words are being used here in ways which conflict with
> these established technical usages, therefore, it is important to
> make at least these aspects of the intent clear. For example, the
> semi-technical use of the term "resource" is unknown in the English
> language generally and even, as far as I am aware, in the general
> technical computer-science literature.
> It seems to be a usage special
> to the internet community.)
>
> ---------
>
> 2. Hunting down what is meant by "resource".
>

The rest of your comment (of which I have read the next 5 pages or so
in 10pt) is a discussion of the occurrences of and confusion between
uses of the term "resource". I have red scribbled notes on it but I
think the level of detail is much lower than the preceding, so if
necessary I will put it in a different post.

Tim BL
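
The three layers and the hash rule described in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any specification: the function names are hypothetical, the string:matches pattern is read as "an http: URI containing no hash", and the content-type dispatch covers only the two media types mentioned in the rules.

```python
from typing import Optional, Tuple


def forge_global_identifier(irid: str, localid: str) -> str:
    """Concatenate (irid, '#', localid), mirroring the string:concatenation rule."""
    return irid + "#" + localid


def split_identifier(uri: str) -> Tuple[str, Optional[str]]:
    """Reverse direction of the rule: split at the first '#'.

    Returns (irid, localid); localid is None when the URI has no hash.
    """
    irid, hash_found, localid = uri.partition("#")
    return irid, (localid if hash_found else None)


def classify(uri: str) -> str:
    """Assign a URI to one of the three layers (assumed reading of the rules)."""
    _, localid = split_identifier(uri)
    if localid is not None:
        # Compound of an information resource identifier and a local identifier.
        return "ApplicationIdentifier"
    if uri.startswith("http:"):
        # Assumption: hashless http: URIs identify information resources.
        return "InformationResourceIdentifier"
    # In all generality nothing can be concluded.
    return "URI"


def type_implied_by(content_type: str) -> Optional[str]:
    """What the retrieved representation says about the thing a hash URI identifies."""
    if content_type == "text/html":
        return "ht:Anchor"  # the global hypertext application
    # application/rdf+xml (and anything else here): nothing deduced directly.
    return None
```

For instance, classify("http://example.org/doc#intro") falls in the ApplicationIdentifier layer, while what "intro" actually identifies depends on the content type returned when http://example.org/doc is retrieved. The example.org URIs are placeholders.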
Received on Thursday, 1 April 2004 17:32:42 UTC