- From: Pat Hayes <phayes@ihmc.us>
- Date: Wed, 17 Mar 2004 16:38:52 -0600
- To: public-webarch-comments@w3.org
- Cc: w3c-rdfcore-wg@w3.org
- Message-Id: <p06001f10bc779f2e9ec4@[10.0.100.76]>
The following are some personal comments on http://www.w3.org/TR/2003/WD-webarch-20031209/

Sorry they're late.

------

1. General comment about vocabulary

The vocabulary used throughout this document can be understood in two rather different ways, which conflict with one another. Exactly what is being said is therefore not always clear, and in some cases may be understood by some readers to have different meanings than those intended. It would be helpful if the terminology used could, if not be defined, at least have its intended meanings clarified somewhat. I realize that this kind of request conflicts with the requirements of ease of reading and general literary style, but it is nevertheless important; it could be done with a glossary, for example. However, in order to be useful, a glossary should not merely repeat sentences from the text using the same terminology. Thus "Resource: An item of interest in the information space known as the World Wide Web" is completely uninformative since the definition repeats the words used in the text and hence does not resolve their ambiguity or provide any other way to grasp their intended meaning.

The specific ambiguity revolves around a group of terms (semantic, represent, identify, refer, about, meaning, resource) which can be understood in two rather different ways, which I will refer to as (C) and (D).

(C) as in a programming language, where an identifier serves to uniquely locate (relative to the current computational state of a virtual machine or a network) some piece of data. Approximate synonyms for 'identifier' in this sense include 'link', 'address' and 'pointer'; and ideas like hash coding and database key are also connected with this sense of 'identify'. The corresponding usage of 'representation' is where one speaks of a representation of data, or of the state of a computational entity.
The corresponding usage of 'resource' is something that is, or can in principle be, identified in this sense: a computational entity (or the state of it) which is accessible via some network link or transfer protocol. The corresponding usage of 'semantic' language is closely analogous to the way this terminology is typically used in describing the semantics of computational systems.

(D) as in a descriptive language, such as English or formal logical languages, where "identifier" is synonymous with 'name', and to identify means simply to refer to, name or denote. The corresponding sense of 'representation' is 'description' (or possibly 'formal description'), in the sense used in KR work, AI and formal linguistics. The corresponding sense of 'resource' would be simply 'entity' or 'thing', ie the word used in this way has no special Web or Internet-related meaning and is simply a synonym for 'entity' in the philosophical sense: anything that can be referred to, ie anything. The corresponding usage of the 'semantic' language is more analogous to the way that this terminology is used in linguistics, philosophy, logical semantics and AI/KR work.

Although these two readings are obviously closely related, and in some circumstances can be conflated, for example when discussing the formal semantics of a programming language, they are not the same. It is important to keep them distinct, especially when discussing referring formalisms (such as RDF and OWL) which are based on (D) ideas but deployed on a computationally defined network normally described using (C) terminology.
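The contrast between the two readings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries, the function names and the `ex:` names are hypothetical and do not come from the document under review, and plain strings stand in for the entities being named.

```python
# Illustrative sketch only. All names below are hypothetical.

# Sense (C): an identifier locates a piece of data, relative to the current
# state of some computational system (here, a dict standing in for a network).
network_state = {
    "http://weather.example.com/oaxaca": "<html>Oaxaca: cloudy</html>",
}


def dereference(identifier, state):
    """Effective procedure: look the identifier up in the given state."""
    return state[identifier]


# Sense (D): a name simply denotes some entity. The entity need not be data,
# need not have a state, and need not be reachable over any network.
# Plain strings stand in for the entities themselves.
denotation = {
    "ex:caesar": "Julius Caesar, the person",
    "ex:two": "the number two",
}


def refers_to(name, naming):
    """Report what a name denotes; this grants no access to the thing named."""
    return naming[name]


# In sense (C) the result depends on system state: change the state and the
# same identifier yields different data.
later_state = {
    "http://weather.example.com/oaxaca": "<html>Oaxaca: sunny</html>",
}
print(dereference("http://weather.example.com/oaxaca", network_state))
print(dereference("http://weather.example.com/oaxaca", later_state))
print(refers_to("ex:two", denotation))
```

In the first reading the lookup is an operation on a system and can be repeated, cached or fail; in the second, the mapping is merely a record of what a name is taken to mean, and nothing in it provides a way to reach or act on the thing named.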
In particular, in sense (C), but not in sense (D), there is a presumption of a computable or effective process which can be applied to the identifier to provide access to the entity identified; an assumption (which follows from the previous) that the identification must be unique; and an understanding that this process might depend on the state of some computational system. None of these assumptions is generally plausible for sense (D). On the other hand, formal analyses of sense (D) generally understand reference to be relative to an interpretation, and discuss meaning in terms of constraints on, and relationships between, interpretations. This style of analysis, and the terminology associated with it, has been a standard in formal semantics - logical semantics, formal linguistics and formal philosophy - for over half a century. The notion of interpretation involved has no particular connection with the sense of computational state underlying sense (C). Even when uniqueness of reference is required within an interpretation, guaranteeing uniqueness across all possible interpretations is usually meaningless or provably impossible.

The document often seems to slip between these two senses, in ways that suggest inappropriate conclusions. Several of the principles stated seem appropriate for sense (C) but are inappropriate, and in some cases positively harmful, if understood in sense (D). (Details below.) I would therefore ask that the authors clarify their intended meaning before publication.

(Meta-comment: In making similar comments on similar documents in the past, I have found that any attempt to ask for clarification on this point is met with resistance on the grounds that the intended meaning is obvious, and have been advised to consult an English dictionary.
Leaving aside the potentially insulting nature of such a response, the key point is that the terminology used here is being used in technical senses, rather than the informal English senses; and moreover, much of this terminology already has technical senses which are well-established in disciplines which some readers of this document work within, and which are relevant to emerging Web technology. If words are being used here in ways which conflict with these established technical usages, therefore, it is important to make at least these aspects of the intent clear. For example, the semi-technical use of the term "resource" is unknown in the English language generally and even, as far as I am aware, in the general technical computer-science literature. It seems to be a usage special to the internet community.)

---------

2. Hunting down what is meant by "resource".

It is extremely difficult (for this reader) to find out what this word is supposed to mean, in spite of its being so central. The document as a whole does not seem to have a single view on the intended meaning, in fact. Much of the document makes sense only under the more limited (C) reading, but in places what it says is only consistent with the (D) reading. As a result, the document as a whole does not seem to have a coherent single reading. The rest of this comment is devoted to documenting this particular issue.

(It would be possible to keep these interpretations straight by a systematic use of terminology. For example, one might use "resource" in the second D sense (unconventionally, but consistently) together with "refer", and "web resource" (or "network resource" ?) in the first C sense together with "identify", with an understanding that the second usage is intended to be a special case of the former, so that any maxims which apply to the first, broader, sense also apply to the latter, but not necessarily the reverse.)
The latter (D) interpretation seems to be insisted upon by the cited document http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html which reads: "Resource Anything that can be named or described can be a resource. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation or the types of a relationship (e.g., "parent" or "employee"). " Which could be paraphrased as "A resource can be anything, and everything is a resource". I note particularly the phrasing "named or described". (I also note in passing that the first three "familiar" examples are hardly typical of entities in general, and that the examples do not include such things as galaxies, atoms, grains of sand; kinds of material such as steel or wood; holes, times, locations, intervals; natural processes such as flows and movements; and many other categories of entity which have been the subject of formal ontological descriptions. Are these omissions deliberate?) The only example given in the document is disturbingly vague at precisely this critical point: the resource is the "Oaxaca Weather Report". But what KIND of thing is that, and how exactly is it related to the URI and the "representation" of it? (see later for more on that word) Several different answers are consistent with what you say about the example. (a) Do you mean something like an abstraction of a document, in the sense that "Moby Dick" refers to a resource called a novel, which is an abstraction of all the printed, spoken etc. tokens of Moby Dick ever produced (which could be described as "representations" of it, although "token" is the existing technical term in wide use here.) 
(b) Do you mean that the resource here is the actual weather - the state of the atmosphere - in Oaxaca on the day in question? So that the HTML 'represents' this in the sense of talking about it - referring to it, describing it - which is the usual way that "represent" is used in normal language, formal semantics and linguistics.

(c) Do you mean that the resource here is the thing on the server that processes the request and which emits the text/html representation, which is therefore a representation of the state of a computational entity which is physically attached to the network? That is, the resource is a computational entity of some kind, or its state? This would be consistent with the first C sense of 'identify' and with the description in the first sentence of the abstract referring to 'resources interconnected by links'.

(d) Or do you intend to be systematically ambiguous between these alternatives, so as to try to apply to them all? I hope not, because they are not mutually compatible; and if not, it would be extremely helpful if you could clarify your intended meaning, perhaps by fleshing out the description of the example with a little more conceptual detail.

Trying to home in on your intended meaning by searching the document for uses of "resource" gives the following:

[[The World Wide Web is a network-spanning information space of resources interconnected by links. ]] I take it then that a resource is something that can be connected by a link to another resource. I presume also that "link" here means more than simply a reference to something, but connotes an actual connection of some kind (eg along which information can be transmitted.) This seems like sense (C), and is not intelligible when applied in any broader sense.
[[The World Wide Web (WWW, or simply Web) is an information space in which the items of interest, referred to as resources, are identified by global identifiers called Uniform Resource Identifiers (URIs).]] [[Each resource is identified by a URI.]] ** So resources are items of interest (of interest to who?) which are identified by URIs. Unfortunately, this runs into the ambiguity already noted in the meaning of "identify" so does not help decide between senses (C) and (D). (Already there is a tension in meaning: are the only items of interest those that can be connected with links? Surely not.)

[[Web agents communicate information about the state of a resource.... cf also many later references, such as .... Representation data, electronic data about resource state.... ....]] So resources have states, and information about those states is communicated. Again seems like sense (C), since many entities that can be named do not have states (eg numbers, arithmetic operators).

[[A URI must be assigned to a resource in order for agents to be able to refer to the resource]] This seems to rule out any notion of reference by description. This makes sense for interpretation (C), but for interpretation (D) is a very strong prohibition, and if followed in most communication scenarios would render effective communication impossible. OWL expressions for example may describe classes by restrictions on properties; such classes are the same kind of entities that OWL uses URIs to refer to, but are not themselves identified or referred to by any URI; nevertheless they are referred to (in sense (D), i.e. denoted) by the OWL expressions, and such references are in fact the most typical form of reference to classes used in OWL reasoning. If 'refer' in this quote is understood in sense (D), therefore, the claim seems to be wrong even on the WWW.
I take it therefore that this is intended in sense (C), where 'refer' means 'link to', rather than interpretation (D).

[[Resources exist before URIs; a resource may be identified by zero URIs]] This seems to directly contradict the third quotation above (**). Which is correct?

[[A resource owner SHOULD assign a URI to each resource that others will expect to refer to.]] (Principle) I take it then that resources have owners who are capable of assigning URIs so as to refer to the resources. (This is the first place in the document where this idea of ownership is mentioned, and no explanation is given. Later, 'owners' of URIs, rather than of resources, are discussed: what is the relationship between these?). This also seems completely inappropriate in sense (D). Obviously, being able to refer to something does not connote ownership of it. Most entities that are referred to by names or descriptions in language have no owners. I note that the reference to Engelbart 90 yields the following: "in principle, every object that someone might validly want/need to cite should have an unambiguous address (capable of being portrayed in a manner as to be human readable and interpretable). (E.g., not acceptable to be unable to link to an object within a "frame" or "card.")" which seems to clearly indicate (by its use of "address" and "link", and the presumption that objects are parts of computationally defined text rather than arbitrary referents) that Engelbart was talking in sense (C); which is in any case obvious from reading Engelbart.
[[The English statement "'http://www.example.com/moby' identifies 'Moby Dick'" is ambiguous because one could understand the phrase "Moby Dick" to refer to distinct resources: a particular printing of this work, or the work itself in an abstract sense, or the fictional white whale, or a particular copy of the book on the shelves of a library (via the Web interface of the library's online catalog), or the record in the library's electronic catalog which contains the metadata about the work, or the Gutenberg project's online version.]] Here some actual examples are given of resources, which seem to clearly rule out any attempt to understand the term in the computational sense (C). Obviously a copy of a book on a shelf, a fictional white whale, or a novel in an abstract sense, are not the kind of entities that can be connected by a link to anything else, or that have computational states. So the only way to understand this seems to be in sense (D).

I note that apart from this sentence, and the rather strange definition given in http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html , it would be possible to read the entire document in sense (C), so that "resource" meant "entity on a network" and "identify" meant "link to". Most of the document would make perfect sense under this narrower interpretation.

[[URI ambiguity arises a URI is used to identify two different Web resources.]] "Web resource" is a new idea but is not defined or mentioned elsewhere. This seems to suggest a distinction between Web resources and other kinds of resource (??). It might be helpful if this distinction could be clarified and made more explicit. In any case, section 2.3.1 seems to be important, and depends crucially on the meaning of this distinction, which should therefore be clarified.

[[....unsafe interactions may cause a change to the state of a resource and the user may be held responsible for the consequences of these interactions. 
]] This seems to suggest that users can cause changes to the state of a resource. Again, this makes sense on view (C) but reads very strangely under the (D) interpretation, since one would not normally expect that a reference to an entity would enable any interaction to take place with the entity at all. (This sentence refers to Julius Caesar, but gives me no power to *do* anything to him.) [[Emerging Semantic Web technologies, including the "Web Ontology Language (OWL)" [OWL10], define RDF [RDF10] properties such as sameAs to assert that two URIs identify the same resource or functionalProperty to imply it.]] This is an explicit use of a sense which is formally defined to be sense (D), so apparently requires that we interpret the document in sense (D). ------------ Sense D is arguably the most general and all-inclusive sense. Unfortunately, if we do interpret the semantic language in sense (D), a great deal of the document makes no sense and much of it is wrong. To elaborate, starting again from the beginning, and interpreting in sense D: [[Each resource is identified by a URI.]] This says that every entity has a name. This is completely false, in fact *provably* false; and so to propose that it should be true is silly. Note that even http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html only requires that a resource be describable, not nameable. [[Parties who wish to communicate must agree upon a shared set of identifiers and on their meanings]] This is not true in sense (D). It is probably impossible to completely agree upon meanings of words in English, for example. Communication does not require complete agreements upon meanings: such agreement could only be established by communications in any case. [[A URI must be assigned to a resource in order for agents to be able to refer to the resource.]] Again, completely false for sense (D): see above comments on reference by description in OWL. 
(Aside: the following sentence is extremely muddled in its logic however one reads it: "It follows that a resource should be assigned a URI if a third party might reasonably want to link to it, make or refute assertions about it, retrieve or cache a representation of it, include all or part of it by reference into another representation, annotate it, or perform other operations on it." a. Doing something to a reference to X is not performing an operation on X; Doing something to a representation of X is not performing an operation on X. b. Retrieving or caching a representation requires that the representation is accessible, not the thing represented. c. Making assertions about something can be done without naming the thing referred to. For example, 'Your mother is a whore', 'my brother's favorite hamster died yesterday'. ) [[the resource identified by a URI does not depend on the context in which the URI appears.]] In sense (D), this is at best a pious hope and cannot possibly be enforced. There is good reason to suppose that it is usually false, in any case. [[URI ambiguity should not be confused with ambiguity in natural language. The English statement "'http://www.example.com/moby' identifies 'Moby Dick'" is ambiguous because one could understand the phrase "Moby Dick" to refer to distinct resources: a particular printing of this work, or the work itself in an abstract sense, or the fictional white whale, or a particular copy of the book on the shelves of a library (via the Web interface of the library's online catalog), or the record in the library's electronic catalog which contains the metadata about the work, or the Gutenberg project's online version.]] But in sense (D), URI ambiguity is exactly like ambiguity in natural language, so this advice to not confuse them seems meaningless. 
In fact, in sense (D), all naming is *inherently* ambiguous, since it is always possible for one party to make ontological distinctions which were not being made by the other party. Examples from natural language are legion, but the same issue crops up in exchanging information between formal data repositories and ontologies ("Semantic integration", "Data fusion") and has been long recognized as ubiquitous and inherent in the use of formal vocabularies. Attempts to establish exact unambiguous meanings are bound to fail, and to require that something essentially impossible be done before any communication can take place is extremely poor advice. So this advice, and the "good practice" is in fact extremely poor practice if understood in sense (D).

[[URI persistence is a matter of policy and commitment on the part of authorities servicing URIs.... content negotiation also promotes consistency, as a site manager is not required to define new URIs when adding support for a new format specification. ... It is reasonable to limit access to a resource ... .... The Web provides several mechanisms to control access to resources...]] All of this language makes sense only in the (C) reading of the terminology.

--------------

3. What is a "representation" ?

This word has a usage which is current throughout linguistics, formal semantics, logic, philosophy, AI and cognitive science more generally, in which it is roughly synonymous with 'formal description'. The document seems to follow the REST architecture description in using it in a different sense, or perhaps a very restricted and special sense. It would be helpful if this sense could be made clear and stated unambiguously. (I am honestly unclear what the exact meaning of "representation" is in the REST architectural descriptions, after several attempts to get it clarified.)

Consider for example the main 'story' re-told with RDF/XML in place of HTML.
Using the terminology from the first illustration (which is very good to look at, BTW), the URI identifies a resource called the Oaxaca Weather Report, and there is a 'representation' of that resource which, rendered in the way that the diagram shows the HTML, might instead look like this:

Metadata: content-type rdf/xml
Data:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:weather="http://www.srh.noaa.gov/data/rdf/forecasts/#">
  ...
  <place:oaxaca rdf:about="http://www.geog.org/coord/3234944LO-1151811">
    <weather:timeAt> ... </weather:timeAt>
    <weather:forecastType> "cloudyvariable" </weather:forecastType>
    ....
  </place:oaxaca>
  ....

Following the usage of "representation" in the document, this is a representation of a weather *report*. However, following the common usage of "representation" mentioned above - the D sense - this is a representation (in RDF) of the actual weather in Oaxaca. That is, it describes a state of the actual atmosphere above a part of the earth's surface during a certain period: that is what it *represents*. In this sense, RDF (or RDFS or OWL) is understood as a formal syntax which expresses propositions about the world, which describes the world. This latter sense of "represent" is what the formal RDF semantics talks about, for example, and it is what the RDF primer means when it refers to "RDF/XML *describing* Eric Miller" (my emphasis) in the caption to its first example: http://www.w3.org/TR/2003/WD-rdf-primer-20031010/#example1

Call these senses respectively the (C) and (D) senses. To see the difference, consider what it means for the representation to be in error. In the first C sense, it means that the RDF/XML does not accurately mirror the state of the weather REPORT, and presumably reflects some transmission or protocol error on the network, and has nothing to do with the weather. In the second D sense it means that the weather report is inaccurate, which may well have nothing to do with the network at all.
In the first case, you phone the people in charge of the network; in the second, you phone the weather forecasters. Using the second D sense, it is precisely the ability to represent - that is, to describe - things that are not linked or connected to a network that makes the Web so useful: it is able to communicate information about anything at all. (It is probably why Nadia used the Web in the first place: she is likely to be more interested in the actual weather in Oaxaca than in the weather report considered as an object. ) On this view, however, all talk of the things at the sharp end of a 'represents' arrow being in any way connected with any computational process or communication protocol is meaningless: simply being able to *refer* to something does not give one any kind of hold on it: it does not presuppose any way to compute a pathway to it, to access it, to link to it or to perform any operations on it. (This is what the document seems to mean when it says in the scenario description that "the resource is ABOUT the weather in Oaxaca" (my emphasis), which is clearly distinct from the relationship between the 'representation' and the resource. ) My point here is not to argue that either sense of 'represent' is correct, but only to ask you to make your intended sense clear, particularly if it differs (as it seems to) from the sense in which this word is widely understood. An example of a failed attempt to clarify the meaning comes in section 4. The first sentence reads: [[A data format (including XHTML, CSS, PNG, XLink, RDF/XML, and SMIL animation) specifies the interpretation of representation data. ]] with a link to [[Representation data, electronic data about resource state...]] which seems to imply that RDF/XML is a representation of resource state. 
But in the D sense of the semantic vocabulary, used specifically in the RDF documentation, resources need not have a state (and even if they do, the referent of a URI is not required to be a state rather than the resource itself); and applying this to the example, the "resource" would have to be not the weather report, but the atmosphere in the vicinity of Oaxaca.

--------------

4. Detailed textual comments (in order through the document):

[[The World Wide Web (WWW, or simply Web) is an information space...]] Please define 'information space' and the intended meanings implicit in the use of "in" and "on" applied to it, or use a more pedestrian terminology. A reasonably extensive Google search on this phrase does not find any sense of it which makes this sentence coherent.

[[... typical behavior of Web agents - people or software (on behalf of a person, entity, or process) acting on this information space]] Should probably read "software (acting on behalf...) " Do you really want to count people as Web agents? This seems a needlessly general scope, and much of the subsequent technical advice and recommended practice reads oddly if applied to people. More on this later. (BTW, with current technology, Nadia may not need to know a URI when she sees one. Eudora recognizes them for me, for example, and opens my browser when required.)

[[Protocols define the syntax and semantics of messages exchanged by agents over a network. Web agents communicate information about the state of a resource through the exchange of representations.]] This seems to imply that Web agents are software, not human.

[[This scenario illustrates the three architectural bases of the Web.... Nadia (by clicking on a hypertext link) tells her browser to request... The browser sends an HTTP GET request .... the browser retrieves and displays...]] This wording seems to suggest that the Web architecture is centrally concerned with browsing. While likely true, is this what is intended to be conveyed here?
The figure has arrows pointing from the URI to the resource and from the representation to the resource. But the scenario describes how the URI can be used to access a representation FROM the resource. It seems odd that there is no pathway in the diagram from the URI to the representation. [[... understanding the REST model and consider the role to which of its principles could guide their design...]] seems ungrammatical: role/extent ...(??) Some of the principles in 1.1.3 read like platitudes (especially "good practice"). Would it be possible to give these points a little more substance? 1.2 [[A number of general architecture principles apply to across all three bases of Web architecture.]] apply to what? Which 'three bases' are being referred to here? 1.2.2 [[This document does not distinguish in any formal way the terms "format" and "language." Context has determined which term is used.]] This is unfortunate, as the word "language" is widely used to connote a much more extensive set of assumptions than the term "format". The context of use in the document is not always sufficient to determine what is meant. [[Language subset: one language is a subset (or, "profile") of a second language if any document in the first language is also a valid document in the second language and has the same interpretation in the second language.]] //One language is a subset of another if any document in the first language is also a valid document, with the same interpretation, in the other language. [[The manner in which they are dealt with depends on application context.]] //Application context determines the manner of dealing with them. [[User agents that correct errors without the consent of the user are not acting on the user's behalf..... Silent recovery from error is harmful.]] Really?? I beg to differ. Surely such actions are in fact a large part of what we have user agents for. 
At least give some reason to justify this claim, which seems quite arbitrary and in fact inconsistent with GUI design principles. [[Experience with the cost of building a user agent to handle the diverse forms of ill-formed HTML content convinced the authors of the XML specification to require that agents fail deterministically upon encountering ill-formed content. Because users are unlikely to tolerate such failures, this design choice has pressured all parties into respecting XML's constraints, to the benefit of all.]] There are benefits, but there are also costs. Entire development paths are cut off from XML applications because of the rigidity of the XML specs. This entire issue is more complicated than this naively optimistic paragraph suggests. I would suggest omitting this controversial claim or at least indicating that it is possible to rationally disagree. 2. [[Parties who wish to communicate must agree upon a shared set of identifiers and on their meanings.]] Stated this broadly this is false, and there is no need to state it this broadly in an architecture document. I suggest simply removing this sentence. (The first sentence of section 3.4 says it better: "Successful communication between two parties using a piece of information relies on shared understanding of the meaning of the information.") [[The identification mechanism for the Web is the URI.]] URIs are not mechanisms. Please rephrase this coherently. [[A URI must be assigned to a resource in order for agents to be able to refer to the resource. It follows that a resource should be assigned a URI if a third party might reasonably want to link to it, make or refute assertions about it, retrieve or cache a representation of it, include all or part of it by reference into another representation, annotate it, or perform other operations on it.]] As noted above, the first sentence here is false if 'refer to' is understood in its commonly used sense. 
The list of conditions in the second sentence are not all in the same category: to link to it requires a unique URI, but to make or refute assertions about it, or to [manipulate] a representation of it, does not. None of these are in any way comparable to performing operations on it, which indeed requires a more direct form of access to the resource itself. The phrase "other operations" is misleading as the previous items are not performance of operations on the resource. [[.. there are many benefits to assigning a URI to a resource... A resource owner SHOULD assign a URI to each resource that others will expect to refer to.]] Nothing has been said until this point about ownership of resources, or about assigning URIs to resources. Questions arise immediately: What counts as ownership in this context (particularly if 'resource' has the broad (D) interpretation)? How are URIs assigned to resources? (Is there a method or technique for 'assignment' in this sense?) Can assignment only be done by the owner of the resource and/or the URI? The text should discuss this issue, if only briefly. [[For example, the parties responsible for weather.example.com should not use both "http://weather.example.com/Oaxaca" and "http://weather.example.com/oaxaca" to refer to the same resource; agents will not detect the equivalence relationship by following specifications.]] I do not follow this. What is the problem here? We have just been told that a resource may have more than one URI. What 'equivalence relationship' is being referred to? Is the point that people will confuse these but software will not? But if they refer to the same resource, why does this matter? And in general, how is this entire discussion squared with the opacity discussion later? [[If a URI has been assigned to a resource, agents SHOULD refer to the resource using the same URI, character for character.]] Why?? This seems to be at odds with the point just made. 
If owners can assign more than one URI to a resource, why must agents use only one of them? If they must, what is the point of creating more than one? [[ the agent has a unique relationship with the URI, called URI ownership. ]] Does 'agent' here include software agents? 2.3 [[the ambiguous use of terms imposes a cost in communication.]] This is a controversial claim. It can be argued that it is only ambiguity which makes communication possible at all, in one sense of 'ambiguity'. Time and email space do not permit a full comment on this, but I would suggest omitting or qualifying it. [[URI ambiguity refers to the use of the same URI to refer to more than one distinct resource.]] Again, this is itself, ironically enough, ambiguous. If you mean 'refer' in the (C) sense, I would agree, since communication protocols require that the ambiguity be resolved. If you mean 'refer' in sense (D), then it is not clear that ambiguity in this sense can possibly be avoided, certainly not for computational systems. (This follows ultimately from Goedel's second incompleteness theorem.) [[The English statement "'http://www.example.com/moby' identifies 'Moby Dick'" is ambiguous because one could understand the phrase "Moby Dick" to refer to distinct resources: a particular printing of this work, or the work itself in an abstract sense, or the fictional white whale, or a particular copy of the book on the shelves of a library (via the Web interface of the library's online catalog), or the record in the library's electronic catalog which contains the metadata about the work, or the Gutenberg project's online version.]] What exactly is being said here? If these various things are indeed all resources, then is the claim that the ambiguity arises from the use of the English quoted phrase? That indeed makes sense, but then how exactly does the owner of that URI specify which of them is the intended resource which the URI uniquely identifies? 
There seems to be no way around the ambiguity inherent in general reference. This comment seems to raise more issues than it resolves and might be better omitted. [[URI ambiguity arises a URI is used to identify two different Web resources.]] ...when a URI is used... I will try to explain in another document why referential ambiguity is not always a bad thing. Basically, you can't outlaw it, so why bother trying; but in addition, it can in fact be useful. Most English words are systematically ambiguous, because it's easier to get reliable communication over a noisy low-bandwidth channel by overloading the words in ways that can be easily resolved from context than it is to try to invent distinct signs for all the possible nuances of meaning, particularly when those nuances cannot, in general, be computed ahead of time. Most of the nuances are irrelevant most of the time in any case. For example, it is almost certainly harmless to allow a URI to be ambiguous between a person and a homepage, as long as one can easily distinguish homepages from people and map between them when required (i.e., you can easily coerce in either direction). Allowing a URI to be ambiguous between a star and a planet might be rather nastier, since the astronomy context will often not allow you to resolve a difference which might be important. Many issues arise, but to just give a blanket 'ambiguity is bad' rule is way too simplistic. BTW, I wholeheartedly concur with http://lists.w3.org/Archives/Public/www-tag/2002Sep/0132 2.5 [[Agents making use of URIs MUST NOT attempt to infer properties of the referenced resource except as licensed by relevant specifications]] Does this include human agents? I certainly do this a lot, myself, see nothing wrong with it, and don't propose to stop doing it. But in any case, this seems to fly in the face of current practice, if I understand it correctly. 
When I use Google, my browser comes back with a display of a (representation of) something with a URI that looks like this: http://www.google.com/search?as_q=pat+hayes&num=10&hl=en&ie=ISO-8859-1&btnG=Google+Search&as_epq=&as_oq=&as_eq=&lr=&as_ft=i&as_filetype=&as_qdr=all&as_occt=any&as_dt=i&as_sitesearch=&safe=off which is absolutely chock full of information from which software can be said to infer properties of the referenced resource. This kind of thing is done all the time. It sounds like you are trying to say that Google MUST NOT do what it does. Frankly, this would be a very bad political move: Google is of far more value to the Web than the entire W3C. 3.1 [[Agents may use a URI to access the referenced resource]] Is this 'may' as in 'sometimes possible' or as in 'generally have permission'? Is it always possible to use a URI to access a resource? How can this be reconciled with the (D) definition of resource? (How does one access an imaginary white whale?) In general, this entire section seems to make sense only with the narrow (C) reading of 'resource' to mean 'thing physically attached to a network'. The terminology needs to be kept straight in order for the text to be comprehensible. 3.2 [[The Web's protocols ... are based on the exchange of messages.]] What kinds of entity do this exchanging of messages? (Resources? Agents? Both?) [[Agents use representations to modify as well as retrieve resource state]] I find this puzzling. How does an agent use a REPRESENTATION to modify something? Representations aren't the kind of thing that DO anything. (??) 3.3.1 [[Interpretation of the fragment identifier during a retrieval action is performed solely by the agent]] By which agent? [[A resource owner who creates a URI with a fragment identifier and who uses content negotiation to serve multiple representations of the identified resource SHOULD NOT serve representations with inconsistent fragment identifier semantics]] What sense of 'semantics' is meant here? 
What counts as 'inconsistent'? In the example given, does this mean that the png and jpeg should be the "same picture"? What exactly does this mean? (e.g., suppose one has a different color balance, or is a slightly different size: is that an inconsistency?) (Part of the issue here is that words like 'inconsistent' have tight technical meanings, and it is not clear if you mean to 3.4 [[the design choice for the Web is, in general, that the owner of a resource assigns the authoritative interpretation of representations of the resource.]] HOW?? Since this point is so central, surely some guidance should be given as to how to perform this miracle of referential precision. The example given explains how the authority decides what representations to send to Nadia. It says nothing about how to make sure that these representations uniquely refer, or how they are given an interpretation. There is a deeper issue. Suppose the owner assigns an authoritative interpretation: how is this INTERPRETATION communicated to Nadia? Nothing has been said about how to communicate interpretations of representations over the Web. None of this section makes sense (on either the C or D readings). [[User agents MUST NOT silently ignore authoritative server metadata..... if Nadia's browser detects a problem, Nadia's browser must not silently ignore the problem and render the JPEG image.]] Why not?? Again, this seems unmotivated, arbitrary and inconsistent with good application design in many cases. And, frankly, it doesn't seem like any of your business: it's a user-application decision, not a web-architecture decision. 3.5.1 [[It is a breakdown of the Web architecture if agents cannot use URIs to reconstruct a "paper trail" of transactions]] Does this apply even to safe interactions? 3.6.2 [[There are strong social expectations that once a URI identifies a particular resource, it should continue indefinitely to refer to that resource; this is called URI persistence. 
]] OK, but (a) this is highly controversial. In fact I think there are many cases where there are NOT such strong social expectations, in spite of the W3C's obvious desire that there should be; (b) there is an ambiguity here since a "resource" may have a state and emit changing representations [REST]. How does one distinguish a change in resources from a changing resource? Are there guidelines to make the distinction clear? For example, I often write documents which are publicly viewable in draft, and are constantly being changed, at the same URL. By strict W3C guidelines, I gather this is bad practice. But if I consider 'the paper' to be a dynamic resource, and my edits to it to be updatings or changes of its state, why would this not be acceptable? If the reply is that I can choose either way to describe this activity, but that it is kosher under one description but bad practice under a different description, then the 'strong social expectations' seem to amount to little more than a choice of words. Is this really all that is being said here? 4. [[In principle, all data can be represented using textual formats.]] Well, yes, but the same could be said about binary data formats. So? 4.2.4 [[Many modern data format specifications include mechanisms for composition.....Note however, that for general XML there is no semantic model that defines the interactions within XML documents... ]] This reads like a critique of the design of XML. Is that reading intended? ------ Sorry this is so long, and so late. Pat Hayes -- --------------------------------------------------------------------- IHMC (850)434 8903 or (650)494 3973 home 40 South Alcaniz St. (850)202 4416 office Pensacola (850)202 4440 fax FL 32501 (850)291 0667 cell phayes@ihmc.us http://www.ihmc.us/users/phayes
Received on Wednesday, 17 March 2004 17:38:59 UTC