Comments on Web Architecture, First Edition

The following are some personal comments on 
http://www.w3.org/TR/2003/WD-webarch-20031209/
Sorry they're late.

------

1. General comment about vocabulary

The vocabulary used throughout this document can be understood in two 
rather different ways, which conflict with one another. Exactly what 
is being said is therefore not always clear, and in some cases may be 
understood by some readers to have different meanings than those 
intended. It would be helpful if the terminology could be defined or, 
failing that, if its intended meanings could at least be clarified 
somewhat.

I realize that this kind of request conflicts with the requirements 
of ease of reading and general literary style, but it is nevertheless 
important; it could be done with a glossary, for example. However, in 
order to be useful, a glossary should not merely repeat sentences 
from the text using the same terminology. Thus
"Resource: An item of interest in the information space known as the 
World Wide Web" is completely uninformative since the definition 
repeats the words used in the text and hence does not resolve their 
ambiguity or provide any other way to grasp their intended meaning.

The specific ambiguity revolves around a group of terms (semantic, 
represent, identify, refer, about, meaning, resource) which can be 
understood in two rather different ways, which I will refer to as (C) 
and (D).

(C) as in a programming language, where an identifier serves to 
uniquely locate (relative to the current computational state of a 
virtual machine or a network) some piece of data. Approximate 
synonyms for 'identifier' in this sense include 'link', 'address' and 
'pointer'; and ideas like hash coding and database key are also 
connected with this sense of 'identify'.
The corresponding usage of 'representation' is where one speaks of a 
representation of data, or of the state of a computational entity.
The corresponding usage of 'resource' is something that is, or can in 
principle be, identified in this sense: a computational entity (or 
the state of it) which is accessible via some network link or 
transfer protocol.
The corresponding usage of 'semantic' language is closely analogous 
to the way this terminology is typically used in describing the 
semantics of computational systems.

(D) as in a descriptive language, such as English or formal logical 
languages, where "identifier" is synonymous with 'name', and to 
identify means simply to refer to, name or denote.
The corresponding sense of 'representation' is 'description' (or 
possibly 'formal description'), in the sense used in KR work, AI and 
formal linguistics.
The corresponding sense of 'resource' would be simply 'entity' or 
'thing', ie the word used in this way has no special Web or 
Internet-related meaning and is simply a synonym for 'entity' in the 
philosophical sense: anything that can be referred to, ie anything.
The corresponding usage of the 'semantic' language is more analogous 
to the way that this terminology is used in linguistics, philosophy, 
logical semantics and AI/KR work.

Although these two readings are obviously closely related, and in 
some circumstances can be conflated (for example, when discussing the 
formal semantics of a programming language), they are not the same. It 
is important to keep them distinct, especially when discussing 
referring formalisms (such as RDF and OWL) that are based on (D) ideas 
but deployed on a computationally defined network normally described 
using (C) terminology.

In particular, in sense (C), but not in sense (D), there is a 
presumption of a computable or effective process which can be applied 
to the identifier to provide access to the entity identified; an 
assumption (which follows from the previous) that the identification 
must be unique; and an understanding that this process might depend 
on the state of some computational system. None of these 
assumptions is generally plausible for sense (D).
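The three (C) assumptions just listed can be made concrete with a toy example. The following Python sketch is purely illustrative. The `Store` class and the identifier `/weather/oaxaca` are invented, and a dictionary lookup merely stands in for whatever network or virtual-machine machinery does the work in practice. It shows three things: an effective process applied to an identifier, a unique result, and dependence on computational state.

```python
# Toy sketch of sense (C) identification. Store and "/weather/oaxaca" are
# hypothetical; a dictionary lookup stands in for network or VM machinery.
from typing import Dict


class Store:
    """A stand-in for a computational system whose state can change."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}

    def put(self, identifier: str, data: str) -> None:
        # Each key maps to at most one datum, so identification is unique.
        self._state[identifier] = data

    def dereference(self, identifier: str) -> str:
        # An effective process applied to the identifier yields the data;
        # raises KeyError if the identifier currently identifies nothing.
        return self._state[identifier]


store = Store()
store.put("/weather/oaxaca", "cloudy, 24C")
first = store.dereference("/weather/oaxaca")

# The same identifier yields different data once the state changes.
store.put("/weather/oaxaca", "rain, 19C")
second = store.dereference("/weather/oaxaca")

print(first)   # prints cloudy, 24C
print(second)  # prints rain, 19C
```

None of these features carries over to naming in sense (D).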

On the other hand, formal analyses of sense (D) generally understand 
reference to be relative to an interpretation, and discuss meaning in 
terms of constraints on, and relationships between, interpretations. 
This style of analysis, and the terminology associated with it, has 
been a standard in formal semantics - logical semantics, formal 
linguistics and formal philosophy - for over half a century. The 
notion of interpretation involved has no particular connection with 
the sense of computational state underlying sense (C).  Even when 
uniqueness of reference is required within an interpretation, 
guaranteeing uniqueness across all possible interpretations is 
usually meaningless or provably impossible.
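By way of contrast, here is an equally toy sketch of the (D) style of analysis, in which a name denotes something only relative to an interpretation. The vocabulary (`moby`, `whale`) and the candidate referents are invented. The co-reference constraint is meant only to suggest, loosely, what an assertion such as owl:sameAs does at the level of interpretations. It does not reproduce the OWL semantics.

```python
# Toy sketch of sense (D) reference: a name denotes something only relative
# to an interpretation. Names and referents are hypothetical.
from typing import Dict, List, Tuple

# Each interpretation assigns a referent to every name.
interpretations: List[Dict[str, str]] = [
    {"moby": "the novel Moby Dick", "whale": "the fictional white whale"},
    {"moby": "the fictional white whale", "whale": "the fictional white whale"},
    {"moby": "a copy on a library shelf", "whale": "the fictional white whale"},
]


def satisfies(interp: Dict[str, str], same_as: List[Tuple[str, str]]) -> bool:
    # A constraint that pairs of names co-refer (loosely like owl:sameAs)
    # is true or false in an interpretation; it does not look anything up.
    return all(interp[a] == interp[b] for a, b in same_as)


# Within each interpretation "moby" has exactly one referent, but the
# referent differs from one interpretation to another.
referents_of_moby = {interp["moby"] for interp in interpretations}
print(len(referents_of_moby))  # prints 3

# Asserting a constraint narrows the set of interpretations (models).
models = [i for i in interpretations if satisfies(i, [("moby", "whale")])]
print(len(models))  # prints 1
```

Nothing here corresponds to a computational state, and no process "retrieves" the referent of a name.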

The document often seems to slip between these two senses, in ways 
that suggest inappropriate conclusions. Several of the principles 
stated seem appropriate for sense (C) but are inappropriate, and in 
some cases positively harmful, if understood in sense (D). (Details 
below.) I would therefore ask that the authors clarify their intended 
meaning before publication.

(Meta-comment:  In making similar comments on similar documents in 
the past, I have found that any attempt to ask for clarification on 
this point is met with resistance on the grounds that the intended 
meaning is obvious, and have been advised to consult an English 
dictionary. Leaving aside the potentially insulting nature of such a 
response, the key point is that the terminology used here is being 
used in technical senses, rather than the informal English senses; 
and moreover, much of this terminology already has technical senses 
which are well-established in disciplines which some readers of this 
document work within, and which are relevant to emerging Web 
technology. If words are being used here in ways which conflict with 
these established technical usages, therefore, it is important to 
make at least these aspects of the intent clear. For example, the 
semi-technical use of the term "resource"  is unknown in the English 
language generally and even, as far as I am aware, in the general 
technical computer-science literature. It seems to be a usage special 
to the internet community.)

---------

2. Hunting down what is meant by "resource".

It is extremely difficult (for this reader) to find out what this 
word is supposed to mean, in spite of its being so central. In fact, 
the document does not seem to have a single view on the intended 
meaning. Much of it makes sense only under the more limited (C) 
reading, but in places what it says is consistent only with the (D) 
reading, so the document as a whole does not seem to admit a coherent 
single reading. The rest of this comment documents this particular 
issue.

(It would be possible to keep these interpretations straight by a 
systematic use of terminology. For example, one might use "resource" 
in the broad (D) sense (unconventionally, but consistently) together 
with "refer", and "web resource" (or "network resource"?) in the 
narrower (C) sense together with "identify", with the understanding 
that web resources are a special case of resources, so that any 
maxims which apply to resources in the broader sense also apply to 
web resources, but not necessarily the reverse.)

The latter (D) interpretation seems to be insisted upon by the cited 
document http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html
which reads:
"Resource
     Anything that can be named or described can be a resource. 
Familiar examples include an electronic document, an image, a service 
(e.g., "today's weather report for Los Angeles"), and a collection of 
other resources. A resource is not necessarily accessible via the 
Internet; e.g., human beings, corporations, and bound books in a 
library can also be resources. Likewise, abstract concepts can be 
resources, such as the operators and operands of a mathematical 
equation or the types of a relationship (e.g., "parent" or 
"employee"). "

Which could be paraphrased as "A resource can be anything, and 
everything is a resource". I note particularly the phrasing "named or 
described". (I also note in passing that the first three "familiar" 
examples are hardly typical of entities in general, and that the 
examples do not include such things as galaxies, atoms, grains of 
sand; kinds of material such as steel or wood; holes, times, 
locations, intervals; natural processes such as flows and movements; 
and many other categories of entity which have been the subject of 
formal ontological descriptions. Are these omissions deliberate?)

The only example given in the document is disturbingly vague at 
precisely this critical point: the resource is the "Oaxaca Weather 
Report". But what KIND of thing is that, and how exactly is it 
related to the URI and the "representation" of it?  (see later for 
more on that word)

Several different answers are consistent with what you say about the example.

(a) Do you mean something like an abstraction of a document, in the 
sense that "Moby Dick" refers to a resource called a novel, which is 
an abstraction of all the printed, spoken etc. tokens of Moby Dick 
ever produced (which could be described as "representations" of it, 
although "token" is the existing technical term in wide use here)?

(b) Do you mean that the resource here is the actual weather - the 
state of the atmosphere - in Oaxaca on the day in question, so that 
the HTML 'represents' this in the sense of talking about it - 
referring to it, describing it - which is the usual way that 
"represent" is used in normal language, formal semantics and 
linguistics?

(c) Do you mean that the resource here is the thing on the server 
that processes the request and which emits the text/html 
representation, which is therefore a representation of the state of a 
computational entity which is physically attached to the network? 
That is, the resource is a computational entity of some kind, or its 
state? This would be consistent with the first C sense of 'identify' 
and with the description in the first sentence of the abstract 
referring to 'resources interconnected by links'.

(d) Or do you intend to be systematically ambiguous between these 
alternatives, so as to try to apply to them all? I hope not, because 
they are not mutually compatible; and if not, it would be extremely 
helpful if you could clarify your intended meaning, perhaps by 
fleshing out the description of the example with a little more 
conceptual detail.

Trying to home in on your intended meaning by searching the document 
for uses of "resource" gives the following:

[[The World Wide Web is a network-spanning information space of 
resources interconnected by links. ]]

I take it then that a resource is something that can be connected by 
a link to another resource. I presume also that "link" here means 
more than simply a reference to something, but connotes an actual 
connection of some kind (eg along which information can be 
transmitted.) This seems like sense (C), and is not intelligible when 
applied in any broader sense.

[[The World Wide Web (WWW, or simply Web) is an information space in 
which the items of interest, referred to as resources, are identified 
by global identifiers called Uniform Resource Identifiers (URIs).]]
[[Each resource is identified by a URI.]] **

So resources are items of interest (of interest to who?) which are 
identified by URIs.  Unfortunately, this runs into the ambiguity 
already noted in the meaning of "identify", so it does not help decide 
between senses (C) and (D).

(Already there is a tension in meaning: are the only items of 
interest those that can be connected with links? Surely not.)

[[Web agents communicate information about the state of a resource....
cf also many later references, such as .... Representation data, 
electronic data about resource state....
....]]

So resources have states, and information about those states is 
communicated. Again this seems like sense (C), since many entities that 
can be named do not have states (eg numbers, arithmetic operators).

[[A URI must be assigned to a resource in order for agents to be able 
to refer to the resource]]

This seems to rule out any notion of reference by description. This 
makes sense for interpretation (C), but for interpretation (D) is a 
very strong prohibition, and if followed in most communication 
scenarios would render effective communication impossible. OWL 
expressions for example may describe classes by restrictions on 
properties; such classes are the same kind of entities that OWL uses 
URIs to refer to, but are not themselves identified or referred to by 
any URI; nevertheless they are referred to (in sense (D), i.e. 
denoted) by the OWL expressions, and such references are in fact the 
most typical form of reference to classes used in OWL reasoning. If 
'refer' in this quote is understood in sense (D), therefore, the 
claim seems to be wrong even on the WWW.

I take it therefore that this is intended in sense (C), where 'refer' 
means 'link to', rather than in interpretation (D).
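The OWL point above can be illustrated with a small sketch in which a class is picked out purely by a description. The individuals and the `has_child`/`doctor` vocabulary are hypothetical. The function only loosely mirrors the set-theoretic reading of an OWL someValuesFrom restriction in a single interpretation; it is not an OWL reasoner.

```python
# Sketch of reference by description, loosely mirroring the set-theoretic
# reading of an OWL someValuesFrom restriction in one interpretation.
# The individuals and the has_child/doctor vocabulary are hypothetical.
from typing import FrozenSet, Set, Tuple

has_child: Set[Tuple[str, str]] = {("alice", "bob"), ("carol", "dan")}
doctor: Set[str] = {"bob"}


def some_values_from(prop: Set[Tuple[str, str]], cls: Set[str]) -> FrozenSet[str]:
    # The class {x | some y with prop(x, y) and y in cls}: it is picked out
    # by this expression alone, with no identifier ever assigned to it.
    return frozenset(x for (x, y) in prop if y in cls)


# "Things with a child who is a doctor", referred to by description only.
print(sorted(some_values_from(has_child, doctor)))  # prints ['alice']
```

The class is referred to successfully in sense (D), although nothing has been "assigned" to it.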

[[Resources exist before URIs; a resource may be identified by zero URIs]]

This seems to directly contradict the third quotation above (**). 
Which is correct?

[[A resource owner SHOULD assign a URI to each resource that others 
will expect to refer to.]] (Principle)

I take it then that resources have owners who are capable of 
assigning URIs so as to refer to the resources. (This is the first 
place in the document where this idea of ownership is mentioned, and 
no explanation is given. Later, 'owners' of URIs, rather than of 
resources, are discussed: what is the relationship between these?).

This also seems completely inappropriate in sense (D). Obviously, 
being able to refer to something does not connote ownership of it. 
Most entities that are referred to by names or descriptions in 
language have no owners.

I note that the reference to Engelbart 90 yields the following:

"in principle, every object that someone might validly want/need to 
cite should have an unambiguous address (capable of being portrayed 
in a manner as to be human readable and interpretable). (E.g., not 
acceptable to be unable to link to an object within a "frame" or 
"card.")"

which seems to clearly indicate (by its use of "address" and "link", 
and the presumption that objects are parts of computationally defined 
text rather than arbitrary referents) that Engelbart was talking in 
sense (C); which is in any case obvious from reading Engelbart.

[[The English statement "'http://www.example.com/moby' identifies 
'Moby Dick'" is ambiguous because one could understand the phrase 
"Moby Dick" to refer to distinct resources: a particular printing of 
this work, or the work itself in an abstract sense, or the fictional 
white whale, or a particular copy of the book on the shelves of a 
library (via the Web interface of the library's online catalog), or 
the record in the library's electronic catalog which contains the 
metadata about the work, or the Gutenberg project's online version.]]

Here some actual examples are given of resources, which seem to 
clearly rule out any attempt to understand the term in the 
computational sense (C). Obviously a copy of a book on a shelf, a 
fictional white whale, or a novel in an abstract sense, are not the 
kind of entities that can be connected by a link to anything else, or 
that have computational states. So the only way to understand this 
seems to be in sense (D).

I note that apart from this sentence, and the rather strange 
definition given in 
http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html , it would be 
possible to read the entire document in sense (C), so that "resource" 
meant "entity on a network" and "identify" meant "link to". Most of 
the document would make perfect sense under this narrower 
interpretation.

[[URI ambiguity arises a URI is used to identify two different Web resources.]]

"Web resource" is a new idea but is not defined or mentioned 
elsewhere. This seems to suggest a distinction between Web resources 
and other kinds of resource (??). It might be helpful if this 
distinction could be clarified and made more explicit.

In any case, section 2.3.1 seems to be important, and depends 
crucially on the meaning of this distinction, which should therefore 
be clarified.

[[....unsafe interactions may cause a change to the state of a 
resource and the user may be held responsible for the consequences of 
these interactions. ]]

This seems to suggest that users can cause changes to the state of a 
resource. Again, this makes sense on view (C) but reads very 
strangely under the (D) interpretation, since one would not normally 
expect that a reference to an entity would enable any interaction to 
take place with the entity at all. (This sentence refers to Julius 
Caesar, but gives me no power to *do* anything to him.)

[[Emerging Semantic Web technologies, including the "Web Ontology 
Language (OWL)" [OWL10], define RDF [RDF10] properties such as sameAs 
to assert that two URIs identify the same resource or 
functionalProperty to imply it.]]

This is an explicit use of "identify" in a sense which is formally 
defined (by the OWL semantics) to be sense (D), so it apparently 
requires that we interpret the document in sense (D).

------------

Sense D is arguably the most general and all-inclusive sense. 
Unfortunately, if we do interpret the semantic language in sense (D), 
a great deal of the document makes no sense and much of it is wrong. 
To elaborate, starting again from the beginning, and interpreting in 
sense D:

[[Each resource is identified by a URI.]]

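This says that every entity has a name. This is completely false, in 
fact *provably* false (there are uncountably many real numbers, for 
example, but only countably many URIs, since each URI is a finite 
string over a finite alphabet); and so to propose that it should be 
true is silly. Note that even 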
http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html only requires 
that a resource be describable, not nameable.

[[Parties who wish to communicate must agree upon a shared set of 
identifiers and on their meanings]]

This is not true in sense (D). It is probably impossible to 
completely agree upon meanings of words in English, for example. 
Communication does not require complete agreement on meanings; 
indeed, such agreement could only be established by communication in 
the first place.

[[A URI must be assigned to a resource in order for agents to be able 
to refer to the resource.]]

Again, completely false for sense (D): see above comments on 
reference by description in OWL.

(Aside: the following sentence is extremely muddled in its logic 
however one reads it:
"It follows that a resource should be assigned a URI if a third party 
might reasonably want to link to it, make or refute assertions about 
it, retrieve or cache a representation of it, include all or part of 
it by reference into another representation, annotate it, or perform 
other operations on it."
a. Doing something to a reference to X is not performing an operation 
on X; doing something to a representation of X is not performing an 
operation on X.
b. Retrieving or caching a representation requires that the 
representation is accessible, not the thing represented.
c. Making assertions about something can be done without naming the 
thing referred to. For example, 'Your mother is a whore', 'my 
brother's favorite hamster died yesterday'. )

[[the resource identified by a URI does not depend on the context in 
which the URI appears.]]

In sense (D), this is at best a pious hope and cannot possibly be 
enforced. There is good reason to suppose that it is usually false, 
in any case.

[[URI ambiguity should not be confused with ambiguity in natural 
language. The English statement "'http://www.example.com/moby' 
identifies 'Moby Dick'" is ambiguous because one could understand the 
phrase "Moby Dick" to refer to distinct resources: a particular 
printing of this work, or the work itself in an abstract sense, or 
the fictional white whale, or a particular copy of the book on the 
shelves of a library (via the Web interface of the library's online 
catalog), or the record in the library's electronic catalog which 
contains the metadata about the work, or the Gutenberg project's 
online version.]]

But in sense (D), URI ambiguity is exactly like ambiguity in natural 
language, so this advice to not confuse them seems meaningless. In 
fact, in sense (D), all naming is *inherently* ambiguous, since it is 
always possible for one party to make ontological distinctions which 
were not being made by the other party. Examples from natural 
language are legion, but the same issue crops up in exchanging 
information between formal data repositories and ontologies 
("Semantic integration", "Data fusion") and has been long recognized 
as ubiquitous and inherent in the use of formal vocabularies. 
Attempts to establish exact unambiguous meanings are bound to fail, 
and to require that something essentially impossible be done before 
any communication can take place is extremely poor advice.

So this advice, and the associated "good practice", is in fact 
extremely poor practice if understood in sense (D).

[[URI persistence is a matter of policy and commitment on the part of 
authorities servicing URIs.... content negotiation also promotes 
consistency, as a site manager is not required to define new URIs 
when adding support for a new format specification.
... It is reasonable to limit access to a resource ...
.... The Web provides several mechanisms to control access to resources...]]

All of this language makes sense only in the (C) reading of the terminology.

--------------

3. What is a "representation" ?

This word has a usage which is current throughout linguistics, formal 
semantics, logic, philosophy, AI and cognitive science more 
generally, in which it is roughly synonymous with  'formal 
description'. The document seems to follow the REST architecture 
description in using it in a different sense, or perhaps a very 
restricted and special sense. It would be helpful if this sense could 
be made clear and stated unambiguously. (I am honestly unclear what 
the exact meaning of "representation" is in the REST architectural 
descriptions, after several attempts to get it clarified.)

Consider for example the main 'story' re-told with RDF/XML in place 
of HTML. Using the terminology from the first illustration (which is 
very good to look at, BTW), the URI identifies a resource called the 
Oaxaca Weather Report, and there is a 'representation' of that 
resource which, rendered in the way that the diagram shows the HTML, 
might instead look like this:

Metadata:
content-type: application/rdf+xml
Data:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
xmlns:weather="http://www.srh.noaa.gov/data/rdf/forecasts/#">
...
<place:oaxaca rdf:about="http://www.geog.org/coord/3234944LO-1151811">
<weather:timeAt> ... </weather:timeAt>
<weather:forecastType> "cloudyvariable" </weather:forecastType>
....
</place:oaxaca>
....

Following the usage of "representation" in the document, this is a 
representation of a weather *report*. However, following the common 
usage of "representation" mentioned above - the D sense - this is a 
representation (in RDF) of the actual weather in Oaxaca. That is, it 
describes a state of the actual atmosphere above a part of the 
earth's surface during a certain period: that is what it 
*represents*. In this sense, RDF (or RDFS or OWL) is understood as a 
formal syntax for expressing propositions which describe the world. 
This latter sense of "represent" is what the 
formal RDF semantics talks about, for example, and it is what the RDF 
primer means when it refers to "RDF/XML *describing* Eric Miller" (my 
emphasis) in the caption to its first example: 
http://www.w3.org/TR/2003/WD-rdf-primer-20031010/#example1

Call these senses respectively the (C) and (D) senses. To see the 
difference, consider what it means for the representation to be in 
error. In the first C sense, it means that the RDF/XML does not 
accurately mirror the state of the weather REPORT, and presumably 
reflects some transmission or protocol error on the network, and has 
nothing to do with the weather. In the second D sense it means that 
the weather report is inaccurate, which may well have nothing to do 
with the network at all. In the first case, you phone the people in 
charge of the network; in the second, you phone the weather 
forecasters.
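The difference between the two kinds of error can be sketched as two entirely different checks. In this hypothetical Python example, the markup and the weather values are invented. A digest comparison can detect a (C)-sense error in transmission. Nothing in the network machinery can detect a (D)-sense error; that requires comparing the report with the weather itself.

```python
# Sketch of the two kinds of error. All values are hypothetical.
import hashlib


def transmission_error(expected_digest: bytes, received: bytes) -> bool:
    # Sense (C): did the bytes arrive as sent? Says nothing about truth.
    return hashlib.sha256(received).digest() != expected_digest


def description_error(reported_sky: str, observed_sky: str) -> bool:
    # Sense (D): does the report match the actual weather in Oaxaca?
    return reported_sky != observed_sky


sent = b"<forecast>cloudy</forecast>"
digest = hashlib.sha256(sent).digest()
received = sent  # assume the network delivered the bytes intact

print(transmission_error(digest, received))       # prints False
print(description_error("cloudy", "heavy rain"))  # prints True
```

A perfectly delivered representation can still be a false one. The first check is the network people's business; the second is the forecasters'.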

Using the second D sense, it is precisely the ability to represent - 
that is, to describe - things that are not linked or connected to a 
network that makes the Web so useful: it is able to communicate 
information about anything at all. (This is probably why Nadia used the 
Web in the first place: she is likely to be more interested in the 
actual weather in Oaxaca than in the weather report considered as an 
object. ) On this view, however, all talk of the things at the sharp 
end of a 'represents' arrow being in any way connected with any 
computational process or communication protocol is meaningless: 
simply being able to *refer* to something does not give one any kind 
of hold on it: it does not presuppose any way to compute a pathway to 
it, to access it, to link to it or to perform any operations on it. 
(This is what the document seems to mean when it says in the scenario 
description that "the resource is ABOUT the weather in Oaxaca" (my 
emphasis), which is clearly distinct from the relationship between 
the 'representation' and the resource. )

My point here is not to argue that either sense of 'represent' is 
correct, but only to ask you to make your intended sense clear, 
particularly if it differs (as it seems to) from the sense in which 
this word is widely understood.

An example of a failed attempt to clarify the meaning comes in 
section 4. The first sentence reads:

[[A data format (including XHTML, CSS, PNG, XLink, RDF/XML, and SMIL 
animation) specifies the interpretation of representation data. ]]

with a link to

[[Representation data, electronic data about resource state...]]

which seems to imply that RDF/XML is a representation of resource 
state. But in the D sense of the semantic vocabulary, used 
specifically in the RDF documentation, resources need not have a 
state (and even if they do, the referent of a URI is not required to 
be a state rather than the resource itself); and applying this to the 
example, the "resource" would have to be not the weather report, but 
the atmosphere in the vicinity of Oaxaca.


--------------

4. Detailed textual comments (in order through the document):

[[The World Wide Web (WWW, or simply Web) is an information space...]]

Please define 'information space' and the intended meanings implicit 
in the use of "in" and "on" applied to it, or use a more pedestrian 
terminology. A reasonably extensive Google search on this phrase does 
not find any sense of it which makes this sentence coherent.

[[... typical behavior of Web agents - people or software (on behalf 
of a person, entity, or process) acting on this information space]]

Should probably read "software (acting on behalf...) "

Do you really want to count people as Web agents? This seems a 
needlessly general scope, and  much of the subsequent technical 
advice and recommended practice reads oddly if applied to people. 
More on this later.

(BTW, with current technology, Nadia may not need to know a URI when 
she sees one. Eudora recognizes them for me, for example, and opens 
my browser when required.)

[[Protocols define the syntax and semantics of messages exchanged by 
agents over a network. Web agents communicate information about the 
state of a resource through the exchange of representations.]]

This seems to imply that Web agents are software, not human.

[[This scenario illustrates the three architectural bases of the 
Web.... Nadia (by clicking on a hypertext link) tells her browser to 
request... The browser sends an HTTP GET request .... the browser 
retrieves and displays...]]

This wording seems to suggest that the Web architecture is centrally 
concerned with browsing. While this is likely true, is it what is 
intended to be conveyed here?

The figure has arrows pointing from the URI to the resource and from 
the representation to the resource.  But the scenario describes how 
the URI can be used to access a representation FROM the resource.  It 
seems odd that there is no pathway in the diagram from the URI to the 
representation.

[[... understanding the REST model and consider the role to which of 
its principles could guide their design...]]

This seems ungrammatical: perhaps "consider the extent to which its 
principles could guide their design" (??)

Some of the principles in 1.1.3 read like platitudes (especially 
"good practice"). Would it be possible to give these points a little 
more substance?

1.2
[[A number of general architecture principles apply to across all 
three bases of Web architecture.]]
apply to what?
Which 'three bases' are being referred to here?

1.2.2
[[This document does not distinguish in any formal way the terms 
"format" and "language." Context has determined which term is used.]]

This is unfortunate, as the word "language" is widely used to connote 
a much more extensive set of assumptions than the term "format". The 
context of use in the document is not always sufficient to determine 
what is meant.

[[Language subset: one language is a subset (or, "profile") of a 
second language if any document in the first language is also a valid 
document in the second language and has the same interpretation in 
the second language.]]
//One language is a subset of another if any document in the first 
language is also a valid document, with the same interpretation, in 
the other language.

[[The manner in which they are dealt with depends on application context.]]
//Application context determines the manner of dealing with them.

[[User agents that correct errors without the consent of the user are 
not acting on the user's behalf..... Silent recovery from error is 
harmful.]]

Really?? I beg to differ. Surely such actions are in fact a large 
part of what we have user agents for.

At least give some reason to justify this claim, which seems quite 
arbitrary and in fact inconsistent with GUI design principles.

[[Experience with the cost of building a user agent to handle the 
diverse forms of ill-formed HTML content convinced the authors of the 
XML specification to require that agents fail deterministically upon 
encountering ill-formed content. Because users are unlikely to 
tolerate such failures, this design choice has pressured all parties 
into respecting XML's constraints, to the benefit of all.]]

There are benefits, but there are also costs. Entire development 
paths are cut off from XML applications because of the rigidity of 
the XML specs. This entire issue is more complicated than this 
naively optimistic paragraph suggests. I would suggest omitting this 
controversial claim or at least indicating that it is possible to 
rationally disagree.
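The behavior the quoted paragraph describes can be seen with Python's standard library. `xml.etree.ElementTree` must reject ill-formed input. `html.parser.HTMLParser` does not check tag nesting and carries on silently. The sample markup is invented, and the sketch shows only the behavior in question, not its costs or benefits.

```python
# Deterministic XML failure versus silent HTML recovery (stdlib only).
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List

ill_formed = "<p><b>bold text</p>"  # the <b> element is never closed


def xml_accepts(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Fatal, deterministic failure: no partial tree is returned.
        return False


class TagCollector(HTMLParser):
    """Records start tags; HTMLParser does not raise on mismatched tags."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs) -> None:
        self.tags.append(tag)


collector = TagCollector()
collector.feed(ill_formed)
collector.close()

print(xml_accepts(ill_formed))  # prints False
print(collector.tags)           # prints ['p', 'b']
```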

2.

[[Parties who wish to communicate must agree upon a shared set of 
identifiers and on their meanings.]]

Stated this broadly this is false, and there is no need to state it 
this broadly in an architecture document. I suggest simply removing 
this sentence. (The first sentence of section 3.4 says it better: 
"Successful communication between two parties using a piece of 
information relies on shared understanding of the meaning of the 
information.")

[[The identification mechanism for the Web is the URI.]]

URIs are not mechanisms. Please rephrase this coherently.

[[A URI must be assigned to a resource in order for agents to be able 
to refer to the resource. It follows that a resource should be 
assigned a URI if a third party might reasonably want to link to it, 
make or refute assertions about it, retrieve or cache a 
representation of it, include all or part of it by reference into 
another representation, annotate it, or perform other operations on 
it.]]

As noted above, the first sentence here is false if 'refer to' is 
understood in its commonly used sense. The conditions listed in the 
second sentence are not all of the same kind: to link to it 
requires a unique URI, but to make or refute assertions about it, or 
to [manipulate] a representation of it, does not. None of these are 
in any way comparable to performing operations on it, which indeed 
requires a more direct form of access to the resource itself. The 
phrase "other operations" is misleading as the previous items are not 
performance of operations on the resource.

[[.. there are many benefits to assigning a URI to a resource... A 
resource owner SHOULD assign a URI to each resource that others will 
expect to refer to.]]

Nothing has been said until this point about ownership of resources, 
or about assigning URIs to resources. Questions arise immediately: 
What counts as ownership in this context (particularly if 'resource' 
has the broad (D) interpretation)? How are URIs assigned to 
resources? (Is there a method or technique for 'assignment' in this 
sense?) Can assignment only be done by the owner of the resource 
and/or the URI?

The text should discuss this issue, if only briefly.

[[For example, the parties responsible for weather.example.com should 
not use both "http://weather.example.com/Oaxaca" and 
"http://weather.example.com/oaxaca" to refer to the same resource; 
agents will not detect the equivalence relationship by following 
specifications.]]

I do not follow this. What is the problem here? We have just been 
told that a resource may have more than one URI.  What 'equivalence 
relationship' is being referred to? Is the point that people will 
confuse these but software will not? But if they refer to the same 
resource, why does this matter? And in general, how is this entire 
discussion squared with the opacity discussion later?
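For concreteness, here is the only reading I can find, as a minimal Python sketch. I am assuming the draft means that software comparing URIs by the generic syntax rules treats the scheme and host case-insensitively but the path case-sensitively, and so sees two distinct URIs here. The helper names are made up for illustration; nothing in the draft specifies them.

```python
from urllib.parse import urlsplit

def comparison_key(uri: str) -> tuple:
    """Hypothetical helper: a key for comparing URIs using only the
    generic-syntax case rules (scheme and host case-insensitive,
    everything else compared exactly as written)."""
    parts = urlsplit(uri)
    return (parts.scheme.lower(),  # urlsplit already lowercases the scheme
            parts.hostname,        # .hostname is returned lowercased
            parts.port,
            parts.path,            # the path is NOT case-folded
            parts.query,
            parts.fragment)

def equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: True if the two URIs have the same key."""
    return comparison_key(a) == comparison_key(b)

print(equivalent("http://weather.example.com/Oaxaca",
                 "http://weather.example.com/oaxaca"))  # False: paths differ in case
print(equivalent("HTTP://Weather.Example.com/Oaxaca",
                 "http://weather.example.com/Oaxaca"))  # True
```

If that is the point, it is a point about string comparison, not about reference.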

[[If a URI has been assigned to a resource, agents SHOULD refer to 
the resource using the same URI, character for character.]]

Why?? This seems to be at odds with the point just made. If owners 
can assign more than one URI to a resource, why must agents use only 
one of them? If they must, what is the point of creating more than 
one?

[[ the agent has a unique relationship with the URI, called URI ownership. ]]
Does 'agent' here include software agents?

2.3
[[the ambiguous use of terms imposes a cost in communication.]]

This is a controversial claim. It can be argued that it is only 
ambiguity which makes communication possible at all, in one sense of 
'ambiguity'.  Time and email space do not permit a full comment on 
this, but I would suggest omitting or qualifying it.

[[URI ambiguity refers to the use of the same URI to refer to more 
than one distinct resource.]]

Again, this is itself, ironically enough, ambiguous. If you mean 
'refer' in the (C) sense, I would agree, since communication 
protocols require that the ambiguity be resolved. If you mean 'refer' 
in sense (D), then it is not clear that ambiguity in this sense can 
possibly be avoided, certainly not for computational systems. (This 
follows ultimately from Goedel's second incompleteness theorem.)

[[The English statement "'http://www.example.com/moby' identifies 
'Moby Dick'" is ambiguous because one could understand the phrase 
"Moby Dick" to refer to distinct resources: a particular printing of 
this work, or the work itself in an abstract sense, or the fictional 
white whale, or a particular copy of the book on the shelves of a 
library (via the Web interface of the library's online catalog), or 
the record in the library's electronic catalog which contains the 
metadata about the work, or the Gutenberg project's online version.]]

What exactly is being said here? If these various things are indeed 
all resources, then is the claim that the ambiguity arises from the 
use of the English quoted phrase? That indeed makes sense, but then 
how exactly does the owner of that URI specify which of them is the 
intended resource which the URI uniquely identifies? There seems to 
be no way around the ambiguity inherent in general reference.

This comment seems to raise more issues than it resolves and might be 
better omitted.

[[URI ambiguity arises a URI is used to identify two different Web resources.]]
...when a URI is used...

I will try to explain in another document why referential ambiguity 
is not always a bad thing. Basically, you can't outlaw it, 
so why bother trying: but in addition, it in fact can be useful. Most 
English words are systematically ambiguous, because it's easier to get 
reliable communication over a noisy low-bandwidth channel by 
overloading the words in ways that can be easily resolved from 
context than it is to try to invent distinct signs for all the 
possible nuances of meaning, particularly when those nuances cannot 
be computed ahead of time, in general. Most of the nuances are 
irrelevant most of the time in any case. For example, it is almost 
certainly harmless to allow a URI to be ambiguous between a person 
and a homepage, as long as one can easily distinguish homepages from 
people and map between them when required (i.e. you can easily coerce 
in either direction). Allowing a URI to be ambiguous between a star 
and a planet might be rather nastier, since the astronomy context 
will often not allow you to resolve a difference which might be 
important. Many issues arise: but to just give a blanket 'ambiguity 
is bad' rule is way too simplistic.

BTW, I wholeheartedly concur with 
http://lists.w3.org/Archives/Public/www-tag/2002Sep/0132

2.5
[[Agents making use of URIs MUST NOT attempt to infer properties of 
the referenced resource except as licensed by relevant 
specifications]]

Does this include human agents? I certainly do this a lot, myself, 
see nothing wrong with it, and don't propose to stop doing it.

But in any case, this seems to fly in the face of current practice, 
if I understand it correctly. When I use Google, my browser comes 
back with a display of a (representation of) something with a URI 
that looks like this:
http://www.google.com/search?as_q=pat+hayes&num=10&hl=en&ie=ISO-8859-1&btnG=Google+Search&as_epq=&as_oq=&as_eq=&lr=&as_ft=i&as_filetype=&as_qdr=all&as_occt=any&as_dt=i&as_sitesearch=&safe=off
which is absolutely chock full of information from which software can 
be said to infer properties of the referenced resource. This kind of 
thing is done all the time. It sounds like you are trying to say that 
Google MUST NOT do what it does. Frankly, this would be a very bad 
political move: Google is of far more value to the Web than the 
entire W3C.
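To make the point concrete, here is a short Python sketch of how little it takes for software to pull properties out of that URI using only the standard library. The parameter names are Google's; the meanings I attach to them below (that num is a results count, hl an interface language, and so on) are my own guesses from the names, which is rather the point: inferring them is exactly the kind of thing the draft seems to forbid.

```python
from urllib.parse import urlsplit, parse_qs

uri = ("http://www.google.com/search?as_q=pat+hayes&num=10&hl=en"
       "&ie=ISO-8859-1&btnG=Google+Search&as_epq=&as_oq=&as_eq=&lr="
       "&as_ft=i&as_filetype=&as_qdr=all&as_occt=any&as_dt=i"
       "&as_sitesearch=&safe=off")

# parse_qs decodes '+' as a space and, by default, drops blank values
params = parse_qs(urlsplit(uri).query)

# Guessed interpretations, read off the parameter names alone
inferred = {
    "query_terms": params["as_q"][0],            # 'pat hayes'
    "results_per_page": int(params["num"][0]),   # 10
    "interface_language": params["hl"][0],       # 'en'
    "safe_search": params["safe"][0],            # 'off'
}
print(inferred)
```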

3.1
[[Agents may use a URI to access the referenced resource]]

Is this 'may' as in 'sometimes possible' or as in 'generally have 
permission'? Is it always possible to use a URI to access a resource? 
How can this be reconciled with the (D) definition of resource? (How 
does one access an imaginary white whale?)

In general, this entire section seems to make sense only with the 
narrow (C) reading of 'resource' to mean 'thing physically attached 
to a network'. The terminology needs to be kept straight in order for 
the text to be comprehensible.

3.2
[[The Web's protocols ... are based on the exchange of messages.]]

What kinds of entity do this exchanging of messages? (Resources? Agents? Both?)

[[Agents use representations to modify as well as retrieve resource state]]

I find this puzzling. How does an agent use a REPRESENTATION to 
modify something? Representations aren't the kind of thing that DO 
anything. (??)

3.3.1
[[Interpretation of the fragment identifier during a retrieval action 
is performed solely by the agent]]

By which agent?

[[A resource owner who creates a URI with a fragment identifier and 
who uses content negotiation to serve multiple representations of the 
identified resource SHOULD NOT serve representations with 
inconsistent fragment identifier semantics]]

What sense of 'semantics' is meant here? What counts as 
'inconsistent'? In the example given, does this mean that the png and 
jpeg should be the "same picture"? What exactly does this mean? (eg 
suppose one has a different color balance, or is a slightly different 
size: is that an inconsistency?)
(Part of the issue here is that words like 'inconsistent' have tight 
technical meanings, and it is not clear if you mean to

3.4
[[the design choice for the Web is, in general, that the owner of a 
resource assigns the authoritative interpretation of representations 
of the resource.]]

HOW?? Since this point is so central, surely some guidance should be 
given as to how to perform this miracle of referential precision. The 
example given explains how the authority decides what representations 
to send to Nadia. It says nothing about how to make sure that these 
representations uniquely refer, or how they are given an 
interpretation.

There is a deeper issue. Suppose the owner assigns an authoritative 
interpretation: how is this INTERPRETATION communicated to Nadia? 
Nothing has been said about how to communicate interpretations of 
representations over the Web.

None of this section makes sense (on either the C or D readings).

[[User agents MUST NOT silently ignore authoritative server 
metadata..... if Nadia's browser detects a problem, Nadia's browser 
must not silently ignore the problem and render the JPEG image.]]

Why not?? Again, this seems unmotivated, arbitrary and inconsistent 
with good application design in many cases. And, frankly, it doesn't 
seem like any of your business: it's a user-application decision, not 
a web-architecture decision.

3.5.1
[[It is a breakdown of the Web architecture if agents cannot use URIs 
to reconstruct a "paper trail" of transactions]]

Does this apply even to safe interactions?

3.6.2
[[There are strong social expectations that once a URI identifies a 
particular resource, it should continue indefinitely to refer to that 
resource; this is called URI persistence. ]]

OK, but
(a) this is highly controversial. In fact I think there are many 
cases where there are NOT such strong social expectations, in spite 
of the W3C's obvious desire that there should be;

(b) there is an ambiguity here since a "resource" may have a state 
and emit changing representations [REST]. How does one distinguish a 
change in resources from a changing resource? Are there guidelines to 
make the distinction clear?

For example, I often write documents which are publicly viewable in 
draft, and are constantly being changed, at the same URL. By strict 
W3C guidelines, I gather this is bad practice. But if I consider 'the 
paper' to be a dynamic resource, and my edits to it to be updates 
or changes of its state, why would this not be acceptable?

If the reply is that I can choose either way to describe this 
activity, but that it is kosher under one description but bad 
practice under a different description, then the 'strong social 
expectations' seem to amount to little more than a choice of words. 
Is this really all that is being said here?

4.
[[In principle, all data can be represented using textual formats.]]

Well, yes, but the same could be said about binary data formats. So?

4.2.4
[[Many modern data format specifications include mechanisms for 
composition.....Note however, that for general XML there is no 
semantic model that defines the interactions within XML documents... 
]]

This reads like a critique of the design of XML. Is that reading intended?

------

Sorry this is so long, and so late.

Pat Hayes

-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes

Received on Wednesday, 17 March 2004 17:38:59 UTC