- From: Pat Hayes <phayes@ihmc.us>
- Date: Wed, 17 Mar 2004 16:38:52 -0600
- To: public-webarch-comments@w3.org
- Cc: w3c-rdfcore-wg@w3.org
- Message-Id: <p06001f10bc779f2e9ec4@[10.0.100.76]>
The following are some personal comments on
http://www.w3.org/TR/2003/WD-webarch-20031209/
Sorry they're late.
------
1. General comment about vocabulary
The vocabulary used throughout this document can be understood in two
rather different ways, which conflict with one another. Exactly what
is being said is therefore not always clear, and in some cases
readers may take it to have meanings other than those intended. It
would be helpful if the terminology could be, if not defined, at least
clarified as to its intended meanings.
I realize that this kind of request conflicts with the requirements
of ease of reading and general literary style, but it is nevertheless
important; it could be done with a glossary, for example. However, in
order to be useful, a glossary should not merely repeat sentences
from the text using the same terminology. Thus
"Resource: An item of interest in the information space known as the
World Wide Web" is completely uninformative since the definition
repeats the words used in the text and hence does not resolve their
ambiguity or provide any other way to grasp their intended meaning.
The specific ambiguity revolves around a group of terms (semantic,
represent, identify, refer, about, meaning, resource) which can be
understood in two rather different ways, which I will refer to as (C)
and (D).
(C) as in a programming language, where an identifier serves to
uniquely locate (relative to the current computational state of a
virtual machine or a network) some piece of data. Approximate
synonyms for 'identifier' in this sense include 'link', 'address' and
'pointer'; and ideas like hash coding and database key are also
connected with this sense of 'identify'.
The corresponding usage of 'representation' is where one speaks of a
representation of data, or of the state of a computational entity.
The corresponding usage of 'resource' is something that is, or can in
principle be, identified in this sense: a computational entity (or
the state of it) which is accessible via some network link or
transfer protocol.
The corresponding usage of 'semantic' language is closely analogous
to the way this terminology is typically used in describing the
semantics of computational systems.
(D) as in a descriptive language, such as English or formal logical
languages, where 'identifier' is synonymous with 'name', and to
identify means simply to refer to, name or denote.
The corresponding sense of 'representation' is 'description' (or
possibly 'formal description'), in the sense used in KR work, AI and
formal linguistics.
The corresponding sense of 'resource' would be simply 'entity' or
'thing', ie the word used in this way has no special Web or
Internet-related meaning and is simply a synonym for 'entity' in the
philosophical sense: anything that can be referred to, ie anything.
The corresponding usage of the 'semantic' language is more analogous
to the way that this terminology is used in linguistics, philosophy,
logical semantics and AI/KR work.
Although these two readings are obviously closely related, and in
some circumstances can be conflated (for example, when discussing the
formal semantics of a programming language), they are not the same.
It is important to keep them distinct, especially when discussing
referring formalisms (such as RDF and OWL) which are based on (D)
ideas but deployed on a computationally defined network normally
described using (C) terminology.
In particular, in sense (C), but not in sense (D), there is a
presumption of a computable or effective process which can be applied
to the identifier to provide access to the entity identified; an
assumption (which follows from the previous one) that the
identification must be unique; and an understanding that this process
might depend on the state of some computational system. None of these
assumptions is generally plausible for sense (D).
On the other hand, formal analyses of sense (D) generally understand
reference to be relative to an interpretation, and discuss meaning in
terms of constraints on, and relationships between, interpretations.
This style of analysis, and the terminology associated with it, has
been a standard in formal semantics - logical semantics, formal
linguistics and formal philosophy - for over half a century. The
notion of interpretation involved has no particular connection with
the sense of computational state underlying sense (C). Even when
uniqueness of reference is required within an interpretation,
guaranteeing uniqueness across all possible interpretations is
usually meaningless or provably impossible.
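A toy Python model may make the contrast concrete. It is only an illustrative sketch, not anything defined in the document under review; the URI, the "server" states and the two interpretations are all hypothetical. In sense (C) the identifier is resolved by an effective lookup whose result depends on the current state; in sense (D) a name denotes something only relative to an interpretation, and two interpretations may assign it different referents without either being "the" right one.

```python
from collections.abc import Mapping

# Sense (C): an identifier is a key that an effective procedure resolves
# against the current state of a system (a dict stands in for a server).
def resolve(state: Mapping[str, str], identifier: str) -> str:
    # Raises KeyError if the current state holds nothing under this key.
    return state[identifier]

# Sense (D): a name denotes something only relative to an interpretation.
# The dict here is just a mathematical mapping from names to (labels for)
# referents; no access to the referent itself is implied.
Interpretation = Mapping[str, str]

def denotes(interp: Interpretation, name: str) -> str:
    return interp[name]

URI = "http://weather.example.com/oaxaca"  # hypothetical URI

# (C): one identifier, different data as the server's state changes.
server_monday = {URI: "<p>Cloudy</p>"}
server_tuesday = {URI: "<p>Sunny</p>"}

# (D): two interpretations assigning the same name different referents.
interp_a = {URI: "a weather report (a document)"}
interp_b = {URI: "the atmosphere over Oaxaca"}

if __name__ == "__main__":
    print(resolve(server_monday, URI), resolve(server_tuesday, URI))
    print(denotes(interp_a, URI), "|", denotes(interp_b, URI))
```

Nothing in the (D) half requires that the two interpretations agree, which is the sense in which uniqueness "across all possible interpretations" cannot be guaranteed.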
The document often seems to slip between these two senses, in ways
that suggest inappropriate conclusions. Several of the principles
stated seem appropriate for sense (C) but are inappropriate, and in
some cases positively harmful, if understood in sense (D). (Details
below.) I would therefore ask that the authors clarify their intended
meaning before publication.
(Meta-comment: In making similar comments on similar documents in
the past, I have found that any attempt to ask for clarification on
this point is met with resistance on the grounds that the intended
meaning is obvious, and I have been advised to consult an English
dictionary. Leaving aside the potentially insulting nature of such a
response, the key point is that the terminology used here is being
used in technical senses, rather than the informal English senses;
and moreover, much of this terminology already has technical senses
which are well-established in disciplines which some readers of this
document work within, and which are relevant to emerging Web
technology. If words are being used here in ways which conflict with
these established technical usages, therefore, it is important to
make at least these aspects of the intent clear. For example, the
semi-technical use of the term "resource" is unknown in the English
language generally and even, as far as I am aware, in the general
technical computer-science literature. It seems to be a usage special
to the internet community.)
---------
2. Hunting down what is meant by "resource".
It is extremely difficult (for this reader) to find out what this
word is supposed to mean, in spite of its being so central. The
document as a whole does not seem to have a single view on the
intended meaning, in fact. Much of the document makes sense only
under the more limited (C) reading, but in places what it says is
only consistent with the (D) reading. As a result, the document as a
whole does not seem to have a coherent single reading. The rest of
this comment is devoted to documenting this particular issue.
(It would be possible to keep these interpretations straight by a
systematic use of terminology. For example, one might use "resource"
in the (D) sense (unconventionally, but consistently) together with
"refer", and "web resource" (or "network resource"?) in the (C) sense
together with "identify", with the understanding that web resources
are a special case of resources, so that any maxims which apply to
resources in the broader (D) sense also apply to web resources, but
not necessarily the reverse.)
The latter (D) interpretation seems to be insisted upon by the cited
document http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html
which reads:
"Resource
Anything that can be named or described can be a resource.
Familiar examples include an electronic document, an image, a service
(e.g., "today's weather report for Los Angeles"), and a collection of
other resources. A resource is not necessarily accessible via the
Internet; e.g., human beings, corporations, and bound books in a
library can also be resources. Likewise, abstract concepts can be
resources, such as the operators and operands of a mathematical
equation or the types of a relationship (e.g., "parent" or
"employee"). "
Which could be paraphrased as "A resource can be anything, and
everything is a resource". I note particularly the phrasing "named or
described". (I also note in passing that the first three "familiar"
examples are hardly typical of entities in general, and that the
examples do not include such things as galaxies, atoms, grains of
sand; kinds of material such as steel or wood; holes, times,
locations, intervals; natural processes such as flows and movements;
and many other categories of entity which have been the subject of
formal ontological descriptions. Are these omissions deliberate?)
The only example given in the document is disturbingly vague at
precisely this critical point: the resource is the "Oaxaca Weather
Report". But what KIND of thing is that, and how exactly is it
related to the URI and the "representation" of it? (see later for
more on that word)
Several different answers are consistent with what you say about the example.
(a) Do you mean something like an abstraction of a document, in the
sense that "Moby Dick" refers to a resource called a novel, which is
an abstraction of all the printed, spoken etc. tokens of Moby Dick
ever produced (which could be described as "representations" of it,
although "token" is the existing technical term in wide use here.)
(b) Do you mean that the resource here is the actual weather - the
state of the atmosphere - in Oaxaca on the day in question? So that
the HTML 'represents' this in the sense of talking about it -
referring to it, describing it - which is the usual way that
"represent" is used in normal language, formal semantics and
linguistics.
(c) Do you mean that the resource here is the thing on the server
that processes the request and which emits the text/html
representation, which is therefore a representation of the state of a
computational entity which is physically attached to the network?
That is, the resource is a computational entity of some kind, or its
state? This would be consistent with the first C sense of 'identify'
and with the description in the first sentence of the abstract
referring to 'resources interconnected by links'.
(d) Or do you intend to be systematically ambiguous between these
alternatives, so as to try to apply to them all? I hope not, because
they are not mutually compatible; and if not, it would be extremely
helpful if you could clarify your intended meaning, perhaps by
fleshing out the description of the example with a little more
conceptual detail.
Trying to home in on your intended meaning by searching the document
for uses of "resource" gives the following:
[[The World Wide Web is a network-spanning information space of
resources interconnected by links. ]]
I take it then that a resource is something that can be connected by
a link to another resource. I presume also that "link" here means
more than simply a reference to something, but connotes an actual
connection of some kind (eg along which information can be
transmitted.) This seems like sense (C), and is not intelligible when
applied in any broader sense.
[[The World Wide Web (WWW, or simply Web) is an information space in
which the items of interest, referred to as resources, are identified
by global identifiers called Uniform Resource Identifiers (URIs).]]
[[Each resource is identified by a URI.]] **
So resources are items of interest (of interest to who?) which are
identified by URIs. Unfortunately, this runs into the ambiguity
already noted in the meaning of "identify" so does not help decide
between senses (C) and (D).
(Already there is a tension in meaning: are the only items of
interest those that can be connected with links? Surely not.)
[[Web agents communicate information about the state of a resource....
cf also many later references, such as .... Representation data,
electronic data about resource state....
....]]
So resources have states, and information about those states is
communicated. Again, this seems like sense (C), since many entities that
can be named do not have states (eg numbers, arithmetic operators).
[[A URI must be assigned to a resource in order for agents to be able
to refer to the resource]]
This seems to rule out any notion of reference by description. This
makes sense for interpretation (C), but for interpretation (D) is a
very strong prohibition, and if followed in most communication
scenarios would render effective communication impossible. OWL
expressions for example may describe classes by restrictions on
properties; such classes are the same kind of entities that OWL uses
URIs to refer to, but are not themselves identified or referred to by
any URI; nevertheless they are referred to (in sense (D), i.e.
denoted) by the OWL expressions, and such references are in fact the
most typical form of reference to classes used in OWL reasoning. If
'refer' in this quote is understood in sense (D), therefore, the
claim seems to be wrong even on the WWW.
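A toy Python model may make the point concrete. It is not real OWL: the triples and the `ex:` names are hypothetical, and `has_value` only loosely mimics an `owl:hasValue` restriction (which in RDF is typically written as a blank node, that is, with no URI of its own). The class is denoted by a description, and its members can be found, although no URI is ever assigned to the class itself.

```python
# Toy set of (subject, property, object) triples; all names are hypothetical.
TRIPLES = {
    ("ex:rex", "ex:hasOwner", "ex:pat"),
    ("ex:fido", "ex:hasOwner", "ex:pat"),
    ("ex:tom", "ex:hasOwner", "ex:ann"),
}

def has_value(prop, value):
    """Return a membership test for 'things whose prop has this value'.

    Loosely analogous to an OWL hasValue restriction: the class is given
    by a description, and no URI is ever assigned to it."""
    return lambda x: (x, prop, value) in TRIPLES

individuals = {s for (s, _, _) in TRIPLES}
members = has_value("ex:hasOwner", "ex:pat")
owned_by_pat = {x for x in individuals if members(x)}

if __name__ == "__main__":
    print(sorted(owned_by_pat))  # ['ex:fido', 'ex:rex']
```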
I take it therefore that this is intended in sense (C), where 'refer'
means 'link to', rather than interpretation (D).
[[Resources exist before URIs; a resource may be identified by zero URIs]]
This seems to directly contradict the third quotation above (**).
Which is correct?
[[A resource owner SHOULD assign a URI to each resource that others
will expect to refer to.]] (Principle)
I take it then that resources have owners who are capable of
assigning URIs so as to refer to the resources. (This is the first
place in the document where this idea of ownership is mentioned, and
no explanation is given. Later, 'owners' of URIs, rather than of
resources, are discussed: what is the relationship between these?).
This also seems completely inappropriate in sense (D). Obviously,
being able to refer to something does not connote ownership of it.
Most entities that are referred to by names or descriptions in
language have no owners.
I note that the reference to Engelbart 90 yields the following:
"in principle, every object that someone might validly want/need to
cite should have an unambiguous address (capable of being portrayed
in a manner as to be human readable and interpretable). (E.g., not
acceptable to be unable to link to an object within a "frame" or
"card.")"
which seems to clearly indicate (by its use of "address" and "link",
and the presumption that objects are parts of computationally defined
text rather than arbitrary referents) that Engelbart was talking in
sense (C); which is in any case obvious from reading Engelbart.
[[The English statement "'http://www.example.com/moby' identifies
'Moby Dick'" is ambiguous because one could understand the phrase
"Moby Dick" to refer to distinct resources: a particular printing of
this work, or the work itself in an abstract sense, or the fictional
white whale, or a particular copy of the book on the shelves of a
library (via the Web interface of the library's online catalog), or
the record in the library's electronic catalog which contains the
metadata about the work, or the Gutenberg project's online version.]]
Here some actual examples are given of resources, which seem to
clearly rule out any attempt to understand the term in the
computational sense (C). Obviously a copy of a book on a shelf, a
fictional white whale, or a novel in an abstract sense, are not the
kind of entities that can be connected by a link to anything else, or
that have computational states. So the only way to understand this
seems to be in sense (D).
I note that apart from this sentence, and the rather strange
definition given in
http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html , it would be
possible to read the entire document in sense (C), so that "resource"
meant "entity on a network" and "identify" meant "link to". Most of
the document would make perfect sense under this narrower
interpretation.
[[URI ambiguity arises a URI is used to identify two different Web resources.]]
"Web resource" is a new idea but is not defined or mentioned
elsewhere. This seems to suggest a distinction between Web resources
and other kinds of resource (??). It might be helpful if this
distinction could be clarified and made more explicit.
In any case, section 2.3.1 seems to be important, and depends
crucially on the meaning of this distinction, which should therefore
be clarified.
[[....unsafe interactions may cause a change to the state of a
resource and the user may be held responsible for the consequences of
these interactions. ]]
This seems to suggest that users can cause changes to the state of a
resource. Again, this makes sense on view (C) but reads very
strangely under the (D) interpretation, since one would not normally
expect that a reference to an entity would enable any interaction to
take place with the entity at all. (This sentence refers to Julius
Caesar, but gives me no power to *do* anything to him.)
[[Emerging Semantic Web technologies, including the "Web Ontology
Language (OWL)" [OWL10], define RDF [RDF10] properties such as sameAs
to assert that two URIs identify the same resource or
functionalProperty to imply it.]]
This is an explicit use of a sense which is formally defined to be
sense (D), so apparently requires that we interpret the document in
sense (D).
------------
Sense D is arguably the most general and all-inclusive sense.
Unfortunately, if we do interpret the semantic language in sense (D),
a great deal of the document makes no sense and much of it is wrong.
To elaborate, starting again from the beginning, and interpreting in
sense D:
[[Each resource is identified by a URI.]]
This says that every entity has a name. This is completely false, in
fact *provably* false; and so to propose that it should be true is
silly. Note that even
http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html only requires
that a resource be describable, not nameable.
[[Parties who wish to communicate must agree upon a shared set of
identifiers and on their meanings]]
This is not true in sense (D). It is probably impossible to
completely agree upon meanings of words in English, for example.
Communication does not require complete agreements upon meanings:
such agreement could only be established by communications in any
case.
[[A URI must be assigned to a resource in order for agents to be able
to refer to the resource.]]
Again, completely false for sense (D): see above comments on
reference by description in OWL.
(Aside: the following sentence is extremely muddled in its logic
however one reads it:
"It follows that a resource should be assigned a URI if a third party
might reasonably want to link to it, make or refute assertions about
it, retrieve or cache a representation of it, include all or part of
it by reference into another representation, annotate it, or perform
other operations on it."
a. Doing something to a reference to X is not performing an operation
on X; doing something to a representation of X is not performing an
operation on X.
b. Retrieving or caching a representation requires that the
representation is accessible, not the thing represented.
c. Making assertions about something can be done without naming the
thing referred to. For example, 'Your mother is a whore', 'my
brother's favorite hamster died yesterday'. )
[[the resource identified by a URI does not depend on the context in
which the URI appears.]]
In sense (D), this is at best a pious hope and cannot possibly be
enforced. There is good reason to suppose that it is usually false,
in any case.
[[URI ambiguity should not be confused with ambiguity in natural
language. The English statement "'http://www.example.com/moby'
identifies 'Moby Dick'" is ambiguous because one could understand the
phrase "Moby Dick" to refer to distinct resources: a particular
printing of this work, or the work itself in an abstract sense, or
the fictional white whale, or a particular copy of the book on the
shelves of a library (via the Web interface of the library's online
catalog), or the record in the library's electronic catalog which
contains the metadata about the work, or the Gutenberg project's
online version.]]
But in sense (D), URI ambiguity is exactly like ambiguity in natural
language, so this advice to not confuse them seems meaningless. In
fact, in sense (D), all naming is *inherently* ambiguous, since it is
always possible for one party to make ontological distinctions which
were not being made by the other party. Examples from natural
language are legion, but the same issue crops up in exchanging
information between formal data repositories and ontologies
("Semantic integration", "Data fusion") and has been long recognized
as ubiquitous and inherent in the use of formal vocabularies.
Attempts to establish exact unambiguous meanings are bound to fail,
and to require that something essentially impossible be done before
any communication can take place is extremely poor advice.
So this advice, and the associated "good practice", is in fact
extremely poor practice if understood in sense (D).
[[URI persistence is a matter of policy and commitment on the part of
authorities servicing URIs.... content negotiation also promotes
consistency, as a site manager is not required to define new URIs
when adding support for a new format specification.
... It is reasonable to limit access to a resource ...
.... The Web provides several mechanisms to control access to resources...]]
All of this language makes sense only in the (C) reading of the terminology.
--------------
3. What is a "representation" ?
This word has a usage which is current throughout linguistics, formal
semantics, logic, philosophy, AI and cognitive science more
generally, in which it is roughly synonymous with 'formal
description'. The document seems to follow the REST architecture
description in using it in a different sense, or perhaps a very
restricted and special sense. It would be helpful if this sense could
be made clear and stated unambiguously. (I am honestly unclear what
the exact meaning of "representation" is in the REST architectural
descriptions, after several attempts to get it clarified.)
Consider for example the main 'story' re-told with RDF/XML in place
of HTML. Using the terminology from the first illustration (which is
very good to look at, BTW), the URI identifies a resource called the
Oaxaca Weather Report, and there is a 'representation' of that
resource which, rendered in the way that the diagram shows the HTML,
might instead look like this:
Metadata:
Content-Type: application/rdf+xml
Data:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:weather="http://www.srh.noaa.gov/data/rdf/forecasts/#">
...
<place:oaxaca rdf:about="http://www.geog.org/coord/3234944LO-1151811">
<weather:timeAt> ... </weather:timeAt>
<weather:forecastType> "cloudyvariable" </weather:forecastType>
....
</place:oaxaca>
....
Following the usage of "representation" in the document, this is a
representation of a weather *report*. However, following the common
usage of "representation" mentioned above - the D sense - this is a
representation (in RDF) of the actual weather in Oaxaca. That is, it
describes a state of the actual atmosphere above a part of the
earth's surface during a certain period: that is what it
*represents*. In this sense, RDF (or RDFS or OWL) is understood as a
formal syntax for expressing propositions which describe the world.
This latter sense of "represent" is what the
formal RDF semantics talks about, for example, and it is what the RDF
primer means when it refers to "RDF/XML *describing* Eric Miller" (my
emphasis) in the caption to its first example:
http://www.w3.org/TR/2003/WD-rdf-primer-20031010/#example1
Call these senses respectively the (C) and (D) senses. To see the
difference, consider what it means for the representation to be in
error. In the first C sense, it means that the RDF/XML does not
accurately mirror the state of the weather REPORT, and presumably
reflects some transmission or protocol error on the network, and has
nothing to do with the weather. In the second D sense it means that
the weather report is inaccurate, which may well have nothing to do
with the network at all. In the first case, you phone the people in
charge of the network; in the second, you phone the weather
forecasters.
Using the second D sense, it is precisely the ability to represent -
that is, to describe - things that are not linked or connected to a
network that makes the Web so useful: it is able to communicate
information about anything at all. (It is probably why Nadia used the
Web in the first place: she is likely to be more interested in the
actual weather in Oaxaca than in the weather report considered as an
object. ) On this view, however, all talk of the things at the sharp
end of a 'represents' arrow being in any way connected with any
computational process or communication protocol is meaningless:
simply being able to *refer* to something does not give one any kind
of hold on it: it does not presuppose any way to compute a pathway to
it, to access it, to link to it or to perform any operations on it.
(This is what the document seems to mean when it says in the scenario
description that "the resource is ABOUT the weather in Oaxaca" (my
emphasis), which is clearly distinct from the relationship between
the 'representation' and the resource. )
My point here is not to argue that either sense of 'represent' is
correct, but only to ask you to make your intended sense clear,
particularly if it differs (as it seems to) from the sense in which
this word is widely understood.
An example of a failed attempt to clarify the meaning comes in
section 4. The first sentence reads:
[[A data format (including XHTML, CSS, PNG, XLink, RDF/XML, and SMIL
animation) specifies the interpretation of representation data. ]]
with a link to
[[Representation data, electronic data about resource state...]]
which seems to imply that RDF/XML is a representation of resource
state. But in the D sense of the semantic vocabulary, used
specifically in the RDF documentation, resources need not have a
state (and even if they do, the referent of a URI is not required to
be a state rather than the resource itself); and applying this to the
example, the "resource" would have to be not the weather report, but
the atmosphere in the vicinity of Oaxaca.
--------------
4. Detailed textual comments (in order through the document):
[[The World Wide Web (WWW, or simply Web) is an information space...]]
Please define 'information space' and the intended meanings implicit
in the use of "in" and "on" applied to it, or use a more pedestrian
terminology. A reasonably extensive Google search on this phrase does
not find any sense of it which makes this sentence coherent.
[[... typical behavior of Web agents - people or software (on behalf
of a person, entity, or process) acting on this information space]]
Should probably read "software (acting on behalf...) "
Do you really want to count people as Web agents? This seems a
needlessly general scope, and much of the subsequent technical
advice and recommended practice reads oddly if applied to people.
More on this later.
(BTW, with current technology, Nadia may not need to know a URI when
she sees one. Eudora recognizes them for me, for example, and opens
my browser when required.)
[[Protocols define the syntax and semantics of messages exchanged by
agents over a network. Web agents communicate information about the
state of a resource through the exchange of representations.]]
This seems to imply that Web agents are software, not human.
[[This scenario illustrates the three architectural bases of the
Web.... Nadia (by clicking on a hypertext link) tells her browser to
request... The browser sends an HTTP GET request .... the browser
retrieves and displays...]]
This wording seems to suggest that the Web architecture is centrally
concerned with browsing. While likely true, is this what is intended
to be conveyed here?
The figure has arrows pointing from the URI to the resource and from
the representation to the resource. But the scenario describes how
the URI can be used to access a representation FROM the resource. It
seems odd that there is no pathway in the diagram from the URI to the
representation.
[[... understanding the REST model and consider the role to which of
its principles could guide their design...]]
seems ungrammatical: role/extent ...(??)
Some of the principles in 1.1.3 read like platitudes (especially
"good practice"). Would it be possible to give these points a little
more substance?
1.2
[[A number of general architecture principles apply to across all
three bases of Web architecture.]]
apply to what?
Which 'three bases' are being referred to here?
1.2.2
[[This document does not distinguish in any formal way the terms
"format" and "language." Context has determined which term is used.]]
This is unfortunate, as the word "language" is widely used to connote
a much more extensive set of assumptions than the term "format". The
context of use in the document is not always sufficient to determine
what is meant.
[[Language subset: one language is a subset (or, "profile") of a
second language if any document in the first language is also a valid
document in the second language and has the same interpretation in
the second language.]]
//One language is a subset of another if any document in the first
language is also a valid document, with the same interpretation, in
the other language.
[[The manner in which they are dealt with depends on application context.]]
//Application context determines the manner of dealing with them.
[[User agents that correct errors without the consent of the user are
not acting on the user's behalf..... Silent recovery from error is
harmful.]]
Really?? I beg to differ. Surely such actions are in fact a large
part of what we have user agents for.
At least give some reason to justify this claim, which seems quite
arbitrary and in fact inconsistent with GUI design principles.
[[Experience with the cost of building a user agent to handle the
diverse forms of ill-formed HTML content convinced the authors of the
XML specification to require that agents fail deterministically upon
encountering ill-formed content. Because users are unlikely to
tolerate such failures, this design choice has pressured all parties
into respecting XML's constraints, to the benefit of all.]]
There are benefits, but there are also costs. Entire development
paths are cut off from XML applications because of the rigidity of
the XML specs. This entire issue is more complicated than this
naively optimistic paragraph suggests. I would suggest omitting this
controversial claim or at least indicating that it is possible to
rationally disagree.
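For concreteness, Python's standard-library XML parser shows the deterministic failure the quoted paragraph describes: ill-formed input is rejected outright rather than silently repaired. The example illustrates only the behaviour, not the costs and benefits in dispute; the helper `parse_or_fail` is hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_or_fail(text: str):
    """Return the root element, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # The parser stops at the first well-formedness error;
        # it does not guess at what the author meant.
        print("rejected:", err)
        return None

well_formed = parse_or_fail("<p>Cloudy <b>today</b></p>")
ill_formed = parse_or_fail("<p>Cloudy <b>today</p>")  # mismatched tag
```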
2.
[[Parties who wish to communicate must agree upon a shared set of
identifiers and on their meanings.]]
Stated this broadly this is false, and there is no need to state it
this broadly in an architecture document. I suggest simply removing
this sentence. (The first sentence of section 3.4 says it better:
"Successful communication between two parties using a piece of
information relies on shared understanding of the meaning of the
information.")
[[The identification mechanism for the Web is the URI.]]
URIs are not mechanisms. Please rephrase this coherently.
[[A URI must be assigned to a resource in order for agents to be able
to refer to the resource. It follows that a resource should be
assigned a URI if a third party might reasonably want to link to it,
make or refute assertions about it, retrieve or cache a
representation of it, include all or part of it by reference into
another representation, annotate it, or perform other operations on
it.]]
As noted above, the first sentence here is false if 'refer to' is
understood in its commonly used sense. The conditions listed in the
second sentence are not all in the same category: to link to it
requires a unique URI, but to make or refute assertions about it, or
to [manipulate] a representation of it, does not. None of these are
in any way comparable to performing operations on it, which indeed
requires a more direct form of access to the resource itself. The
phrase "other operations" is misleading as the previous items are not
performance of operations on the resource.
[[.. there are many benefits to assigning a URI to a resource... A
resource owner SHOULD assign a URI to each resource that others will
expect to refer to.]]
Nothing has been said until this point about ownership of resources,
or about assigning URIs to resources. Questions arise immediately:
What counts as ownership in this context (particularly if 'resource'
has the broad (D) interpretation)? How are URIs assigned to
resources? (Is there a method or technique for 'assignment' in this
sense?) Can assignment only be done by the owner of the resource
and/or the URI?
The text should discuss this issue, if only briefly.
[[For example, the parties responsible for weather.example.com should
not use both "http://weather.example.com/Oaxaca" and
"http://weather.example.com/oaxaca" to refer to the same resource;
agents will not detect the equivalence relationship by following
specifications.]]
I do not follow this. What is the problem here? We have just been
told that a resource may have more than one URI. What 'equivalence
relationship' is being referred to? Is the point that people will
confuse these but software will not? But if they refer to the same
resource, why does this matter? And in general, how is this entire
discussion squared with the opacity discussion later?
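One possible reading of the quoted claim can be sketched under stated assumptions. Suppose that "following specifications" means applying only the generic, syntax-based URI normalizations: scheme and host are case-insensitive and may be lowercased, while the path is case-sensitive and must be left as written. Then the two weather URIs never compare equal, so no agent following those rules alone could discover that they name the same thing. The helper `normalize` is hypothetical, not any library's API.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(uri: str) -> str:
    """Hypothetical helper: case-normalize scheme and host only.

    The path, query and fragment are case-sensitive, so they are kept exactly
    as written. Userinfo (user:password@) is dropped in this sketch.
    """
    parts = urlsplit(uri)  # urlsplit already lowercases the scheme
    host = parts.hostname or ""  # .hostname is lowercased
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


a = normalize("HTTP://Weather.Example.com/Oaxaca")
b = normalize("http://weather.example.com/oaxaca")
print(a)       # http://weather.example.com/Oaxaca
print(b)       # http://weather.example.com/oaxaca
print(a == b)  # False: the paths differ only in case, and paths are case-sensitive
```

On this reading the "equivalence relationship" is known only to whoever runs weather.example.com; nothing in the URI syntax exposes it.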
[[If a URI has been assigned to a resource, agents SHOULD refer to
the resource using the same URI, character for character.]]
Why?? This seems to be at odds with the point just made. If owners
can assign more than one URI to a resource, why must agents use only
one of them? If they must, what is the point of creating more than
one?
[[ the agent has a unique relationship with the URI, called URI ownership. ]]
Does 'agent' here include software agents?
2.3
[[the ambiguous use of terms imposes a cost in communication.]]
This is a controversial claim. It can be argued that it is only
ambiguity which makes communication possible at all, in one sense of
'ambiguity'. Time and email space do not permit a full comment on
this, but I would suggest omitting or qualifying it.
[[URI ambiguity refers to the use of the same URI to refer to more
than one distinct resource.]]
Again, this is itself, ironically enough, ambiguous. If you mean
'refer' in the (C) sense, I would agree, since communication
protocols require that the ambiguity be resolved. If you mean 'refer'
in sense (D), then it is not clear that ambiguity in this sense can
possibly be avoided, certainly not for computational systems. (This
follows ultimately from Goedel's second incompleteness theorem.)
[[The English statement "'http://www.example.com/moby' identifies
'Moby Dick'" is ambiguous because one could understand the phrase
"Moby Dick" to refer to distinct resources: a particular printing of
this work, or the work itself in an abstract sense, or the fictional
white whale, or a particular copy of the book on the shelves of a
library (via the Web interface of the library's online catalog), or
the record in the library's electronic catalog which contains the
metadata about the work, or the Gutenberg project's online version.]]
What exactly is being said here? If these various things are indeed
all resources, then is the claim that the ambiguity arises from the
use of the English quoted phrase? That indeed makes sense, but then
how exactly does the owner of that URI specify which of them is the
intended resource which the URI uniquely identifies? There seems to
be no way around the ambiguity inherent in general reference.
This comment seems to raise more issues than it resolves and might be
better omitted.
[[URI ambiguity arises a URI is used to identify two different Web resources.]]
...when a URI is used...
I will try to explain in another document why referential ambiguity
is not always a bad thing. Basically, you can't outlaw it,
so why bother trying: but in addition, it in fact can be useful. Most
English words are systematically ambiguous, because it's easier to get
reliable communication over a noisy low-bandwidth channel by
overloading the words in ways that can be easily resolved from
context than it is to try to invent distinct signs for all the
possible nuances of meaning, particularly when those nuances cannot
be computed ahead of time, in general. Most of the nuances are
irrelevant most of the time in any case. For example, it is almost
certainly harmless to allow a URI to be ambiguous between a person
and a homepage, as long as one can easily distinguish homepages from
people and map between them when required (i.e., you can easily coerce
in either direction). Allowing a URI to be ambiguous between a star
and a planet might be rather nastier, since the astronomy context
will often not allow you to resolve a difference which might be
important. Many issues arise: but to just give a blanket 'ambiguity
is bad' rule is way too simplistic.
BTW, I wholeheartedly concur with
http://lists.w3.org/Archives/Public/www-tag/2002Sep/0132
2.5
[[Agents making use of URIs MUST NOT attempt to infer properties of
the referenced resource except as licensed by relevant
specifications]]
Does this include human agents? I certainly do this a lot, myself,
see nothing wrong with it, and don't propose to stop doing it.
But in any case, this seems to fly in the face of current practice,
if I understand it correctly. When I use Google, my browser comes
back with a display of a (representation of) something with a URI
that looks like this:
http://www.google.com/search?as_q=pat+hayes&num=10&hl=en&ie=ISO-8859-1&btnG=Google+Search&as_epq=&as_oq=&as_eq=&lr=&as_ft=i&as_filetype=&as_qdr=all&as_occt=any&as_dt=i&as_sitesearch=&safe=off
which is absolutely chock full of information from which software can
be said to infer properties of the referenced resource. This kind of
thing is done all the time. It sounds like you are trying to say that
Google MUST NOT do what it does. Frankly, this would be a very bad
political move: Google is of far more value to the Web than the
entire W3C.
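To make the point concrete, here is a minimal sketch that reads properties off the Google URI quoted above using only Python's standard `urllib.parse`. It assumes nothing beyond the conventional form-encoded `key=value&...` query syntax; the helper `describe_search` and the meanings attached to `as_q`, `num`, `hl` and `safe` are guesses for illustration, not anything licensed by a Google specification.

```python
from urllib.parse import parse_qs, urlsplit

GOOGLE_URI = (
    "http://www.google.com/search?as_q=pat+hayes&num=10&hl=en"
    "&ie=ISO-8859-1&btnG=Google+Search&as_epq=&as_oq=&as_eq=&lr="
    "&as_ft=i&as_filetype=&as_qdr=all&as_occt=any&as_dt=i"
    "&as_sitesearch=&safe=off"
)


def describe_search(uri: str) -> dict:
    """Hypothetical helper: infer properties of a search-results page from its URI."""
    # parse_qs decodes '+' as a space and drops parameters with empty values.
    params = parse_qs(urlsplit(uri).query)

    def first(key: str, default: str = "") -> str:
        return params.get(key, [default])[0]

    return {
        "query": first("as_q"),
        "results_per_page": int(first("num", "0")),
        "language": first("hl"),
        "safe_search": first("safe"),
    }


print(describe_search(GOOGLE_URI))
```

Run on the URI above, this reports the query "pat hayes", 10 results per page, language "en" and safe search "off": exactly the kind of inference about the referenced resource that the quoted MUST NOT appears to forbid.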
3.1
[[Agents may use a URI to access the referenced resource]]
Is this may as in 'sometimes possible' or as in 'generally have
permission'? Is it always possible to use a URI to access a resource?
How can this be reconciled with the (D) definition of resource? (How
does one access an imaginary white whale?)
In general, this entire section seems to make sense only with the
narrow (C) reading of 'resource' to mean 'thing physically attached
to a network'. The terminology needs to be kept straight in order for
the text to be comprehensible.
3.2
[[The Web's protocols ... are based on the exchange of messages.]]
What kinds of entity do this exchanging of messages? (Resources? Agents? Both?)
[[Agents use representations to modify as well as retrieve resource state]]
I find this puzzling. How does an agent use a REPRESENTATION to
modify something? Representations aren't the kind of thing that DO
anything. (??)
3.3.1
[[Interpretation of the fragment identifier during a retrieval action
is performed solely by the agent]]
By which agent?
[[A resource owner who creates a URI with a fragment identifier and
who uses content negotiation to serve multiple representations of the
identified resource SHOULD NOT serve representations with
inconsistent fragment identifier semantics]]
What sense of 'semantics' is meant here? What counts as
'inconsistent'? In the example given, does this mean that the png and
jpeg should be the "same picture" ? What exactly does this mean? (eg
suppose one has a different color balance, or is a slightly different
size: is that an inconsistency?)
(part of the issue here is that words like 'inconsistent' have tight
technical meanings, and it is not clear if you mean to
3.4
[[the design choice for the Web is, in general, that the owner of a
resource assigns the authoritative interpretation of representations
of the resource.]]
HOW?? Since this point is so central, surely some guidance should be
given as to how to perform this miracle of referential precision. The
example given explains how the authority decides what representations
to send to Nadia. It says nothing about how to make sure that these
representations uniquely refer, or how they are given an
interpretation.
There is a deeper issue. Suppose the owner assigns an authoritative
interpretation: how is this INTERPRETATION communicated to Nadia?
Nothing has been said about how to communicate interpretations of
representations over the Web.
None of this section makes sense (on either the C or D readings).
[[User agents MUST NOT silently ignore authoritative server
metadata..... if Nadia's browser detects a problem, Nadia's browser
must not silently ignore the problem and render the JPEG image.]]
Why not?? Again, this seems unmotivated, arbitrary and inconsistent
with good application design in many cases. And, frankly, it doesn't
seem like any of your business: it's a user-application decision, not
a web-architecture decision.
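For concreteness, a "problem" of the kind quoted can be detected by comparing the server's declared Content-Type with the leading bytes of the body. The sketch below assumes only the well-known PNG and JPEG signatures; the function names are hypothetical. Note that it stops at detection: whether a True result should produce a warning, a refusal, or a quiet render anyway is precisely the application-level choice at issue.

```python
# Well-known leading bytes ("magic numbers") for two image formats.
SIGNATURES = {
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
}


def sniff_type(body: bytes):
    """Return the media type suggested by the body's leading bytes, or None."""
    for media_type, magic in SIGNATURES.items():
        if body.startswith(magic):
            return media_type
    return None


def metadata_conflicts(declared_content_type: str, body: bytes) -> bool:
    """True if the declared Content-Type disagrees with what the bytes look like."""
    # Drop parameters such as "; charset=..." and compare case-insensitively.
    declared = declared_content_type.split(";")[0].strip().lower()
    sniffed = sniff_type(body)
    return sniffed is not None and sniffed != declared


# A JPEG body labelled as something else, e.g. image/png.
jpeg_body = b"\xff\xd8\xff\xe0" + b"\x00" * 16
print(metadata_conflicts("image/png", jpeg_body))   # True
print(metadata_conflicts("image/jpeg", jpeg_body))  # False
```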
3.5.1
[[It is a breakdown of the Web architecture if agents cannot use URIs
to reconstruct a "paper trail" of transactions]]
Does this apply even to safe interactions?
3.6.2
[[There are strong social expectations that once a URI identifies a
particular resource, it should continue indefinitely to refer to that
resource; this is called URI persistence. ]]
OK, but
(a) this is highly controversial. In fact I think there are many
cases where there are NOT such strong social expectations, in spite
of the W3C's obvious desire that there should be;
(b) there is an ambiguity here since a "resource" may have a state
and emit changing representations [REST]. How does one distinguish a
change in resources from a changing resource? Are there guidelines to
make the distinction clear?
For example, I often write documents which are publicly viewable in
draft, and are constantly being changed, at the same URL. By strict
W3C guidelines, I gather this is bad practice. But if I consider 'the
paper' to be a dynamic resource, and my edits to it to be updatings
or changes of its state, why would this not be acceptable?
If the reply is that I can choose either way to describe this
activity, but that it is kosher under one description but bad
practice under a different description, then the 'strong social
expectations' seem to amount to little more than a choice of words.
Is this really all that is being said here?
4.
[[In principle, all data can be represented using textual formats.]]
Well, yes, but the same could be said about binary data formats. So?
4.2.4
[[Many modern data format specifications include mechanisms for
composition.....Note however, that for general XML there is no
semantic model that defines the interactions within XML documents...
]]
This reads like a critique of the design of XML. Is that reading intended?
------
Sorry this is so long, and so late.
Pat Hayes
--
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32501 (850)291 0667 cell
phayes@ihmc.us http://www.ihmc.us/users/phayes
Received on Wednesday, 17 March 2004 17:38:59 UTC