- From: Frank Manola <fmanola@acm.org>
- Date: Wed, 10 Mar 2004 17:14:03 -0500
- To: public-webarch-comments@w3.org
- Cc: w3c-rdfcore-wg@w3.org
The following are some (personal) comments on
http://www.w3.org/TR/2003/WD-webarch-20031209/
Sorry they're late.
--Frank
====
Section 1: [[A travel scenario is used throughout this document to
illustrate typical behavior of Web agents -- people or software (on
behalf of a person, entity, or process) acting on this information space.]]
This sentence defines "Web agents" as including both people and software
(as opposed to just software). However, the usage of terms like
"agent", "user agent", etc. throughout the document isn't always
consistent with including people in addition to software in the
definition (and sometimes, but not always, the phrase "software agent"
is used explicitly in places where only software is clearly meant).
Further comments will illustrate this point in specific instances.
====
[[This scenario illustrates the three architectural bases of the Web
that are discussed in this document:
1. Identification. Each resource is identified by a URI. In this
travel scenario, the resource is about the weather in Oaxaca and the URI
is "http://weather.example.com/oaxaca".]]
It would be more consistent with the rest of the example if "the
resource is about the weather in Oaxaca" read "the resource is *a
report* about the weather in Oaxaca."
Further, given the potential generality of the things URIs can identify,
it would be helpful if there were some discussion somewhere about
distinguishing between URIs identifying such distinct things as:
a. a report about the weather in Oaxaca
b. another report about the weather in Oaxaca produced by a second
organization
c. "the weather in Oaxaca" (which no one "owns", but which may be
described in multiple reports created (and owned) by multiple
independent parties)
d. data on the weather in Oaxaca reported by on-line weather
instrumentation which may be accessible to anyone, and which the
creators of (a) and (b) combine with additional information (satellite
photos, radar sweeps) to produce the reports (a) and (b).
More on this later.
====
Section 1.1: [[ * The addition a conformance section is not likely
to increase the utility of the document.
]]
The addition *of* a conformance section.
====
Section 1.1.2, first para: [[The section on Architectural
Specifications includes references.]]
This sentence seems to end abruptly. References to what?
====
Section 1.1.3: [[Authors of protocol specifications in particular
should invest time in understanding the REST model and consider the role
to which of its principles could guide their design: statelessness,
clear assignment of roles to parties, uniform address space, and a
limited, uniform set of verbs.]]
This sentence has an "interesting" structure. For one thing, "the role
to which of its principles could guide their design" seems to mix
several more usual constructions, e.g., either "the role its principles
could [or should] *play* in their designs" or "the *extent* to which
each of its principles could [or should] guide their designs". For
another, it seems as if the list of principles should follow
"principles" rather than "design", as in something like: "Authors of
protocol specifications in particular should invest time in
understanding the REST model and consider the role its principles --
statelessness, clear assignment of roles to parties, uniform address
space, and a limited, uniform set of verbs -- could play in their designs."
====
Section 1.2.1, second para: [[...The fact, for example, that the an
image can be identified...]]
"an image" rather than "the an image"
====
[[...PNG and SVG to evolve independent of...]]
independently of
====
Section 1.2.2, second last para: [[For example, from early on in the
Web, HTML agents followed the convention of ignoring unknown elements.]]
"from early on in the Web" seems a little clumsy. How about "For
example, HTML agents have historically followed..."?
====
Section 1.2.3: [[A user agent acts on behalf of the user and therefore
is expected to help the user understand the nature of errors, and
possibly overcome them. User agents that correct errors without the
consent of the user are not acting on the user's behalf.]]
Is "user agent" intended to be *any* kind of "agent" (human or software,
as previously defined) acting on behalf of someone else (the "user", so
far undefined), or just *software* that acts on behalf of a human?
Also, the text seems to equate "act on behalf of the user" with that
action necessarily being helpful, which is not necessarily the way "act
on behalf of" is always interpreted. The real point would seem to be
that user agents that correct errors in this way may in some sense be
acting on the user's behalf, but they aren't helping the user by doing it.
====
Same section, third bullet: [[* An agent that encounters unrecognized
content...]]
Given the context, this seems a bit ambiguous, since it might be taken
to refer to "user agent", as well as more generally to "agent" (assuming
these are different; are they?)
====
Section 2: First para: [[Parties who wish to communicate must agree
upon a shared set of identifiers and on their meanings.]]
"identifiers (names for things)"?
====
Second para: [[It follows that a resource should be assigned a URI if a
third party might reasonably want to link to it...]]
Why "a third party" (what are the first two parties)?
====
Third para: [[Resources exist before URIs...]]
This sounds like it should be "Resources exist independently of URIs".
====
[[Designers should expect that it will prove useful to be able to share
a URI across applications, even if that utility is not initially evident.]]
Why is the reference here to "designers"? Designers of what? This
sounds like it should instead be "resource owners", as mentioned in the
principle "URI assignment" just below.
However, there is also a related issue, which is that the terms
"resource owner", "URI owner", and "URI producer" are all used in
further discussion. I can imagine situations in which these (or at
least the first two) might have distinct meanings, but they don't
necessarily seem to be used that way. If they are distinct, it would
help to have precise definitions. If they aren't distinct, it would
help to pick one term and use it consistently (along with some further
explanation of why they are the same).
In particular, "Resource owner" and "URI owner" would appear to be
equivalent when discussing URIs that refer to resources with retrievable
representations. However, given that URIs can be created to refer to
other kinds of resources, it would seem that multiple URIs might be
created to refer to the same resource, and those URIs would have
different owners. For example, suppose I (the person Frank Manola) am
the resource. It seems to me I can reasonably claim to be the owner of
that resource (whether I have assigned myself a URI or not; recall that
resources can have zero URIs). Independently, other people may create
resources with retrievable representations (e.g., reports) that refer to
me and, perhaps not knowing the URI I have assigned to myself (even if I
*have* assigned one), can create URIs to refer to me (say, in RDF
statements). It seems to me those other people can reasonably claim to
be the owners of those latter URIs (e.g., they determine that those URIs
denote me), even though they don't own the resource the URIs identify
(me). Moreover, these other people (more effectively) can create URIs
for the resources with retrievable representations (reports referring
to, among other things, Frank Manola), and those resources (the reports)
are distinct from the resource Frank Manola. In this case, it seems to
me that those other people are the owners of both the resources (the
reports) and the URIs that identify them.
====
[[Principle: URI assignment: A resource owner SHOULD assign a URI to
each resource that others will expect to refer to.]]
This seems as if it should read "A resource owner SHOULD assign a URI to
each resource that the owner expects others will want to refer to."
(How can others expect to refer to resources they don't necessarily know
about?)
====
Section 2.1: [[URI producers should be conservative about the number of
different URIs they produce for the same resource. For example, the
parties responsible for weather.example.com should not use both
"http://weather.example.com/Oaxaca" and
"http://weather.example.com/oaxaca" to refer to the same resource;
agents will not detect the equivalence relationship by following
specifications. On the other hand, there may be good reasons for
creating similar-looking URIs. For instance, one might reasonably create
URIs that begin with "http://www.example.com/tempo" and
"http://www.example.com/tiempo" to provide access to resources by users
who speak Italian and Spanish.]]
Why does the first sentence refer to "URI producers" that "produce" URIs
rather than "resource owners" that "create" them (which would be more
consistent with earlier text)? I also note that the words "assign",
"create", and "produce" (and possibly others) are all used for what
seems to be the same idea.
Also, the rest of this illustration seems to have a funny interaction
with the URI opacity principle in Section 2.5 (especially the discussion
there about the travel example), since the Section 2.1 text above seems
to suggest there is value in being able to convey information to an
accessing "agent" (a human in this case) via the form of the URI itself
(i.e., if URIs are to be totally opaque to the "agent", why would there
be value in using one language over another?). Of course, this may be
just another problem in allowing "agent" to refer to people. However,
the problem seems somewhat more acute if the result of dereferencing
URIs in different languages is the retrieval of the report in the
corresponding languages because, while this kind of makes sense, it also
invites determining the language of the report from the language of the URI.
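The equivalence point in the quoted text can be illustrated with a small sketch (Python, standard library only). The helper name normalize is made up for this illustration and is not something the draft defines. It applies only the normalization a generic agent could safely rely on, lowercasing the scheme and host, and assumes the authority carries no userinfo. The path is left alone, so the two Oaxaca URIs remain distinct.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(uri: str) -> str:
    """Lowercase only the scheme and host (hypothetical helper).

    The path is deliberately left untouched: generic URI syntax
    treats it as case-sensitive, so an agent cannot fold its case.
    """
    parts = urlsplit(uri)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, parts.fragment))

a = normalize("http://weather.example.com/Oaxaca")
b = normalize("http://weather.example.com/oaxaca")
c = normalize("HTTP://Weather.Example.COM/oaxaca")

print(a == b)  # paths differ only in case, yet the URIs stay distinct
print(b == c)  # only scheme and host differed in case
```

Nothing in this comparison looks at "tempo" versus "tiempo" either; any information carried by the spelling of the path is visible only to a human reader, which is the tension with Section 2.5 noted above.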
====
Section 2.2: [[The requirement for URIs to be unambiguous demands that
different agents do not assign the same URI...]]
Now we have *agents* assigning URIs rather than, e.g., resource or URI
owners. It's not clear that this is consistent with prior discussion.
====
[[The concept of URI ownership is especially visible in the case of the
HTTP protocol, which enables the URI owner to serve authoritative
representations of a resource.]]
This text is pertinent to the point raised earlier about resource vs.
URI ownership, and might be expanded on a bit to clarify that
relationship. In particular, when dealing with URIs that have
retrievable representations, it is straightforward to demonstrate
ownership; non-owners can't determine what is returned when
dereferencing such URIs, while owners can.
====
Section 2.3: [[URI ambiguity should not be confused with ambiguity in
natural language. The English statement "'http://www.example.com/moby'
identifies 'Moby Dick'" is ambiguous because one could understand the
phrase "Moby Dick" to refer to distinct resources: a particular printing
of this work, or the work itself in an abstract sense, or the fictional
white whale, or a particular copy of the book on the shelves of a
library (via the Web interface of the library's online catalog), or the
record in the library's electronic catalog which contains the metadata
about the work, or the Gutenberg project's online version]]
This example illustrates an ambiguous natural language statement, but
it's not clear that it doesn't also illustrate an ambiguous URI, since
the text doesn't say anything about how example.com, or other parties
citing http://www.example.com/moby, actually interpret it.
====
Section 2.3.1: [[In Web architecture, URIs identify resources. Outside
the bounds of Web architecture specifications, URIs can be useful for
other purposes, for example, as database keys...]]
It seems to me this paragraph mixes a few things. Just because a URI is
used as a database key doesn't necessarily mean it's being used for a
different purpose. If a URI is used as a key in a relational table that
associates metadata with the Web resources identified by those keys, and
does so correctly (i.e., distinguishes between metadata about Nadia and
metadata about her mailbox), it seems as if this is the *same* use of
the URI (to identify a Web resource), even though it may also be used in
the database to identify a distinct row in the table. Moreover, the
database might exhibit URI ambiguity in the same way the Web might,
e.g., by mixing metadata about both Nadia and her mailbox in the same
row. At the same time, the use of "mailto:nadia@example.com" as an
identifier for Nadia rather than her mailbox seems just as likely to
occur in a Web context as in this database one (people seem to want to
do it in RDF, for example; or is this not the part of the Web you're
talking about?).
Also, in the same paragraph: [[URI ambiguity arises a URI is used to
identify two different Web resources.]]
URI ambiguity arises *when* a URI...
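The database-key case described above can be sketched as follows (Python with the standard sqlite3 module). Everything except "mailto:nadia@example.com" is an assumption made for illustration: the table layout, the URI "http://www.example.com/people/nadia" standing for Nadia herself, and the placeholder notes. The sketch only shows that a URI can serve as a row key while still identifying the same resource it identifies on the Web.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (uri TEXT PRIMARY KEY, kind TEXT, note TEXT)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        # The mailbox and the person get separate keys (both notes are
        # placeholder values, not data from the draft).
        ("mailto:nadia@example.com", "mailbox", "quota 50 MB"),
        ("http://www.example.com/people/nadia", "person", "plans a trip"),
    ],
)

def lookup(uri):
    """Return (kind, note) for the resource keyed by uri, or None."""
    return conn.execute(
        "SELECT kind, note FROM metadata WHERE uri = ?", (uri,)
    ).fetchone()

print(lookup("mailto:nadia@example.com"))
```

Storing facts about Nadia in the row keyed by the mailto: URI would reproduce, inside the database, the same ambiguity the draft attributes to uses outside Web architecture.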
====
Section 2.4: [[Because of these costs, if a URI scheme exists that
meets the needs of an application, designers should use it rather than
invent one.]]
This sentence refers to "designers", but the "Good Practice" point below
refers to "authors of specifications". Shouldn't the same terms be used
in both places?
====
Section 2.5: [[It is tempting to guess the nature of a resource by
inspection of a URI that identifies it. However, the Web is designed so
that agents communicate resource state through representations, not
identifiers.]]
This is another place where including people in the definition of
"agents" seems to create a possible difficulty. If agents include
people, then people quite frequently communicate information about the
nature of a resource by inspection of URIs, and it's very helpful. For
example, "http://weather.example.com/oaxaca" certainly suggests that the
resource it identifies has something to do with the weather in Oaxaca
(as is noted further on), and that's very useful information (e.g., when
people pass those URIs around). That's certainly information about "the
nature of a resource", and Internet Media Types aren't the only things
relevant to people. This all, of course, reads much better if "agents"
are restricted to software. Pursuing this point in the subsequent text:
[[Agents making use of URIs MUST NOT attempt to infer properties of the
referenced resource except as licensed by relevant specifications.]]
This is good practice for software "agents". For people "agents", given
the "must not", how do you propose to stop them?
Further to this point, the text goes on:
[[The example URI used in the travel scenario
("http://weather.example.com/oaxaca") suggests that the identified
resource has something to do with the weather in Oaxaca. A site
reporting the weather in Oaxaca could just as easily be identified by
the URI "http://vjc.example.com/315". And the URI
"http://weather.example.com/vancouver" might identify the resource "my
photo album."]]
This is certainly true. But while it's good practice for software to
treat URIs opaquely, it seems to me that given the discussion in Section
2.1, which seems to license creating "descriptive" URIs in different
languages to enable people speaking those languages to more easily
access a resource (and which reflects the use of text in URIs as a means
for conveying information to people), you might want to suggest that,
given this "dual purpose" of URIs, it's *not* good practice to use the
URI "http://weather.example.com/vancouver" to identify the resource "my
photo album", even though one could, and it would be irrelevant to software.
====
Section 2.7.2: Reference [RDF10] cites the RDF M&S Recommendation,
rather than the new Recommendation set, and should be updated (the OWL
reference should be updated as well). If *one* of the new RDF documents
is to be cited, I would suggest Concepts.
====
Section 3: First sentence:
[[Communication between agents over a network about resources...]]
Too many qualifying phrases here (e.g., "about resources" presumably
qualifies "Communication", rather than "a network", but it's kind of
distant...)
====
Section 3.3.1: [[Note that one can use a URI with a fragment identifier
even if one does not have a representation available for interpreting
the fragment identifier (one can compare two such URIs, for example).
Parties that draw conclusions about the interpretation of a fragment
identifier without retrieving a representation do so at their own risk;
such interpretations are not authoritative.]]
This is a place where some qualifying context about the nature of the
Web to which this architecture applies would have been helpful. For
example, suppose I have a collection of RDF or OWL statements having as
their subject the URI "http://www.example.com/images/nadia#hat", and the
RDF/OWL statements assert that the subject is of class "Hat" in some
ontology, that it's blue, and so on. On one hand, it seems as if one
could reasonably draw conclusions about the interpretation of this
fragment identifier (or rather the whole URI including it) *from the
RDF/OWL* without dereferencing the URI (using the URI to retrieve a
representation, whose media type specifies the authoritative
interpretation), assuming that the RDF/OWL itself is from a sufficiently
"authoritative" (in some sense) representation somewhere. Saying "such
interpretations are not authoritative" without any further qualification
or discussion, while it makes perfect sense given the way the Web works
now, doesn't seem to take such additional usage (which, after all, is
described in W3C Recommendations) into account.
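A minimal sketch of the situation described above (Python, standard library only). The statements are modeled as plain (subject, predicate, object) tuples rather than with a real RDF toolkit, and the names "rdf:type", "ex:Hat", "ex:Image", and "ex:color" are placeholders chosen for illustration. The point is only that the URI can be split, compared, and described without ever being dereferenced.

```python
from urllib.parse import urldefrag

uri = "http://www.example.com/images/nadia#hat"

# Splitting off the fragment needs no network access.
base, fragment = urldefrag(uri)

# Stand-ins for RDF/OWL statements whose subject is a URI.
statements = [
    (uri, "rdf:type", "ex:Hat"),
    (uri, "ex:color", "blue"),
    (base, "rdf:type", "ex:Image"),
]

def describe(subject, triples):
    """Collect what the statements assert about one subject URI."""
    return {p: o for s, p, o in triples if s == subject}

print(base, fragment)
print(describe(uri, statements))
```

Whether conclusions drawn this way count as "authoritative" is exactly the open question: the code never retrieves a representation, so no media type is consulted.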
====
Section 3.3.2: It would be helpful if the text following the "story"
explicitly answered the question posed in the "story". For example, if
the idea here is that the fragment should always identify Nadia's hat in
any graphic representation provided by dereferencing the URI containing
the fragment, it would help to say that.
====
Section 3.4: First para: [[To give these parties the confidence that
they are all talking about the same thing when they refer to "the
resource identified by the following URI ..." the design choice for the
Web is, in general, that the owner of a resource assigns the
authoritative interpretation of representations of the resource.]]
The text "owner of a resource" links to Section 2.2 titled "URI
Ownership". So why say "owner of a resource" rather than "owner of a
URI"? Also, Section 3.3 just got through telling us that if the URI
contains a fragment identifier, then the Internet Media Type of the
retrieved representation specifies the authoritative interpretation of
the fragment identifier. I realize that in one case it's the
authoritative interpretation *of the fragment* and in the other it's the
authoritative interpretation *of representations of the resource*, but
the use of "authoritative interpretation" in both places (particularly
when they're so close together) seems potentially confusing.
====
Section 3.4.1: [[User agents should detect such inconsistencies but
should not resolve them without involving the user.]]
Now the term is "user agent" rather than "agents". Is there some
particular reason for distinguishing between these terms?
====
Section 3.5: [[Nadia's retrieval of weather information (an example of
a read-only query or lookup) qualifies as a "safe" interaction; a safe
interaction is one where the agent does not incur any obligation beyond
the interaction. An agent may incur an obligation through other means
(such as by signing a contract). If an agent does not have an obligation
before a safe interaction, it does not have that obligation afterwards]]
Here, "agent" is used in a sense where it might well be a person
("signing a contract"). Can software agents "incur obligations" in the
sense used here?
====
[[Other Web interactions resemble orders more than queries.]]
Is this "orders" in the sense of "placing an order", "that's an order,
soldier", or both?
====
[[Principle: Safe retrieval
Agents do not incur obligations by retrieving a representation.]]
Shouldn't this be "should not" rather than "do not"? "Do not" suggests
that it doesn't happen, rather than that it's incorrect if it does (as
described in the next sentence).
====
Section 3.6: Following the story appears:
[[The usefulness of a resource depends on good management by its owner.
As is the case with many human interactions, confident interactions with
a resource depend on stability and predictability. The value of a URI
increases with the predictability of interactions using that URI.
Avoiding unnecessary URI aliases is one aspect of proper resource
management.]]
While the last sentence above certainly seems true, it's not clear what
it has to do with the story, since the sentence refers to URI aliases,
but the story describes multiple uses of the *same* URI returning wildly
different results. Is this supposed to refer to "URI ambiguity"
instead? It's also not clear that the problem illustrated in the story
is necessarily "inconsistent representation". Given the diagram in
Section 1, which distinguishes the URI from the resource it identifies,
it seems possible to distinguish (at least conceptually) between:
(a) returning pages about weather information and auto insurance as
inconsistent representations of the same resource, and
(b) returning the same two pages as indicating that the URI owner has
changed the resource the URI identifies.
====
Section 3.6.3: It might help clarify the point made in this section if
some examples of mistaken attempts to restrict the use of URIs were
given, rather than just the building security analogy. Also, it's not
clear whether or not the principle described here (and the further
discussion in the "Deep Linking" finding) deals with all possible
situations of this sort. For example, it certainly used to be the case
(and may be the case now) that US Defense Department documents could not
only have a security classification, but their *titles* might also have
a security classification (that is, the *existence* of the document was
classified). A classified document with an unclassified title could be
referenced in the usual way, but a reader without the necessary
clearance would be unable to access the referenced document (this would
correspond to the situations already described). On the other hand,
classifying the title of the document would prevent the reader from even
seeing the reference without the necessary clearance. How would you
suggest handling this situation? (Admittedly, opacity of URIs would help!)
====
Section 4: First para [[The first data format used on the Web was HTML.
Since then, data formats have grown in number. ]]
The second sentence should probably say "...*Web* data formats..."
(since data formats in general were growing well before HTML).
====
Second para: [[ Below we describe some characteristics of a data format
make it easier to integrate into the Web architecture. ]]
"...*that* make it easier..."
====
Section 4.2.1: [[There is typically a (long) transition period during
which multiple versions of a format, protocol, or agent are
simultaneously in use.]]
This is another passage that reads strangely if "agent" is considered to
include people (I know I'm not the same person I once was, but multiple
versions simultaneously in use?!).
====
[[Good practice: Version information
Format designers SHOULD provide for version information in language
instances]]
What are "language instances"?
====
Section 4.2.2: [[The policy sets expectations that the Working Group
responsible for the namespace may modify it in any way until a certain
point in the process ("Candidate Recommendation") at which point W3C
constrains the set possible changes to the namespace in order to promote
stable implementations.]]
"...constrains the set *of* possible changes..."
====
Section 4.2.3: [[As part of defining an extensibility mechanism, a
specification should set expectations about agent behavior in the face
of unrecognized extensions.]]
The following good practice then says
[[Language designers SHOULD specify agent behavior in the face of
unrecognized extensions.]]
It's not clear that a specification "setting expectations about" agent
behavior is the same as it "specifying" it. Why the difference in wording?
====
Section 4.2.4: Third bullet [[ * RDF allows well-defined mixing of
vocabularies, and allows text and XML to be used as a data type values
within a statement having clearly defined semantics.
]]
"...allows text and XML to be used as data type values..." (delete the "a")?
Within the same statement? What does "having clearly defined semantics"
modify? Should this be "...within statements having clearly defined
semantics"?
====
Section 4.5.2: [[XLink allows links to have multiple ends and to be
expressed either inline or in "link bases" stored external to any or all
of the resources identified by the links it contains.]]
This sentence seems a tad complicated.
====
Section 4.5.3: [[How do the application designers ensure that there are
no naming conflicts when they combine elements from different formats
(for example, suppose that the "p" element is defined in two or more XML
formats)? "Namespaces in XML" [XMLNS] provides a mechanism for
establishing a globally unique name that can be understood in any context.]]
I'd suggest rewriting this to avoid the rhetorical question.
Also, this may or may not be the right place to cite this example, but
the way the XML Schema data types define a namespace for those types,
allowing them to be used in RDF and OWL, might also be cited. (This
example illustrates the need sometimes to explicitly describe how URIs
identifying language elements should be constructed using those
namespace names).
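Both points can be sketched briefly (Python, standard library only). The two namespace names under example.org are invented for illustration, and datatype_uri is a hypothetical helper. The only convention taken as given is that an XML Schema datatype is referenced as the XML Schema namespace name followed by "#" and the local name, which is how RDF refers to those types.

```python
import xml.etree.ElementTree as ET

# Two formats that both define a "p" element (namespace names invented).
doc = """<root xmlns:a="http://example.org/format-a"
               xmlns:b="http://example.org/format-b">
  <a:p>paragraph</a:p>
  <b:p>price</b:p>
</root>"""

root = ET.fromstring(doc)
tags = [child.tag for child in root]
print(tags)  # each tag is expanded to {namespace-name}local-name

XSD = "http://www.w3.org/2001/XMLSchema"

def datatype_uri(local_name: str) -> str:
    """Build the URI for an XML Schema datatype (hypothetical helper)."""
    return f"{XSD}#{local_name}"

print(datatype_uri("integer"))
```

The parser keeps the two "p" elements apart without any extra rule, but turning a namespace name plus a local name into a single URI requires a stated convention (here the "#" separator), which is the part that sometimes needs to be described explicitly.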
====
Section 4.5.5: The link for "rdfmsQnameUriMapping-6" needs a fragment
to identify the specific issue.
====
Received on Wednesday, 10 March 2004 17:18:12 UTC