- From: Sandro Hawke <sandro@w3.org>
- Date: Fri, 13 Feb 2004 16:14:55 -0500
- To: public-webarch-comments@w3.org
This document has come a long way since I last read it. Excellent.
Here are some comments on the first half or so. Some are trivial,
some are more substantive.
Commenting on
http://www.w3.org/TR/2003/WD-webarch-20031209/
-- sandro
===========================================
== Comment 1, 1. Introduction:
Identification. Each resource is identified by a URI. In this
travel scenario, the resource is about the weather in Oaxaca and
the URI...
This was jarring to read. The text up to that point is simple and
direct, but suddenly here there's handwaving with "about." What *is*
the resource identified by that URI? Fortunately, in the picture you
answer this question.
Suggested text:
Each resource is identified by a URI. In this travel scenario, the
resource is a periodically-updated report on the weather in Oaxaca,
and the URI ...
===========================================
== Comment 2, 1. Introduction:
The server responds with a representation that includes XHTML
data and the Internet Media Type "application/xml+xhtml".
In the graphic, you show the media type as text/html, which is
probably the better choice for simplicity's sake.
===========================================
== Comment 3, 1.1.3 Principles, Constraints, and Good Practice
This categorization is derived from Roy Fielding's work on
"Representational State Transfer" [REST]. Authors of protocol
specifications in particular should invest time in understanding
the REST model and consider the role to which of its principles
could guide their design: statelessness, clear assignment of
roles to parties, uniform address space, and a limited, uniform
set of verbs.
The first sentence is fine; the second reads rather like a paid
product placement. Is Fielding's thesis that much better than every
other work ever written on distributed systems design that it merits
strong recommendation in the section introducing labeling terms? If
you want to save this text, put it in a Recommended Reading section.
===========================================
== Comment 4, 1.2.1. Orthogonal Specifications:
... agents can interact with any identifier ...
That's ambiguous. Replace "with" with "using" and I think you're
okay. Otherwise it sounds rather like the identifier is one of the
parties doing something in an interaction.
===========================================
== Comment 5, 1.2.1. Orthogonal Specifications:
... that the an image ...
^^^
(typo)
===========================================
== Comment 6, 1.2.2. Extensibility:
The following applies to languages, in particular the
specifications of data formats, of message formats, and
URIs. Note: This document does not distinguish in any formal way
the terms "format" and "language." Context has determined which
term is used.
I can't really parse the first sentence. Maybe you mean something like:
The data formats and (more generally) formal languages used
in the bodies of messages and even in the text of URIs can be
defined to have certain properties to promote evolution and
interoperation.
===========================================
== Comment 7, 1.2.4. Protocol-based Interoperability
It is common for programmers working with the Web to write code
that generates and parses these messages directly. It is less
common, but not unusual, for end users to have direct exposure to
these messages. This leads to the well-known "view source"
effect, whereby users gain expertise in the workings of the
systems by direct exposure to the underlying protocols.
This seems out of place. I get the point, but it's never summed up.
And I don't see how it belongs in 1.2 General Architecture Principles.
I think you mean:
Good practice: design protocols and data formats which
people can view and reproduce with a minimum of special tools and
effort.
[ Ahhh, this is finally covered in Section 4.1; maybe a forward link? ]
and maybe:
Good practice: user agents should allow users to look "inside" to
see (and even manipulate) the protocol interactions the agent is
performing on behalf of the user.
===========================================
== Comment 8, 2. Identification (see also Comment 10 below)
Parties who wish to communicate must agree upon a shared set of
identifiers and on their meanings.
This is untrue for some reasonable meanings of "meaning", as Pat Hayes
has argued from time to time. You could say instead:
Parties who wish to communicate must agree on the practical
effects of using certain identifiers.
or
Parties who wish to communicate must agree upon a shared set of
identifiers and (to a reasonable degree) on their meanings.
That is: some ambiguity of meaning is both reasonable and
unavoidable. I don't think an unqualified "agree" normally means
"partially agree".
Does http://weather.example.com/oaxaca identify the weather report for
just Oaxaca or for the Oaxaca region? When it starts to matter, you
can start to build a shared understanding of which it is. But you
can't banish those ambiguities until you notice them. There's also a
school of design where you choose not to banish them, even when you
see them, until you know they matter.
===========================================
== Comment 9, 2.2. URI Ownership
... the "uuid" scheme ...
and
... the "md5" scheme ...
but you don't give references. They are not on IANA's list. I pay
some attention, and I'm not aware of a stable specification for either
one. The spec on DanC's list for UUID has long since expired; the
reference for MD5 is simply to a hypothetical use of it.
For uuid you could use urn:nid-5, but that's technically not a
"URI scheme":
http://www.iana.org/assignments/urn-namespaces
http://lists.research.netsol.com/pipermail/urn-nid/2002-July/000308.html
Maybe you can say "such as a possible 'UUID' scheme", etc., or you
could use WebDAV's unique-lock-token scheme.
===========================================
== Comment 10, 2.3. URI Ambiguity (see also Comment 8 above)
URI ambiguity should not be confused with ambiguity in natural
language.
Well then use a different word! Please! Call it "URI Overloading"
and let "URI Ambiguity" be used for the unavoidable and quite
acceptable situation I talked about in my Comment 8.
"Overloading" seems to me the appropriate word. Eg:
(1) In programming languages, a feature that allows an object to
have different meanings depending on its context. The term
is used most often in reference to operators that can behave
differently depending on the data type, or class, of the
operands. For example, x+y can mean different things
depending on whether x and y are simple integers or complex
data structures.
-- http://www.webopedia.com/TERM/o/overloading.html
I suggest, then, that "URI overloading" is the bad practice of using
one URI to identify multiple things commonly known to be distinct and
useful to distinguish among, while "URI ambiguity" refers to the
fact that we can never communicate *exactly* what someone else means a
URI to identify.
===========================================
== Comment 11, 2.3.1. URIs in other Roles:
The fact that the URI serves other purposes in non-Web contexts
does not lead to URI ambiguity. URI ambiguity arises [when] a URI
is used to identify two different _Web_ resources.
What makes the example case okay is not that Nadia is not a "_Web_
resource". She may well be.
There are two things that make it okay to refer to Nadia as
"mailto:nadia@example.com".
1. It's done in secret, in private. People can redefine terms
however they want in private. It's probably a bad idea for
the same reason defining private URI schemes is bad, of
course; they leak out. But until they do, it's basically
okay.
2. Their system is maintaining an implicit indirection between
the mailbox and the person. They're just abbreviating the
screen prompt which should strictly say "Please enter the
identifier for the mailbox of the person:" to "Please enter
person id:". That's just simplification for the human
reader -- the programmer should be remembering the
indirection in effect and when necessary (eg writing out the
conference registration data in RDF to publish) using or
documenting it explicitly. Not a problem.
===========================================
== Comment 12, 2.6. Fragment Identifiers:
The fragment identifier of a URI allows indirect identification
of a secondary resource by reference to a primary resource and
additional information. The secondary resource may be some
portion or subset ...
So if a URI contains a "#", then the "second resource identified by
the URI" is the same as the "resource identified by the URI." That's
a rather odd use of the word "secondary".
I would suggest instead that you:
(1) Name the portion of the URI up to the "#". TimBL has
called this the "racine", but I like "stem", "trunk", or
maybe even "non-fragment portion".
(2) Call the resource identified by a URI's stem the "stem
resource", or something like that.
I imagine the awkward primary/secondary bit is probably left over
from the idea that "URI-References" were not URIs.
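A minimal sketch of the stem/fragment split suggested above, assuming Python's standard urllib.parse module. The function name split_stem is hypothetical, and "stem" is only the term proposed in this comment, not established vocabulary. The example URI is the one used in Comment 14 below.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_stem(uri: str) -> Tuple[str, str]:
    """Return (stem, fragment) for a URI.

    "Stem" is the term proposed in this comment for the part of the
    URI before the "#"; it is not a standard name.
    """
    stem, fragment = urldefrag(uri)
    return stem, fragment


if __name__ == "__main__":
    stem, fragment = split_stem("http://weather.example.org/stations/oaxaca#ws17a")
    print(stem)      # the part that would identify the "stem resource"
    print(fragment)  # the fragment identifier, without the "#"
```

A URI with no "#" comes back with an empty fragment, so the stem is then the whole URI.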
Actually, that's worth talking about. The newly issued RDF
Recommendations still call them "URI References":
A URI reference within an RDF graph (an RDF URI reference) is a
Unicode string ...
-- http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-URIref
Can you add some text saying why you've decided "URI References" are
now to be called "URIs", ... or something?
===========================================
== Comment 13
References for OWL and RDF can now point to the Recs.
===========================================
== Comment 14, 2.7.2. Assertion that Two URIs Identify the Same Resource
Emerging Semantic Web technologies, including the "Web Ontology
Language (OWL)" [OWL10], define RDF [RDF10] properties such as
sameAs to assert that two URIs identify the same resource or
functionalProperty to imply it.
I would add:
Knowing two URIs identify the same resource does not, however,
mean they are interchangeable. For example, Oaxaca might have
several government-run weather stations, and the measurements taken
from each of these might be available from both
weather.example.org and weather.example.com.
The first might call a particular station
http://weather.example.org/stations/oaxaca#ws17a
while the second calls it
http://weather.example.com/rdfdump?region=oaxaca&station=ws17a
These two URIs would both identify the same resource, a certain
collection of weather measuring equipment. They are owl:sameAs
each other. But an attempt to dereference them might well produce
different content produced by different organizations (probably
based originally on the same government-supplied data), so a user
agent which substituted one for the other would be serving its
user poorly.
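A rough sketch of that distinction, assuming the sameAs assertion is held in a simple lookup table. The SAME_AS table, its "station-ws17a" label, and both function names are hypothetical; a real agent would draw the assertion from RDF data. It shows that two URIs can be recorded as identifying the same resource while a dereference would still contact different servers with different requests.

```python
from urllib.parse import urldefrag, urlsplit

# Hypothetical record of owl:sameAs assertions: URIs that share a label
# are taken to identify the same resource.
SAME_AS = {
    "http://weather.example.org/stations/oaxaca#ws17a": "station-ws17a",
    "http://weather.example.com/rdfdump?region=oaxaca&station=ws17a": "station-ws17a",
}


def same_resource(a, b):
    """True when both URIs are recorded as identifying the same resource."""
    return a in SAME_AS and SAME_AS[a] == SAME_AS.get(b)


def request_target(uri):
    """Return (host, path-and-query) a dereference would use.

    The fragment is dropped first, since it is not sent to the server.
    """
    stem, _fragment = urldefrag(uri)
    parts = urlsplit(stem)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return parts.netloc, target


if __name__ == "__main__":
    org = "http://weather.example.org/stations/oaxaca#ws17a"
    com = "http://weather.example.com/rdfdump?region=oaxaca&station=ws17a"
    print(same_resource(org, com))   # same resource identified
    print(request_target(org))       # but different servers and requests
    print(request_target(com))
```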
===========================================
== Comment 15, 3.2. Messages and Representations (#def-representation)
... includes a representation of the state of the resource. A
representation is an octet sequence ...
... electronic data about resource state ...
Does "state" really mean anything there? Is there a difference
between "data about Ian" and "data about the state of Ian"? Maybe
this could be clarified with:
Note: the phrases "representation of the state of the resource" and
"representation of the resource" mean essentially the same thing;
the term "state" is sometimes used to help convey that resources
and thus their representations often vary over time.
===========================================
== Comment 16, 3.2. Messages and Representations (#def-representation)
... electronic data about resource state ...
In theory, not all computers are electronic. How about just "data"?
Or "information".
===========================================
== Comment 17, 3.5. Safe Interactions
and sends a message composed of form data using the POST
method. Note that this is not a safe interaction;
So you're suggesting it's not safe to engage in e-commerce? :-)
I know you've worked hard to find a nice, simple way to explain When
To Use GET, but I don't think this is it. Asking a non-committal
question can be very dangerous. For instance:
GET http://books.example.org/check-order-status?credit-card=1234567877653453
That's non-committal, but it's certainly not "safe".
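One way to see why, sketched with Python's standard urllib.parse (the function name exposed_parameters is hypothetical): the card number is part of the URI itself, so anything that records or forwards the URI would carry it along, even though the request only asks a question.

```python
from urllib.parse import parse_qs, urlsplit


def exposed_parameters(uri):
    """Return the query parameters carried in the URI itself."""
    return parse_qs(urlsplit(uri).query)


if __name__ == "__main__":
    uri = ("http://books.example.org/"
           "check-order-status?credit-card=1234567877653453")
    # The sensitive value is recoverable from the URI alone.
    print(exposed_parameters(uri))
```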
Similarly, using POST isn't necessarily "unsafe"; it just means you're
making a statement instead of asking a question.
It's not quite as snappy as "GET IS SAFE!", but I suppose Ian could
get people to chant "Never Punish People For Asking Questions!".
===========================================
== Comment 18, on 3.6. Representation Management
Dirk clicks on the link in the email he receives and is surprised
to see his browser display a page about auto insurance.
That seems pretty implausible. It's a working weather site to one
user and a working car-insurance-ad site to another? Based on IP
address, or what? Did his IP address get listed in a database of
entries like, "please deliver this ad to that guy, if he ever
visits"...?
This seems more common and likely to me:
Since Nadia finds the Oaxaca weather site useful, she emails a
review to her friend Dirk recommending that he check out
'http://weather.example.com/oaxaca'. Dirk clicks on the link in
the email he receives and is surprised to see his browser respond
"404 Not Found". Dirk confirms the URI with Nadia, who now has
the same problem. Eventually, they figure out the site has been
reorganized, and the page Nadia recommended is now called
"http://weather.example.com/newserv?loc=oaxaca". Embarrassed by
this interaction, Nadia stops recommending the site to people.
===========================================
== Comment 19, on URI Persistence
There are strong social expectations that once a URI identifies
a particular resource, it should continue indefinitely to refer
to that resource; this is called URI persistence.
I think "URI persistence" is more about observable behavior than the
URI->Resource binding. What matters most is what the page is useful
for -- that's what guides how people try to categorize and store and
annotate and generally talk about it. A daily news page which stops
being updated is nearly as bad as one which goes 404. It's the same
resource ("Joe's Life"), it's just the representations aren't as high
quality any more. Same resource being identified; different
experience. Uncool URI.
===========================================
== Comment 20, 3.7. Future Directions for Interaction
"MLdonkey" appears to be just an application/client. The actual
network/protocol which MLdonkey and many other app/clients implement is
called "eDonkey2000". (http://www.edonkey2000.com/)
===========================================
== Comment 21, 4.2.3. Extensibility
2. "Must understand": The agent treats unrecognized markup
as an error condition.
"markup"? This isn't just about XML. How about "syntactic constructs"?
Received on Friday, 13 February 2004 16:12:34 UTC