Review of Web Architecture doc - WD-webarch-20031209 from Graham Klyne on 2004-03-05 (public-webarch-comments@w3.org from March 2004)

From: Graham Klyne <gk@ninebynine.org>
Date: Fri, 05 Mar 2004 14:27:57 +0000
To: public-webarch-comments@w3.org
Message-Id: <5.1.0.14.2.20040305142621.01fbe8f8@127.0.0.1>
Reviewing:
   http://www.w3.org/TR/2003/WD-webarch-20031209/
Modified: 08 December 2003 22:07:31

Generally, I think the document is looking in very good shape.  Most of my 
comments are just editorial in nature.

There are a very few comments that I regard as possibly more substantive, 
concerning:
   Section 3.4:
   Section 3.6.2:
   Section 4.2.4:

I have included some suggested revisions not because I think they're 
necessarily better than the text already used, but to illustrate the points 
I am trying to raise.

...

Section 1.2.4:
[[
It is common for programmers working with the Web to write code that 
generates and parses these messages directly. It is less common, but not 
unusual, for end users to have direct exposure to these messages. This 
leads to the well-known "view source" effect, whereby users gain expertise 
in the workings of the systems by direct exposure to the underlying protocols.
]]

It was not clear to me what the intended significance of this is with 
respect to Web architecture.  Suggest:  explain the significance or drop 
this paragraph.

[minor editorial]

...

Section 2:
[[
A URI must be assigned to a resource in order for agents to be able to 
refer to the resource. It follows that a resource should be assigned a URI 
if a third party might reasonably want to link to it, make or refute 
assertions about it, retrieve or cache a representation of it, include all 
or part of it by reference into another representation, annotate it, or 
perform other operations on it.
]]

"or perform other operations on it" suggests a resource should be a very 
concrete thing.  Suggest "or refer to it in some other way".

[minor editorial]

...

Section 2:
[[
When a representation uses a URI (instead of a local identifier) as an 
identifier, then it gains great power from the vastness of the choice of 
resources to which it can refer. The phrase the "network effect" describes 
the fact that the usefulness of the technology is dependent on the size of 
the deployed Web.
]]

The comment about "network effect" in the first para seems somewhat 
disjointed.  What does it tell us about Web architecture?  Suggest: "This 
vastness of choice gives rise to a "network effect", which refers to a 
technology's usefulness increasing more rapidly than the size of the 
network across which it is deployed."

[minor editorial]

...

Section 2:
[[
A URI must be assigned to a resource in order for agents to be able to 
refer to the resource. It follows that a resource should be assigned a URI 
if a third party might reasonably want to link to it, make or refute 
assertions about it, retrieve or cache a representation of it, include all 
or part of it by reference into another representation, annotate it, or 
perform other operations on it.

[...]

Resources exist before URIs; a resource may be identified by zero URIs. 
However, there are many benefits to assigning a URI to a resource, 
including linking, bookmarking, caching, and indexing by search engines. 
Designers should expect that it will prove useful to be able to share a URI 
across applications, even if that utility is not initially evident.
]]

There seems to be some overlap between these paragraphs.  And I found the 
first sentence of the second paragraph to be potentially confusing.

Suggest: a re-arrangement:
[[
A URI must be assigned to a resource in order for agents to be able to 
refer to the resource. It follows that a resource should be assigned a URI 
if a third party might reasonably want to link to it, make or refute 
assertions about it, retrieve or cache a representation of it, include all 
or part of it by reference into another representation, annotate it, or 
refer to it in some other way.  A resource may exist independently of 
whether or not it has a URI; one or more URIs may be used to identify a 
given resource.

[...as before...]

There are many benefits to assigning a URI to a resource, as noted 
above.  Designers should expect that it will prove useful to be able to 
share a URI across applications, even if that utility is not initially evident.
]]

[editorial]

...

Section 2:
[[
The scope of a URI is global; the resource identified by a URI does not 
depend on the context in which the URI appears (see also the section about 
URIs in other roles). Of course, what an agent does with a URI may vary. 
The TAG finding "URIs, Addressability, and the use of HTTP GET and POST" 
discusses additional benefits and considerations of URI addressability.
]]

The term "global" here is not defined or qualified.  Suggest "global across 
the Web".

[editorial]

...

Section 2.1:
[[
... For example, the parties responsible for weather.example.com should not 
use both "http://weather.example.com/Oaxaca" and 
"http://weather.example.com/oaxaca" to refer to the same resource; agents 
will not detect the equivalence relationship by following specifications. ...
]]
and
[[
... Agents should not assume, for example, that 
"http://weather.example.com/Oaxaca" and "http://weather.example.com/oaxaca" 
identify the same resource, since none of the specifications involved 
states that the path part of an "http" URI is case-insensitive.
]]

While correct, I felt this was potentially a little confusing.  The first 
example did not seem well chosen to reflect the point I think is being 
made.  Suggest:

[[
... For example, the parties responsible for weather.example.com should not 
use both "http://weather.example.com/Oaxaca" and 
"http://weather.example.com/Mexico?city=Oaxaca" to refer to the same 
resource; agents will not detect the equivalence relationship by following 
specifications. ...
]]

Hmmm... maybe there's a third point to be made here, namely that the party 
responsible for some domain should avoid using different URIs with small, 
easily overlooked differences?
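
To illustrate the point about equivalence, here is a minimal sketch, 
assuming Python's standard urllib.parse module.  The helper name 
same_by_spec is hypothetical, and the comparison is deliberately 
conservative:  it folds case only for the scheme and host, and compares 
everything else character for character.

```python
from urllib.parse import urlsplit


def same_by_spec(a: str, b: str) -> bool:
    """Conservative URI comparison (hypothetical helper).

    Case is folded only for the scheme and host, where the
    specifications say it is not significant.  Path and query are
    compared exactly, so differences in case there count as different.
    """
    ua, ub = urlsplit(a), urlsplit(b)
    return (
        ua.scheme == ub.scheme          # urlsplit lowercases the scheme
        and ua.hostname == ub.hostname  # .hostname is returned lowercased
        and ua.port == ub.port
        and ua.path == ub.path          # case-sensitive comparison
        and ua.query == ub.query
    )


# Differ only in scheme/host case: treated as the same URI.
host_case = same_by_spec("HTTP://Weather.Example.COM/Oaxaca",
                         "http://weather.example.com/Oaxaca")

# Differ in path case: an agent cannot assume equivalence.
path_case = same_by_spec("http://weather.example.com/Oaxaca",
                         "http://weather.example.com/oaxaca")

# Structurally different URIs for (possibly) the same resource.
alias = same_by_spec("http://weather.example.com/Oaxaca",
                     "http://weather.example.com/Mexico?city=Oaxaca")
```

Under these assumptions only the first pair compares equal;  an agent 
following the specifications has no basis for equating the other two pairs.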

[editorial]

...

Section 2.2:
[[
Hierarchical delegation of authority. This approach, exemplified by the 
"http" and "mailto" schemes, allows the assignment of a part of URI space 
to one party, reassignment of a piece of that space to another, and so forth.
]]
While technically correct, I don't think 'mailto' is a useful example of 
hierarchical delegation of naming authority within a URI structure.  I'd 
suggest 'ftp:' or 'urn:' or 'file:' or 'ldap:'

[minor editorial]

...

Section 2.3:
[[
URI ambiguity should not be confused with ambiguity in natural language.
]]
I'm not sure what this sentence is trying to say (what is meant here by 
"confused with"?).  From what follows, I think the intent is to say 
something like "justified by", in which case I suggest something like:
[[
URIs should not be permitted the ambiguity that occurs in natural language.
[...existing text...]
This flexibility is not available to URIs, which should be defined to refer 
to a single concept.
]]

[later]
I ran across this from TimBL in one of the TAG IRC logs, which seems to 
capture the point more effectively.
[[
Suggested text for 2.6: Whereas human communication tolerates such 
ambiguity, machine processing does not. Strictly, the above URI as 
identifies the information resource, some hypertext document. RDF 
applications which use it for describing properties of that page are in 
order; those who use its URL to directly assert properties of the whale are 
using it inconsistently.
]]
-- http://www.w3.org/2003/07/22-tagmem-irc.html 22:06:17

[editorial]

...

Section 2.4.1:
[[
The use of unregistered URI schemes is discouraged for a number of reasons:
]]

This doesn't seem to be strong enough.  Suggest:
[[
The use of unregistered URI schemes is not a permitted part of the Web 
architecture, for a number of reasons:
]]

[substantial]

...

Section 2.5:
[[
Resource state may evolve over time. Requiring resource owners to change 
URIs to reflect resource state would lead to a significant number of broken 
links. For robustness, Web architecture promotes independence between an 
identifier and the identified resource.
]]

I think a link to orthogonality (section 1.2.1) may be appropriate about here.

[minor editorial]

...

Section 3.1:
[[
Although many URI schemes are named after protocols, this does not imply 
that use of such a URI will result in access to the resource via the named 
protocol. Even when an agent uses a URI to retrieve a representation, that 
access might be through gateways, proxies, caches, and name resolution 
services that are independent of the protocol associated with the scheme name.
]]

As phrased, I find this to be at odds with the text that follows, cf. 
numbered items 4/5/6.  Suggest replace "... use of such a URI will result 
..." with "... use of such a URI will necessarily result ..."

[editorial]

...

Section 3.3.2:
[[
For a given resource, an agent may have the choice between representation 
data in more than one data format (through HTTP content negotiation, for 
example). Since different data formats may define different fragment 
identifier semantics, it is important to note that by design, the secondary 
resource identified by a URI with a fragment identifier is expected to be 
the same across all representations. Thus, if a fragment has defined 
semantics in any one representation, the fragment is identified for all of 
them, even though a particular data format may not be able to represent it.
]]

The term "by design" seems rather odd here.  It seems to me that the 
(technical) design specifically does not achieve "the secondary resource 
identified by a URI with a fragment identifier is ... the same across all 
representations".

I think the clause "by design" could be dropped without loss (or, maybe, 
replaced with something like "by intent").

[minor editorial]

...

Section 3.2:
[[
On the other hand, it is considered an error if the semantics of the 
fragment identifiers used in two representations of a secondary resource 
are inconsistent.
]]

This seems a rather odd statement to make (specifically: "it is considered 
an error ..."), because there is no specific way to determine if the 
would-be erroneous condition actually arises.  Suggest:  drop this 
paragraph;  the intent is clear enough from the following good practice point.

[editorial]

...

Section 3.4:
[[
Successful communication between two parties using a piece of information 
relies on shared understanding of the meaning of the information. Arbitrary 
numbers of independent parties can identify and communicate about a Web 
resource. To give these parties the confidence that they are all talking 
about the same thing when they refer to "the resource identified by the 
following URI ..." the design choice for the Web is, in general, that the 
owner of a resource assigns the authoritative interpretation of 
representations of the resource.
]]
I recall that TimBL and Pat Hayes had a lengthy debate about something 
rather like this.
Thread starting:
   http://lists.w3.org/Archives/Public/www-tag/2003Jul/0022.html
with some indication of consensus around:
   http://lists.w3.org/Archives/Public/www-tag/2003Jul/0316.html
   http://lists.w3.org/Archives/Public/www-tag/2003Jul/0344.html

I am not sure that the above text really captures the subtlety of this 
discussion.  As Pat Hayes noted:
[[
 >Note though that other non-RDF systems may and do use URIs.  So the
 >principle can must be a general one of web architecture.

Names are global in scope.  OK, though (in the other branch of the
discussion) I don't think this is going to be feasible, myself, if
taken strictly. Still, I agree, its not a bad place to start, as long
as we understand that we will eventually have to replace it with
something more sophisticated.
]]
-- http://lists.w3.org/Archives/Public/www-tag/2003Jul/0344.html

[significant/editorial]

...

Section 3.6:
[[
Since Nadia finds the Oaxaca weather site useful, she emails a review to 
her friend Dirk recommending that he check out 
'http://weather.example.com/oaxaca'. Dirk clicks on the link in the email 
he receives and is surprised to see his browser display a page about auto 
insurance. Dirk confirms the URI with Nadia, and they both conclude that 
the resource is unreliable. Although the managers of Oaxaca have chosen the 
Web as a communication medium, they have lost two customers due to 
ineffective resource management.
]]

I think that "the managers of Oaxaca" should be "the managers of 
http://weather.example.com/".

[editorial]

...

Section 3.6.2:
[[
There are strong social expectations that once a URI identifies a 
particular resource, it should continue indefinitely to refer to that 
resource; this is called URI persistence. URI persistence is a matter of 
policy and commitment on the part of authorities servicing URIs. The choice 
of a particular URI scheme provides no guarantee that those URIs will be 
persistent or that they will not be persistent.
]]

The terminology "authorities servicing URIs" seems inconsistent with that 
used elsewhere; e.g. "authority responsible for a resource" at the start 
of section 3.6.1, and "URI producers" in section 2.1.

As I draft this, I think there's maybe a deeper omission here:  a lack of 
separation between the owner or authority responsible for a resource, and 
the authority for a particular part of URI space that may be used to 
identify a resource.  (cf. also my previous comment above.)  If not 
clarified, I think this could be a source of continuing miscommunication.

[significant/editorial]

...

Section 3.6.2:
[[
Inconsistent representations served. Note the difference between a resource 
owner changing representations predictably in light of the nature of the 
resource (the changing weather of Oaxaca) and the owner changing 
representations arbitrarily.
]]
The term "predictably" here seems an odd choice given the nature of the 
illustrative example (thinks... butterflies flapping in Beijing, 
etc.).  Suggest:  rationally.

[minor editorial]

...

Section 3.6.2:
[[
Improper use of content negotiation, such as serving two images as 
equivalent through HTTP content negotiation, where one image represents a 
square and the other a circle.
]]

This doesn't seem like a particularly helpful example, because in some 
contexts a circle and square may be genuinely different representations of 
a common underlying concept (e.g. alternative GraphViz presentations of an 
RDF graph).  Suggest: "... such as serving two images as equivalent through 
HTTP content negotiation, where one image represents a weather map of 
Oaxaca and the other a street map of Chihuahua"

[minor editorial]

...

Section 3.6.2:

I made a note to myself at the end of this section:
"Maybe add a comment about metadata consistency and problems that may occur 
if a resource is not persistent"
but now I am not sure what it is I meant by this.

I think I may have been thinking about a case where RDF is used to describe 
some resource, but the resource whose representation is served at a given 
URI is allowed to change over time.  Then, any RDF that uses said URI to 
describe the resource at some point in time becomes completely incorrect if 
the URI is assigned to a different resource.  Is it worth trying to make a 
point that the value of RDF descriptions depends to a considerable extent 
on the stability/persistence of the URIs used?

[Significant?]

...

Section 4.*, esp. 4.2.*:

I notice that in this section, the terminology used slips from "data 
format" or just "format" to "language", without any explanation that they 
mean pretty much the same thing in this context (or, if they don't, without 
any explanation of the difference).

...

Section 4.2.4:
[[
RDF allows well-defined mixing of vocabularies, and allows text and XML to 
be used as a data type values within a statement having clearly defined 
semantics.
]]

I couldn't figure out precisely what this was trying to say.

[editorial]

...

Section 4.2.4:
[[
Note however, that for general XML there is no semantic model that defines 
the interactions within XML documents with elements and/or attributes from 
a variety of namespaces. Each application must define how namespaces 
interact and what effect the namespace of an element has on the element's 
ancestors, siblings, and descendants.
]]

I think that there may be an important point to be made here about the 
relationship of the "Semantic Web" with what I might call the "Hypertext 
Web" upon which it is built, that the "Semantic Web" provides a 
well-defined way to combine statements that draw upon an arbitrary number 
of different namespaces.  (I regard this as one of the more important 
contributions of the Semantic Web.)

Maybe this is what the text quoted in my previous comment was trying to say?
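
As a rough illustration of what I mean by combining statements, here is a 
simplified sketch in plain Python (no RDF library).  It assumes statements 
are modelled as (subject, property, value) tuples and ignores blank nodes 
and datatypes;  the WX vocabulary URI and the property name forecastFor 
are hypothetical.

```python
# Statements modelled as (subject, property, value) tuples.
DC = "http://purl.org/dc/elements/1.1/"    # Dublin Core element set
WX = "http://weather.example.com/terms#"   # hypothetical vocabulary

PAGE = "http://weather.example.com/oaxaca"

# Two independently written descriptions of the same resource,
# each drawing on a different namespace.
graph_a = {(PAGE, DC + "title", "Oaxaca weather report")}
graph_b = {
    (PAGE, WX + "forecastFor", "Oaxaca"),
    (PAGE, DC + "title", "Oaxaca weather report"),
}

# Combining them is just set union: property names are URIs, so terms
# from different vocabularies cannot collide, and a repeated statement
# collapses to a single one.
merged = graph_a | graph_b


def properties_of(graph, subject):
    """Return a {property: value} dict for one subject (simplified)."""
    return {p: o for s, p, o in graph if s == subject}


description = properties_of(merged, PAGE)
```

The point of the sketch is only that no application-specific rule is 
needed to decide how the two vocabularies interact.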

[significant]

...

Section 4.3:
[[
Note that when content, presentation, and interaction are separated by 
design, agents need to recombine them. There is a recombination spectrum, 
with "client does all" at one end and "server does all" at the other. There 
are advantages to each: recombination on the server allows the server to 
send out generally smaller amounts of data that can be tailored to specific 
devices (such as mobile phones). However, such data will not be readily 
reusable by other clients and may not allow client-side agents to perform 
useful tasks unanticipated by the author. When a client does the work of 
recombination, content is likely to be more reusable by a broader audience 
and more robust. However, such data may be of greater size and may require 
more computation by the client.
]]

I think there are also some scalability concerns that might be mentioned 
here;  e.g. an application is, in general, more likely to operate at 
Internet scale if as much processing as possible is performed by user 
agents (often, clients) rather than centralized processing agents (often, 
servers).

...

Section 4.4:
[[
Language designers SHOULD incorporate hypertext links into a data format if 
hypertext is the expected user interface paradigm.
]]
I found this statement a bit puzzling:  many data formats have nothing to 
do with a user interface;  the preceding text says "What agents do with a 
hypertext link is not constrained by Web architecture and may depend on 
application context".  So what is this trying to say?

...

Section 4.1.1:

I found the text of this section less clear than was offered in an email 
from TimBL:
[[
It is important to distinguish between the string which identifies
something and the BNF for a string in a document which
is used to specify the first string.  The first is an identifier.
The second has been called a "reference".   A reference
can use a relative form.
]]
-- http://lists.w3.org/Archives/Public/www-tag/2002Sep/0043.html
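
The distinction can be sketched as follows, assuming Python's standard 
urllib.parse.urljoin for resolution.  The base URI and the relative 
references are made-up examples, not taken from the document under review.

```python
from urllib.parse import urljoin

# URI of the document in which the references appear (hypothetical).
base = "http://weather.example.com/mexico/oaxaca"

# References: strings as written in the document, in relative form.
sibling_ref = "chiapas"
other_ref = "../usa/boston#today"

# Identifiers: the absolute URIs obtained by resolving each reference
# against the base.
sibling_uri = urljoin(base, sibling_ref)
other_uri = urljoin(base, other_ref)

# The same reference resolved against a different base yields a
# different identifier.
moved_uri = urljoin("http://archive.example.org/2003/oaxaca", sibling_ref)
```

Here sibling_ref and other_ref are references;  sibling_uri, other_uri and 
moved_uri are the identifiers they stand for in a given context.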

[editorial]

...

Section 4.5:
[[
... While it is directed at Internet applications with specific reference 
to protocols, the discussion is generally applicable to Web scenarios as well.
]]

I am uneasy with this phrasing, as it seems to suggest the Web is somehow 
apart from the Internet.  Suggest:
[[
... While it is directed at Internet applications with specific reference 
to protocols, the discussion is also applicable to Web application formats.
]]

[minor editorial]

...

Section 4.5.1:

Another reference with discussion relating to this topic of choosing to use 
XML can be found here:
   http://www.ietf.org/rfc/rfc3117.txt , section 5.1

[for information]

...

Section 4.5.7:
[[
These Internet Media Types create two problems: First, for data identified 
as "text/*", Web intermediaries are allowed to "transcode", i.e., convert 
one character encoding to another. Transcoding may make the 
self-description false or may cause the document to be not well-formed.
]]

The statement "Web intermediaries are allowed to "transcode" ..." seemed to 
me to be rather broadly applied here.  Is there a specification that 
asserts this in general?  If not, I think the comment should be constrained 
to something like "in some Web applications, intermediaries are allowed to 
transcode ..."
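
For reference, the failure modes described in the quoted text can be 
reproduced with a small sketch, assuming Python's standard 
xml.etree.ElementTree parser.  The sample document and the "intermediary" 
(a plain decode/encode step) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: declares ISO-8859-1 and is encoded that way.
text = '<?xml version="1.0" encoding="iso-8859-1"?><city>M\u00e9rida</city>'
original = text.encode("iso-8859-1")
intact = ET.fromstring(original).text  # declaration matches the bytes

# A stand-in intermediary transcodes the bytes to UTF-8 but leaves the
# XML encoding declaration untouched, so the self-description is false.
transcoded = original.decode("iso-8859-1").encode("utf-8")
garbled = ET.fromstring(transcoded).text  # bytes misread as ISO-8859-1

# The reverse case: declared UTF-8, transcoded to ISO-8859-1.  The
# resulting byte sequence is not valid UTF-8, so the document is no
# longer well-formed.
declared_utf8 = text.replace("iso-8859-1", "utf-8").encode("utf-8")
broken = declared_utf8.decode("utf-8").encode("iso-8859-1")
try:
    ET.fromstring(broken)
    well_formed = True
except ET.ParseError:
    well_formed = False
```

This shows only what happens if transcoding occurs;  it says nothing about 
whether intermediaries are permitted to do it, which is my question.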

[editorial]

...

Section 4.5.7:
[[
Second, representations whose Internet Media Types begin with "text/" are 
required, unless the charset parameter is specified, to be considered to be 
encoded in US-ASCII. Since the syntax of XML is designed to make documents 
self-describing, it is good practice to omit the charset parameter, and 
since XML is very often not encoded in US-ASCII, the use of "text/" 
Internet Media Types effectively precludes this good practice.
]]

I found this confusing:  I wasn't clear what was being said, and I 
couldn't see how it relates to the good practice point that immediately 
follows it.
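
My best reading of the defaulting rule in the quoted text, as a sketch 
using Python's standard email.message module to parse the media type.  The 
function name effective_charset is hypothetical, and the rule encoded is 
only the one the quoted paragraph states.

```python
from email.message import Message


def effective_charset(content_type: str):
    """Charset a recipient would assume from the media type alone
    (hypothetical helper, following the rule in the quoted text)."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased, or None if absent
    if charset is None and msg.get_content_maintype() == "text":
        # "text/*" with no charset parameter: treated as US-ASCII,
        # whatever the XML encoding declaration inside says.
        return "us-ascii"
    return charset


no_param = effective_charset("text/xml")
with_param = effective_charset("text/xml; charset=UTF-8")
app_xml = effective_charset("application/xml")
```

With "text/xml" and no parameter the result is "us-ascii";  with 
"application/xml" it is None, leaving the XML encoding declaration to 
decide.  If that is the intended point, the text could say so more directly.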

[editorial]

...

That's all, folks!

#g


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Friday, 5 March 2004 09:38:10 UTC