AWWW, 20040816 release, sections 1 and 2 from Graham Klyne on 2004-09-09 (public-webarch-comments@w3.org from September 2004)

From: Graham Klyne <gk@ninebynine.org>
Date: Thu, 09 Sep 2004 12:45:10 +0100
To: public-webarch-comments@w3.org
Message-Id: <5.1.0.14.2.20040909103113.02a31b98@127.0.0.1>

With reference to:
[1] http://www.w3.org/TR/2004/WD-webarch-20040816/

It is with some misgiving that I raise this, but based on my reading of 
sections 1 and 2 I can't avoid the feeling that section 2 of the document 
[1] is losing its focus.  Below, I'll highlight some specific areas that I 
feel are problematic, but more generally there seems to be a degree of 
attention to design details that is obscuring the underlying architectural 
ideas.

Early in my involvement with Web technology, I found selected notes in 
TimBL's DesignIssues [2] series to be extremely helpful in pinpointing key 
concepts around which the WWW was constructed;  it was, and remains, my 
expectation that the AWWW effort is intended to perform that function in a 
more formally authoritative fashion.  I feel that section 2 is less clear 
about the fundamentals than TimBL's original notes [3][4] etc.

[2] http://www.w3.org/DesignIssues/Overview.html
[3] http://www.w3.org/DesignIssues/Model.html
[4] http://www.w3.org/DesignIssues/Axioms.html

Actually, I think section 1 makes a pretty good approach to the presumed 
goal, but section 2 doesn't quite follow through.  Personally, I'd like to 
see section 2 radically pruned so that the important architectural issues 
come through more clearly.  I think a deal of the detailed expository 
material could usefully be relegated to the (referenced) TAG finding documents.

Taken individually, my comments here are all minor editorial issues, with 
barely any technical substance.  It is their cumulative effect that is the 
thrust of this message.

So much for the waffle...

...

General (nits)

I notice there's some inconsistency about the way that section 
cross-references are presented.  Sometimes they are hyperlinked section 
names, and sometimes they also include section numbers.  For people who are 
reading paper documents, I think that section numbers should be included 
(isn't there an accessibility principle here?).

I also notice some inconsistency regarding use of URI vs URIs for the 
plural of URI, even between adjacent paragraphs in a section (e.g. section 
2.3, paras 1 and 2).

...

Section 1.1.1:

[[
Readers will benefit from familiarity with the Requests for Comments (RFC) 
series from the IETF, some of which define pieces of the architecture 
discussed in this document.
]]

This isn't very helpful.  The RFC series consists of nearly 4000 documents, 
most of which have very little to do with the web.  Suggest:  drop this 
para or indicate some specific RFCs.

=====

SECTION 1

...

Section 1.1.3, introduction of "Principle"

I feel this term is being introduced to mean two related but quite 
different things, which I might describe as:

   principle-as-exhortation: something designers should strive to achieve
and
   principle-as-expectation: something that is observed or claimed to occur

e.g. "separation of concerns" is a case of the former, but "network effect" 
is the latter.  Given the goal of the AWWW document (which I take to be to 
set out fundamental design choices that make the web what it is or is 
desired to be), I think that it would be appropriate for discussion of 
"principles" to focus on the former.  (Principles as expectations are 
subsidiary here, in that they may be used to justify the principal design 
choices; e.g. "network effect" is part of the justification for "avoiding 
URI aliases".)

Maybe it is intended that "Constraints" and "Practices" will cover what I 
call "principle-as-exhortation", and that "Principle" is intended to be 
"principle-as-expectation".  In which case I think the description is unclear.

...

Section 1.1.3, introduction of "Constraint"

This comes over a bit unfocused;  I think the important statement is rather 
buried in the middle of the paragraph: "Other design choices are more 
fundamental; these are the focus of this document.".  In this context, I 
take this to mean that constraints introduced by this document are such 
fundamentals.

Suggest re-working (mainly re-ordering):
[[
Constraint

Constraints are fundamental design choices, imposed to achieve certain 
technical, policy, or other goals, such as accessibility and global scope, 
ease of evolution, re-usability of components, efficiency, and dynamic 
extensibility.  (In the design of the Web, there are also lesser design 
choices, like the names of the p and li elements in HTML, the choice of the 
colon (:) character in URIs, or grouping bits into eight-bit units 
(octets), which are somewhat arbitrary; if paragraph had been chosen 
instead of p or asterisk (*) instead of colon, the large-scale result 
would, most likely, have been the same.)
]]

======

SECTION 2

...

I think that something like this should be stated clearly toward the start 
of section 2 (either at the end of 2.0, or as the beginning of 2.1):
[[
URIs are a cornerstone of Web architecture, providing identification that 
is common across the Web.
]]

(This may be obvious to us here, but it's sufficiently fundamental that it 
bears stating (even restating) very clearly.  It seems to me this should 
be an unmistakable message of AWWW.)

...

Section 2.1, 1st para:

[[
... ([URI], currently being revised) ...
]]

I think the "currently being revised" is spurious, transitory and should be 
dropped.

...

Section 2.1, general:

I think the key message here doesn't really get stated until the final 
sentence: "... there are substantial costs to creating a new identification 
system that has the same properties as URIs".

The thrust of the Good Practice here, then, would seem to be that one 
should use URIs rather than other systems of identification:  this isn't 
immediately clear from the Good Practice statement.

I'd be inclined to *start* this section with the Good Practice statement.

...

Section 2.2, general:

As far as I can tell, the key message is stated early on in the Constraint 
"URIs identify a single resource" -- I think that is good.

The section then goes on to expand on this and a number of loosely related 
topics that don't clearly add to this.

Section 2.2.1 seems to be about the design of specific URI schemes, a topic 
which I think would be better dealt with elsewhere (e.g. in a TAG finding 
or URI-related specification).

Section 2.2.2 seems to be a further elaboration of the basic constraint, 
and I think should at least be nearby:   as presented, it seems to be a 
distinct matter.   It also seems to relate closely to the material in 
section 2.3.1.

I don't see the nub of what section 2.2.3 is trying to say, and feel it 
could be elided without any material loss;  if there's a fundamental issue 
here that needs to be stated in this architecture document, I think it 
needs to be stated more clearly.  (I think there are allusions here to 
techniques like "Reference by Description", but I think these are just 
that:  available techniques, not architectural constraints.)  If anything, 
this section seems to be blessing techniques like identifying people with 
their mailto: URIs, which I think is contradictory with the previous 
section and could be a source of confusion.

...

Section 2.2.2.1, para 3:

This paragraph gave me a really hard time.  I can only guess at what it's 
trying to say, so it's difficult to offer an alternative.  The most 
problematic clause for me was: "... over a set of URIs with a common prefix 
to one particular owner".

...

Section 2.2.2, final para:

"The section below ..." should be "The section above ...".

...

Section 2.3, general:

I feel there is a lot of duplication here with RFC2396bis [5], which has 
extensive discussion of URI comparison.  This seems like detail rather than 
architectural principle to me, and could reasonably be left for that 
specification to address.

[5] http://www.ietf.org/internet-drafts/draft-fielding-uri-rfc2396bis-06.txt
     http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html

...

Section 2.3, para 2:

[[
... For example, for "http" URIs, the authority component (the part after 
"//" and before the next "/") is defined to be case-insensitive.
...
]]

Is this really true or what is intended, for authority components that 
contain usernames and/or passwords?

Note that RFC2396bis refers to *host*, not authority for this:
[[
6.2.2.1 Case Normalization

When a URI scheme uses components of the generic syntax, it will also use 
the common syntax equivalence rules, namely that the scheme and ***host*** 
are case-insensitive and therefore should be normalized to
lowercase. For example, the URI <HTTP://www.EXAMPLE.com/> is equivalent to 
<http://www.example.com/>. Applications should not assume anything about 
the case sensitivity of other URI components, since that is dependent on 
the implementation used to handle a dereference.
]]
-- http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html#normalize-case
(My emphasis added as ***...***)
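
To illustrate why the host/authority distinction matters, here is a minimal Python sketch. It is not taken from the draft or from RFC2396bis; the function name normalize_case and the example URIs are hypothetical, and it assumes that case normalization should touch only the scheme and the host, as the text quoted above describes. Lowercasing the whole authority would also alter any userinfo, which the quoted rule does not sanction.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(uri: str) -> str:
    """Lowercase the scheme and host only (hypothetical helper).

    Userinfo, path, query and fragment are left untouched, since their
    case sensitivity is not defined by the generic syntax.
    """
    parts = urlsplit(uri)
    # Split "userinfo@host:port" at the last "@"; when there is no "@",
    # rpartition returns ("", "", netloc), so userinfo and at are empty.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


# Example from the quoted text: scheme and host differ only in case.
plain = normalize_case("HTTP://www.EXAMPLE.com/")

# Hypothetical URI with userinfo: the password keeps its case.
with_userinfo = normalize_case("http://User:PassWord@WWW.Example.COM/Path")
```

Under these assumptions the two forms of the example.com URI compare equal after normalization, while a URI carrying "User:PassWord" keeps that part exactly as written.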

...

Section 2.3.1, para 1:

I don't know what is meant by "(the network neighborhood of the measured 
resource)" in this context.

...

Section 2.3.1, general:

Reading the 1st paragraph of this section, I could get the impression that 
the main reason that URI aliases are not a good thing is because they act 
to depress a resource's Google page ranking.  I think this obscures the 
deeper reason, which I take to be that it damages reachability of referring 
resources (i.e. the 2nd order relationships mentioned in para 2).

Suggest:  remove final sentence of 1st paragraph.

...

Section 2.4.1, 3rd bullet:

[[
* One should not expect that general-purpose software will do anything 
useful with URIs of this scheme beyond URI comparison; the network effect 
is lost.
]]

The assertion "the network effect is lost" is debatable.  A minimal 
change would be "the network effect is diminished", though I'd be more 
inclined to drop that qualification altogether.

...

Section 2.5, Good Practice point:

I found the point, as stated, seemed rather vague (specifically: "except as 
specified by relevant specifications").

My suggestion would be to turn this around to state something like this:
[[
The form of a URI may indicate how to access a resource, but not the 
nature of the resource, except insofar as it is constrained by the access 
method.
]]

...

Section 2.6, Story:

I felt the example used in the story here was potentially confusing, as 
elsewhere this document suggests tomorrow's weather is a distinct resource 
and should be identified by a different URI (section 2.3.2). Also, the 
story states that the representation is XHTML, for which the secondary 
resource indicated by a fragment identifier specifically has a 
representation that is part of the primary resource's representation.

This is an area where I feel that TimBL's original notes [3] are much 
clearer, albeit somewhat limited to the resource retrieval case.

[3] http://www.w3.org/DesignIssues/Model.html

Maybe a better example might be to talk about weather maps (without 
reference to a specific format), for which secondary resources might be 
isobaric vs isothermic vs comic-book sun-and-clouds representations (e.g. 
bit like the map tabs at http://www.bbc.co.uk/weather/ukweather/, but using 
fragment identifiers)?
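
As a rough sketch of the primary/secondary split being discussed, the Python fragment below separates a URI into the part that identifies the primary resource and the fragment identifier that selects a secondary resource. The URI and the helper name split_secondary are hypothetical, chosen only to match the weather-map suggestion above. The fragment is interpreted against whatever representation is retrieved, so what "isobaric" denotes depends on that representation's format.

```python
from urllib.parse import urldefrag


def split_secondary(uri: str) -> tuple[str, str]:
    """Return (primary resource URI, fragment identifier).

    Hypothetical helper: the fragment is empty when the URI has none.
    """
    primary, fragment = urldefrag(uri)
    return primary, fragment


# Hypothetical weather-map URI, for illustration only.
primary, fragment = split_secondary("http://weather.example.com/map#isobaric")
```

Only the primary part is used to retrieve a representation; the fragment is then resolved by the client within that representation.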

...

Section 2.7

I felt that sub-section 2.7.1 didn't really say anything architectural, and 
suggest dropping this, promoting 2.7.1 to be the entire content of 2.7.

...

That's all I've time for right now.  I'll try and do some more later.

#g


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Thursday, 9 September 2004 11:47:26 UTC