Random thoughts on arch doc read-through. from Tim Berners-Lee on 2004-08-05 (www-tag@w3.org from August 2004)

From: Tim Berners-Lee <timbl@w3.org>
Date: Thu, 5 Aug 2004 14:42:06 -0400
To: 'www-tag@w3.org' <www-tag@w3.org>
Cc: Tim Berners-Lee <timbl@w3.org>
Message-Id: <268341BE-E70F-11D8-B1F7-000A9580D8C0@w3.org>
I did a read-through in preparation for the face-to-face meeting next
week.
Not all of these have impact on the document.
In some places I have suggested changes, but we may want to suppress them
for the sake of stability.

(On error handling - this is tricky. I don't think we have a
consistent principle here, but some good stories. Sometimes I think a
set of anecdotes would be a great accompaniment to the book. I am happy
with the document here; maybe a finding or a wiki to collect stories
would be good background.
I think that one cannot generalize about error handling at
all, any more than one can generalize about what an application should
do in general. There are just some classic mistakes to avoid.
Antipatterns...)

2 .. principle: Global Identifiers
"Global naming leads to global network effects."
Note that this principle is, in a way, counter to "GOTO considered 
harmful".  It is "GOTO considered essential".  The first is "A good 
system is nicely tree structured", and the second is "A good system is 
nicely freely graph structured." The first is top-down design.  The 
second is grass-roots movement. It might be worth noting the contrast --
and the fact that each pattern is good in the right place. Top down is 
good when you want and have and can manage total control.
Suggest add something in a separate paragraph to the effect:-

"The provision of globally scoped identifiers for everything of
importance allowed free web-like linking. This is counter to the 
software engineering principle of top-down design, and tree structure, 
in which indiscriminate "Goto" is considered harmful [ref GCH].  Tree 
structured designs provide central control, whereas graph structured 
designs allow unfettered distributed growth. The WWW is an example of 
the latter, but many parts of its design, such as XML documents and the
DNS system, are tree structured."

Before 2.3:
"	• 	Are there resources that are not identified by any URI? In a 
system where the only resource identification mechanism is the URI, the 
question is only of philosophical interest (similarly, if a tree falls 
in the forest and nobody is around to hear it, does it make a sound?). 
The advent of other resource identification mechanisms may change the 
nature of this question and answer."

Actually, this is not a "tree falls in the forest" question.  That is
different.

It is a question of nomenclature: what are we defining "resource" to
mean?
We could try to define it as "everything which has a URI" but that is 
immeasurable and useless, and that is where the useless philosophical 
discussion comes in.  The only useful definition is the set of things
which *could* have URIs, which is the set of all things.  As this includes
the set of real numbers, which is not countable, then there clearly are 
resources which do not have URIs.
Currently, the document reads """Are there resources that are not 
identified by any URI? In a system where the only resource 
identification mechanism is the URI, the question is only of 
philosophical interest""".
This is wrong. Suggest replace this with,
"""Are there resources that are not identified by any URI? Yes. 
(Strictly, resources include all things which could be given a URI,
including real numbers, which are not countable, whereas the number of 
URIs is countable.)""".
Or remove the para.
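
The countability half of this argument can be made concrete. What follows is only a sketch: it assumes a toy three-character alphabet in place of the real URI character set, and `enumerate_strings` is a name invented for the illustration. Because every URI is a finite string over a finite alphabet, the strings can be listed shortest first, and each one turns up at some finite position. No such listing exists for the real numbers (Cantor's diagonal argument), so some real numbers are left without a URI.

```python
from itertools import count, islice, product


def enumerate_strings(alphabet):
    """Yield every finite string over `alphabet`, shortest first."""
    for length in count(0):
        # For a fixed length there are only finitely many strings,
        # so the enumeration always moves on to the next length.
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


# Toy alphabet standing in for the URI character set (an assumption).
first = list(islice(enumerate_strings("ab/"), 6))
print(first)  # ['', 'a', 'b', '/', 'aa', 'ab']
```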


2.3 para 2
"""To reduce the risk of a false negative (i.e., an incorrect 
conclusion that two URIs do not refer to the same resource) or a false 
positive (i.e., an incorrect conclusion that two URIs do refer to the 
same resource), certain specifications license applications to apply 
tests in addition to character-by-character comparison. """
This is confusing, introducing a distinction between false positives 
and negatives and not using it.

Suggest: Remove "or a false positive (i.e., an incorrect conclusion 
that two URIs do refer to the same resource)".

Add paragraph break before "Agents that reach conclusions" which is a 
different point, and is about false positives.
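
As an illustration of the false-negative case, here is a minimal sketch using Python's standard `urllib.parse`. The function names `comparison_key` and `equivalent_uris` and the example.org URIs are made up for the example, and only scheme and host case are folded; a real scheme specification may license more tests than this.

```python
from urllib.parse import urlsplit


def comparison_key(uri):
    """Split a URI, folding case only where it is not significant."""
    p = urlsplit(uri)
    # urlsplit lowercases the scheme, and .hostname is the lowercased
    # host. Path, query and fragment are kept exactly as written.
    return (p.scheme, p.username, p.password, p.hostname, p.port,
            p.path, p.query, p.fragment)


def equivalent_uris(a, b):
    """True when the two URIs differ only in scheme or host case."""
    return comparison_key(a) == comparison_key(b)


a = "HTTP://WWW.Example.ORG/Overview"
b = "http://www.example.org/Overview"
c = "http://www.example.org/overview"

print(a == b)                 # False: character comparison, a false negative
print(equivalent_uris(a, b))  # True
print(equivalent_uris(b, c))  # False: path case is significant
```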


2.3, after "Good practice: Consistent URI usage"
It might be as well to add a comment that when generating URIs, agents
SHOULD generate them in a canonical form where it exists. In other
words, where schemes and/or applications give equivalence relations and
corresponding canonicalization functions (such as case-insensitivity
and lower-casing respectively for DNS names), then the canonical form
should be generated.

This is a conservative/liberal sort of case.  We haven't defined a
protocol that one should canonicalize on generation or on reception, so we
are specifying a sort of in-between overlap - canonicalize on
generation, but don't futz with non-canonicalized things afterward as
others might not know about canonicalization.  This is a hairy way to
work and it should be clear what the design is.

Suggest add:

Good Practice:  Generate URIs in canonical form
Where equivalence relations exist between different URIs (such as case
insensitivity) and canonicalization functions exist (such as lowercasing),
URIs should be generated in canonical form.
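
A sketch of what "generate in canonical form" might look like for the DNS-name case, again using Python's `urllib.parse`. The function name `canonical_form` is invented here, and the code assumes a URI with no userinfo part, since lowercasing the whole authority would also alter a user name. It is meant to be applied by the agent minting a URI, not to URIs received from others.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_form(uri):
    """Lowercase the scheme and host of a URI about to be published.

    Assumes no userinfo in the authority (an illustrative simplification).
    Path, query and fragment are left untouched.
    """
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        parts.query,
        parts.fragment,
    ))


print(canonical_form("HTTP://WWW.Example.ORG/Docs/Intro#Top"))
# http://www.example.org/Docs/Intro#Top
```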



Two paras before 2.7:
s/where the current document/where this document/


3.4 para after bulleted list:
"On the other hand, there is no inconsistency in serving HTML content 
with the media type "text/plain", for example, as this combination is 
licensed by specification. "

Ummm.  What does this mean?  What specification?  text/plain 
specification presumably.  I think that sentence could easily be 
misinterpreted as saying you can serve HTML as text/plain and expect it 
to be a hypertext page.  Suggest change to:
"On the other hand, there is no inconsistency in serving HTML content 
with the media type "text/plain", for example, when the resource is the 
plain text source corresponding to an HTML document, to be presented as
raw source."  OWTTE.
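
To make the distinction concrete, here is a small sketch in which the same bytes are labelled two ways depending on what the server means by them. The function `representation` and its `show_source` flag are hypothetical names for the illustration, not part of any server API.

```python
HTML_BYTES = b"<html><body><p>Hello</p></body></html>"


def representation(show_source):
    """Return (headers, body) for the same bytes under two intents.

    show_source=True: the resource is the raw source, to be shown as text.
    show_source=False: the resource is a hypertext page to be rendered.
    """
    media_type = "text/plain" if show_source else "text/html"
    headers = {"Content-Type": f"{media_type}; charset=utf-8"}
    return headers, HTML_BYTES


source_headers, source_body = representation(show_source=True)
page_headers, page_body = representation(show_source=False)

print(source_headers["Content-Type"])  # text/plain; charset=utf-8
print(page_headers["Content-Type"])    # text/html; charset=utf-8
```

The bytes are identical in both cases; only the media type tells the recipient whether to expect a hypertext page or its source.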

4. Data formats
"Thus, before inventing a new data format (or "meta" format such as 
XML), designers should carefully consider re-using one that is already 
available."
Have we anywhere compared this with the creation of a new scheme?  It 
is typically less costly to make a new data format than a new scheme, 
because of content negotiation and often more advanced software 
flexibility.


(@@ - W3C namespace policy -- follow link and check.  >1 policies?)


4.5.3.  " XML namespaces reduce the risk of name collisions"
s/reduce/remove/
(unless one is being *very* picky!)
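
A brief sketch of why the collision goes away, using Python's standard `xml.etree.ElementTree`. The two namespace URIs under example.org are made up for the example. Both elements have the local name "title", but a namespace-aware parser sees two different expanded names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces; only the local name "title" is shared.
DOC = """<catalog xmlns:book="http://example.org/ns/book"
         xmlns:person="http://example.org/ns/person">
  <book:title>Weaving the Web</book:title>
  <person:title>Sir</person:title>
</catalog>"""

root = ET.fromstring(DOC)
# ElementTree reports each tag as "{namespace-URI}local-name".
tags = [child.tag for child in root]

print(tags[0])  # {http://example.org/ns/book}title
print(tags[1])  # {http://example.org/ns/person}title
```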

4.5.8 "Many people assume that the fragment identifier #abc, when 
referring to XML data, identifies the element in the document with the 
ID "abc". However, there is no normative support for this assumption."
And we have a counterexample.
Add: "Furthermore, for RDF/XML -based languages this is not the case."
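
The RDF/XML counterexample can be sketched as follows. The document URI in `BASE`, the `ex:` vocabulary and the helper `typed_nodes` are all invented for the illustration, and the code handles only the simple case of a typed node element carrying `rdf:ID`. In that case the URI ending in `#abc` names the thing being described (here, something of type Person), not the XML element that carries the attribute.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://example.org/people"  # hypothetical document URI

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <ex:Person rdf:ID="abc">
    <ex:name>Alice</ex:name>
  </ex:Person>
</rdf:RDF>"""


def typed_nodes(xml_text, base):
    """Yield (subject URI, class URI) for elements that carry rdf:ID."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        local_id = el.get(f"{{{RDF_NS}}}ID")
        if local_id is not None:
            # el.tag looks like "{namespace}local"; join the two parts
            # to get the class URI the element name stands for.
            namespace, local = el.tag[1:].split("}")
            yield (f"{base}#{local_id}", namespace + local)


result = list(typed_nodes(DOC, BASE))
print(result)
# [('http://example.org/people#abc', 'http://example.org/terms#Person')]
```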
Received on Thursday, 5 August 2004 15:14:50 UTC