11/11 Arch doc review - miscellaneous comments from Tim Berners-Lee on 2003-11-14 (www-tag@w3.org from November 2003)

From: Tim Berners-Lee <timbl@w3.org>
Date: Fri, 14 Nov 2003 09:04:51 +0900
To: www-tag@w3.org
Message-Id: <2B14BA5A-1636-11D8-8D4F-000A9580D8C0@w3.org>
Here are my comments on a complete read-through of the architecture 
document as of 2003-11-11.

First of all, I must confess to a warm glow of satisfaction and pride 
in the group as I read through the stuff we have got in there which is 
really well hashed out and I think will be a great benefit.

So now to the comments.  Most of these are editorial.  In some cases I 
have written some text, a couple being new stuff we should maybe resist 
putting in, though they were in response to "the tag should" marks in 
the text.

Text to be inserted I have quoted in mail form, as I am using a mail 
writer, so that's what I got.  It does not indicate text from another 
message, just quoting my suggested text. Makes it blue for me.
____________________________

The very first paragraph of the Abstract encapsulates the fact that we 
haven't solved httpRange-14 yet.  It uses the word "resource" in two 
distinct ways.

1. Introduction, after the story, List element 1

- s/involves/is about/ (too vague)

- Diagram misrepresents "representation" as being the set of octets 
"<html>...</html>".  These are only the bits, there is metadata 
"content-type: text/xhtml+xml" or whatever, which should be I suggest 
in a box on top of the existing box.

- Editor's note "we may add other diagrams" seems silly.
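
To make the point about the diagram concrete, here is a minimal sketch 
in Python of a representation as octets plus metadata. The class name 
`Representation` and its field names are hypothetical, chosen only for 
illustration, and the media type shown is an assumed example value.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Representation:
    """Hypothetical model: a representation is the octets plus metadata."""
    body: bytes                                    # the bits only
    metadata: dict = field(default_factory=dict)   # e.g. the content type


# The octets alone are not the representation; the metadata box sits on top.
rep = Representation(
    body=b"<html>...</html>",
    metadata={"content-type": "application/xhtml+xml"},  # assumed example value
)
```

The same octets with different metadata would be a different 
representation in this model, which is what the diagram ought to show.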

1.2.3 Syntax and Interop'y

s/syntax, by specifying the content and sequence/the syntax, meaning 
and sequence/

(the content and sequence are not specified in the syntax)

2. Identification

s/linked-to within the information space/linked-to/

We are on tricky ground here without httpRange-14.  Don't imply that the 
destination of a link should be an information resource if you want it 
to be able to be a car.  Actually I don't like the use of "link" for all 
uses of URIs.  Just use as a reference is a good thing to talk about 
unless specifically talking about hypertext.

Later, in box "Principle: Assign URIs", remove the word "identified", 
which is used in a strange sense.  Suggest:  "assign a URI to each 
resource to which it is intended that others may refer" or "assign a 
URI to each resource which others will refer to."

2.2 just before 2.3:

rewrite as

"URI ambiguity only arises if different parties use 
"http://www.example.com/moby" to identify different things"

I don't think we need to get into belief here.

2.3 URI schemes

There is a Note that the TAG should provide more justification for 
expanding by media type instead of new URIs.  I agree.  I have had a 
go at it here:

> HTTP is a powerful technology benefiting from many features and much 
> support, technical and social.  Technical support includes not only 
> client and server code, but also proxy and firewall systems, offline 
> caching, robots, search engines, and so on.  Features include 
> confidentiality (with SSL), authentication, etc., and hierarchical 
> delegation of authority. Socially, support is from the DNS system 
> management, and internal resource and access management within web 
> sites.
>
> The HTTP space is more than just a protocol.  It is an information 
> space supported by an evolving set of protocols.  HTTP has evolved and 
> will evolve again.
>
> Good Practice
>     Use HTTP where possible for new designs instead of 
> designing a new space.
>
> To make a new scheme when HTTP could have been used
> - deprives the new system of the support mentioned above;
> - increases the code burden for systems (for example small portable 
> devices) which will end up having to implement both stacks.
> - could involve the community in an expensive rework of all the HTTP 
> system to date.
>
> If there is a perceived inadequacy in the HTTP features (such as, say, 
>  or security, or domain name governance), then it is generally better 
> to fix the feature in HTTP than to design a new stack.
>
> To make a new scheme name (such as webcal:) when the protocol in use 
> is in fact HTTP,
> - prevents existing software from being able to use the URI when 
> otherwise it could;
> - prevents the new system from taking advantage of certain of the 
> features of http, such as local caching.
>
> Good Practice:
>
>     Where a system uses HTTP, it should also use the "http:"


I think this should go in, but probably not into a last call draft.

2.5 Fragment Identifiers

Remove "indirect" from the first non-story para of 2.5

"allows [indirect] identification of a secondary resource".

The word "indirect" has specifically been used for another case, where 
"indirect identification" is by, for example, giving an unambiguous 
property of a thing, such as a person's SSN or email address.

2.6.2 Determination that two URIs identify

Could we change "determination" to "expression", please?  We are 
talking here not so much about the ability to determine but the ability 
to express that two URIs are the same.

In the same section, change "equivalentTo" to "sameAs".  The OWL vocab 
is now current, and this changed from DAML.

In the same para, change "state assert" to "directly state or 
indirectly imply"
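
As an illustration of "directly state or indirectly imply", here is a 
minimal sketch in Python, assuming statements are held as plain 
(subject, predicate, object) tuples. The function name 
`same_as_classes` and the example.org URIs are hypothetical; only the 
owl:sameAs property URI is taken from the OWL vocabulary.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_classes(triples):
    """Group URIs into the sets that owl:sameAs statements state or imply."""
    parent = {}

    def find(uri):
        # Union-find lookup with path halving.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            parent[find(subject)] = find(obj)

    classes = {}
    for uri in list(parent):
        classes.setdefault(find(uri), set()).add(uri)
    return list(classes.values())


# Two direct statements; that a and c are the same is only implied.
a, b, c = ("http://example.org/a", "http://example.org/b", "http://example.org/c")
classes = same_as_classes([(a, OWL_SAME_AS, b), (b, OWL_SAME_AS, c)])
```

Nothing here determines that the URIs identify the same thing; the 
sketch only collects what has been expressed.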


2.6.3

I didn't notice this section go in.  Can we remove it?
I think DDDS is often harmful, as it is used as a [much more complicated] 
way of reinventing HTTP and justifying the use of new URN schemes.
If we mention it, we should put a warning there.

3.2 Messages and Representations

rewrite first para as

A message is a communication event that is part of one of a 
non-exclusive set of messaging protocols (eg HTTP, FTP, NNTP, SMTP and 
SOAP).

(To say it is an event is true and important but misses the point that 
it is a communication.
To use "represented by" here was wrong - we use that word with a very 
specific meaning in this doc.
To omit SOAP suggests that SOAP messages are not messages, when they 
are. Important to show that messages at this level are messages too, 
even when conveyed on the back of lower level messages.)


In point 1 just below that:  s/Electronic data about resource 
state/Electronic data expressing resource state/


[end of my p17]

Section 3.4

There has been a nervousness around talking about ownership of URIs.  
We currently use the term "authority responsible for" and we are 
embarrassed about it.  This partly comes from not writing it down - and 
partly from our (unfortunate) reluctance to discuss HTTP specifically. 
Here goes some text.



> Ownership of URIs.
>
> This term is used in the following sense.  URIs are minted as an 
> operation by an agent, when a given string is for the first time 
> associated with the given resource.  The requirement for URIs to be 
> unambiguous demands that two agents do not mint the same URI for 
> different resources. The URI schemes assure this using different 
> techniques.
>
> a) The hierarchical delegation of authority allows ever smaller nested 
> parts of URI space to be assigned to parties (example: http, mailto);
> b) The generation of a fairly large random number reduces ambiguity to 
> a calculated small risk (example: uuid);
> c) The generation of a URI as a checksum of a data object itself 
> (example: md5:)
>
> or a combination of more than one (eg mid:, cid:).  Whatever the 
> techniques used, except for the case (c), the agent has a unique 
> relationship with the URI, which we can term "ownership".  The social 
> implications of this are not discussed here.
>
> The HTTP protocol gives the owner the power to serve representations 
> of resources, and the HTTP origin server is the URI owner's agent. The 
> concept of URI ownership, or "responsible authority", is particularly 
> visible in this case. It does not apply at all in case (c).
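
The three minting techniques (a), (b) and (c) above can be sketched 
roughly as follows in Python. This is an illustration under stated 
assumptions, not a definition of any scheme: the function names are 
hypothetical, technique (b) uses the `urn:uuid:` form that the Python 
standard library produces rather than a bare "uuid" scheme, and the 
`md5:` prefix simply follows the example in the text.

```python
import hashlib
import uuid


def mint_hierarchical(authority: str, path: str) -> str:
    """(a) Delegated authority: whoever controls `authority` assigns paths under it."""
    return f"http://{authority}/{path.lstrip('/')}"


def mint_random() -> str:
    """(b) Large random number: the risk of two agents colliding is small, not zero."""
    return uuid.uuid4().urn  # of the form 'urn:uuid:...'


def mint_from_content(data: bytes) -> str:
    """(c) Checksum of the data object itself (hypothetical 'md5:' form)."""
    return "md5:" + hashlib.md5(data).hexdigest()
```

In (a) and (b) the minting agent has a unique relationship with the 
result. In (c) any agent holding the same data mints the same string, 
which is why ownership does not apply there.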

3.5 Safe interactions.

Story ends ... "Neither data transmitted .. nor .. response ... 
corresponds to a resource named with a URI"

This is a bug.  We should admit this.   I don't know whether it is a 
footnote or a bit of text or a new issue.   Here is the way I would 
write it:

> Neither the POST request, which expresses Nadine's commitment,  nor 
> the response, which expresses the web site's acknowledgment and its 
> own commitment, can be referenced by URIs. This is a problem. Even 
> though in this case only two parties currently know the content of 
> these messages, the messages are an important part of the relationship 
> between them. It is a breakdown of the web architecture that they are 
> not automatically given a URI by which the parties can refer to them.  
> (Compare with mail messages which are given a message Id URI when they 
> are committed to.)
>
> Hence, while electronic commerce is done using HTTP POSTs, 
> accountability and reference rely on a web site generating a web 
> page to represent the transaction after it has occurred, and 
> suggesting that the user "print this and keep it".  The browser does 
> not in general keep a record of the POSTs made, even though an email 
> client keeps outgoing mails.  There is nothing to which a user can 
> point the website operators when tracing a problem. The results of POSTs 
> are sometimes treated as web pages which cannot be revisited. Browsers 
> do not allow the user to manage the relationship between the form, the 
> posting (generally only meaningful in the context of the original 
> form) and the response (only meaningful in the context of the given 
> posting).
>
> Compare this to the superior accountability in email, when a request 
> can be copied to many people including public archives.


4 Data Formats

You note that "language", "data format" and "vocabulary" are used 
interchangeably.  I hope that "vocabulary" isn't.  I would say that 
some data formats are languages, but a vocabulary is different.
As far as I understand the way we tend to use these words, here is my 
bash at explaining it, in case it is useful maybe for a glossary some day.

Data format

Constrained syntax for a series of bits, and an accompanying 
specification of how such series should be interpreted. Examples: PNG, 
Plain text, OFX, HTML, RDF, HTTP request, HTTP response

Language

Constrained syntax for a series of (normally) characters (normally 
encoded as a series of bits), and a specification of what such series 
mean. Examples: OFX, RDF, HTTP request, HTTP response.

(I don't see any use in belaboring the difference, mind you, except for 
connecting onto other people's ideas. Note that things in languages 
have meaning, when data formats often are just presented to a user, who 
then determines any meaning in other ways. Also, languages are normally 
defined in terms of characters, so an encoding step exists between the 
data format and the language. XML is a data format as it specifies the 
bits as well as the characters.)

Vocabulary:

A set of terms which may be used for specific places in the grammar of 
a given language. Examples: FOAF RDF ontology; SOAP HTTP headers.
RDF and HTTP headers define places where the grammar has an open set of 
terms which can be added to. These sets are vocabularies.


4.1 Just before 4.2

remove "very" ... we are making the point (elsewhere) that URI schemes 
are more expensive.

4.3
we say, "Two strategies are particularly useful".

Add after the 1 and 2 points,

"A powerful technique is for the language to allow either form of 
extension but distinguish explicitly between them in the syntax."

This idea seemed to have got lost.

Later on in that section, in the Good Practice box,

s/logic/semantics/


4.5 Links

Please retitle "Hypertext links".

This section deals with global hypertext as a web application.

Further down, "A link is built from two pieces" is wrong - say "The URI 
referred to is built from two pieces".

In point 1 just there,  add text:

"..in which the link appears, and defaults to the URI of the referring 
resource, and ..."

In most cases the base is just the document URI, so not mentioning it 
seems silly.
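
A minimal sketch of that default in Python, using the standard 
library's `urljoin`. The function name `absolute_uri`, its parameters, 
and the example URIs are hypothetical, for illustration only.

```python
from urllib.parse import urljoin


def absolute_uri(reference, document_uri, declared_base=None):
    """Combine a URI reference with its base.

    The base is the one declared in the document if there is one;
    otherwise it defaults to the URI of the referring document.
    """
    base = declared_base if declared_base is not None else document_uri
    return urljoin(base, reference)


doc = "http://www.example.com/docs/intro.html"  # hypothetical referring document

sibling = absolute_uri("chapter2.html", doc)
fragment = absolute_uri("#sec1", doc)
rebased = absolute_uri("chapter2.html", doc, "http://mirror.example.org/base/")
```

Here `sibling` and `fragment` fall back to the document URI, while 
`rebased` uses the explicitly declared base instead.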

Two paras down, para starts "Section 5".
"; this is called resolving a URI reference".
Is it really?
In many cases, resolving means looking up.
I suggest we strike this language, as I don't think (check)
that we use "resolve" in this way anywhere else in the paper.

Next non-box para, starting "What agents".
This talks about active or passive and never defines either or uses the 
terms again.  Suggest delete the sentence "For instance ..passive". No 
loss to the document.

Also, delete the sentence "On the other hand ... control points" as it 
seems waffly, gets off the topic of hypertext, and doesn't seem to 
make sense to me.


Moving on to section 4.7.1
This section needs some work. I think it makes a vague distinction 
between "shallow" and "deep, sophisticated" mechanisms, which might 
work for people at parties (avoid both ;-) but doesn't work here.

Rewrite section as follows:

> """Many modern data format specifications include mechanisms for 
> composition. For example,
>
> - It is possible to embed text comments in some image formats, such as 
> JPEG/JFIF. Although these comments are embedded in the containing 
> data, they have little or no effect on the content of the image.
>
> - There are container formats such as SOAP which fully expect to be 
> composed from multiple namespaces but which provide an overall 
> semantic relationship of message envelope and payload.
>
> - RDF allows well-defined mixing of vocabularies, and allows text 
> and XML to be used as data type values within a statement with 
> clearly defined semantics.
>
> These relationships can be mixed and nested arbitrarily. In principle, 
> a SOAP message can contain a JPEG image that contains an RDF comment 
> that references a vocabulary of terms for describing the image.
>
> Note, however, that for general XML there is no semantic model 
> which defines the interactions within XML documents with elements 
> and/or attributes from a variety of namespaces. How these namespaces 
> interact and what effect an element's namespace has on its ancestors, 
> siblings, and descendants must be defined application by 
> application."""

5. Conformance

Oooo.... we are defining conformance for people!  Resource owners, 
server managers and authors of specifications.  Hmmm. I am an author of 
a spec... am I conformant?

If we could define the conformance of a spec, then this might be 
useful. But even then it is trying to be rigid about a very subjective 
thing.  I don't see us giving AAA star ratings for specs.  But maybe we 
should look at it.  So much of the material in our document is not in a 
MUST box that really all kinds of specs could be produced and claim 
conformance. Discuss. Over dinner.

Tim
Received on Friday, 14 November 2003 18:58:07 UTC