RE: New Editors Draft of the httpRange-14 Finding from Booth, David (HP Software - Boston) on 2007-08-23 (www-tag@w3.org from August 2007)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Thu, 23 Aug 2007 00:04:16 -0400
To: "Rhys Lewis" <rhys@volantis.com>, "www-tag" <www-tag@w3.org>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C20314F6BA@tayexc19.americas.cpqcorp.net>
Here are some comments on the draft finding of httpRange-14 at
http://www.w3.org/2001/tag/doc/httpRange-14/2007-08-31/HttpRange-14.html

First, I am very glad that the TAG is working to clarify this finding.
It is much needed.  The initial finding left many questions unanswered.

HOWEVER, there is a major problem with this finding as it currently
stands: the finding critically depends on the definition of "information
resource", but the definition that currently stands in WebArch[1] is
fatally flawed.  This finding tries hard to dance around this flawed
definition without outright contradicting it, but it really is not
possible.  The original definition is just plain WRONG and needs to be
corrected.

Therefore, before trying to complete the httpRange-14 finding document,
I suggest that the TAG issue an erratum to WebArch[1] saying something
like:
[[
The definition of "information resource" erroneously covered only the
case in which the resource involves static information.  It should have
also covered cases in which information is dynamically generated or
consumed.  A corrected definition of "information resource" would be . .
. etc.
]]

BTW, here are two more examples of information resources whose essence
is NOT information --  i.e., their essence cannot be transmitted in a
message -- nor are they really about dynamically generating
representations:

	- A site for anonymous crime tips.  The site *consumes*
information, but all it ever sends back is a 200 response saying "Thank
you".

	- A confessional site.  Again, the site *consumes* information,
but all it ever sends back is a 200 response saying "You are forgiven".

In these examples, the key point is that they are network information
*sinks*.

                                  ----------

Aside from that major comment, I started to write up some minor
comments:

1. Sec. 1.1:
[[
A URI uniquely identifies a resource. However, on its own, a URI is not
enough to provide a resource with a presence on the Web. For a resource
to have a Web presence, there must be a means by which it can be
accessed.
]]
This seems to imply that you are only talking about information
resources, since AFAIK non-information resources cannot be accessed on
the Web.  I do not think it even makes sense to talk about a
non-information resource having "a Web presence".  The URI declaration
may indeed have a Web presence, but not the resource itself.

I suggest rewording this section to permit it to apply to both
information resources and non-information resources.

2. In the story about Peter's guitar site, a Good Practice note says:
[[
Authorities SHOULD NOT make misleading assertions about the Web presence
of any resource, whether or not they own its URI.
]]
I think the lesson here is a bit ambiguous.  If a URI is sometimes 404,
or if it is 404 currently but might not be 404 later, does that mean one
should not use that URI?  This does not seem reasonable.  I think some
more usual interpretations of a link would be:
 - the URI is *intended* to be dereferenceable;
 - the URI is *hopefully* dereferenceable;
 - the URI may become dereferenceable; or
 - the URI was dereferenceable when I tried it.  

In short, a page author who references a URI is not responsible for the
sins of the URI owner.  (In the example, Peter was both the owner and
the user, which is part of why this example is ambiguous.)

On the other hand, if a URI is never expected to be dereferenceable,
then certainly one should not suggest that it may be.

3. Section 2.  Big alarm bells started ringing in my head when I read
the title of section 2: "Resources Whose Essence is Information".  This
is badly perpetuating the fatally flawed definition of "information
resource" that is currently in the WebArch document.

I started skimming at that point.

3. In sec 4.2 a "Good Practice" says:
[[
A URI owner providing an information resource associated with a
non-information resource SHOULD avoid the need for additional
redirection operations after the original 'See Other' response. In
particular, the URI returned in the 'See Other' response SHOULD be able
to provide representations of the associated information resource.
]]
I do not see a justification for including this advice.  For one thing,
the WebArch already says that information resources should provide
representations, so this advice is redundant.  For another thing, it
seems to discourage the use of further indirection.  But people may have
good reasons for using indirection.  Indirection is not discouraged for
other information resources.  Why is it for this case?
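
As an illustration of what "further indirection" looks like to a
client, here is a small sketch that follows a chain of redirects over
canned responses.  The URIs and the RESPONSES table are made up for the
example, and no network access is involved.

```python
# Hypothetical URIs with canned (status, Location) responses.
RESPONSES = {
    "http://example.org/id/guitar": (303, "http://example.org/doc/guitar"),
    "http://example.org/doc/guitar": (302, "http://example.org/doc/guitar.html"),
    "http://example.org/doc/guitar.html": (200, None),
}

REDIRECTS = (301, 302, 303, 307)


def follow(uri, max_hops=5):
    """Return the (uri, status) hops taken until a non-redirect response."""
    hops = []
    for _ in range(max_hops):
        status, location = RESPONSES[uri]
        hops.append((uri, status))
        if status not in REDIRECTS or location is None:
            return hops
        uri = location
    raise RuntimeError("too many redirects")
```

Here the 303 is followed by one more redirect before a representation
is served.  The client still ends up at a 200, which is why I do not
see the extra hop as something that needs to be discouraged.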

4. Section 4 on HTTP Response codes.  I think it is essential that this
finding specify the inferences that can be made for every response code,
so I am glad that there is an attempt to do that.  However, I think
sections 4.3.1 - 4.3.3 are dangerous because they redefine the meaning
of the response codes.  It is much better to quote the HTTP spec
verbatim if necessary, but NOT attempt to further clarify it.  The table
in 4.3.4 is far better, because it merely states what inference can be
made from each return code.
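
To show the kind of "inference only" statement I mean, here is a sketch
that maps a GET response code to what may be inferred about the
resource.  It follows the wording of the TAG's original httpRange-14
resolution (2xx, 303, 4xx), not the table in 4.3.4; the function name
and the fallback string for other codes are my own placeholders.

```python
def inference(status):
    """What a GET response code lets a client infer about the resource."""
    if 200 <= status <= 299:
        return "the resource is an information resource"
    if status == 303:
        return "the resource could be any resource"
    if 400 <= status <= 499:
        return "the nature of the resource is unknown"
    # Assumption: codes outside the original resolution yield no inference.
    return "no inference stated"
```

Note that nothing here says what the response code *means*; that is
left entirely to the HTTP spec.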

5. Regarding good practice "Where a relatively small set of closely
associated non-information resources is involved, associations with
related information resources SHOULD be indicated using the secondary
resource approach.".  I totally disagree with this recommendation.  A
key shortcoming of the hash approach is that the meaning of the URI
depends on the media type returned.  Architecturally, it makes no sense
to tie the meaning of the URI to the returned media type!  They should
be completely independent.  By creating this dependency, you limit your
future options about the representations that you can serve.
Furthermore, since hash URIs 

6. "Where these associations involve the use of RDF or OWL, the natural
approach is usually to use secondary resources."  That's quite a biased
statement.  I think it's better to just state the trade-offs between the
hash versus slash approaches and let the reader make the choice.
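
One concrete difference between the two approaches can be sketched with
the Python standard library.  The fragment of a hash URI is not sent in
the HTTP request, so the server only ever sees the part before the "#",
and the fragment is interpreted by the client against whatever media
type comes back.  The example URIs and helper names are hypothetical.

```python
from urllib.parse import urldefrag


def uri_sent_to_server(uri):
    """The part of the URI that actually appears in the HTTP request."""
    return urldefrag(uri).url


def client_side_fragment(uri):
    """The part the client interprets against the returned media type."""
    return urldefrag(uri).fragment


HASH_URI = "http://example.org/guitars#strat"
SLASH_URI = "http://example.org/guitars/strat"
```

With HASH_URI the server is asked for http://example.org/guitars and
never learns that "strat" was wanted; with SLASH_URI the full URI
reaches the server, which can then answer with a 303 or anything else.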

7. Finally, this finding really should discuss the concept of a URI
declaration[2], because that concept is central to the idea of using
URIs for non-information resources.  In short, it helps clarify what
information should be served in conjunction with a non-information
resource.


References
1. WebArch: http://www.w3.org/TR/webarch/#def-information-resource
2. URI declarations: http://dbooth.org/2007/uri-decl/


David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Opinions expressed herein are those of the author and do not represent
the official views of HP unless explicitly stated otherwise.
Received on Thursday, 23 August 2007 05:22:58 UTC