Notes Bjoern Hoehrmann's comments on the draft finding for metadataInURI-31 from noah_mendelsohn@us.ibm.com on 2006-08-15 (www-tag@w3.org from August 2006)

From: <noah_mendelsohn@us.ibm.com>
Date: Tue, 15 Aug 2006 12:32:13 -0400
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: www-tag@w3.org
Message-ID: <OFDA3F2E5C.D2E19737-ON852571C7.0079756F-852571CB.005AD837@lotus.com>
On June 30 Bjoern Hoehrmann wrote a note [1] that was nominally about XBL 
namespaces, but that in fact conveyed a lot of concerns regarding the 
draft finding on Metadata in URIs [2].  So, I'm taking it as I would any 
other input on a draft finding, and am attempting to respond here in 
detail.  The comments in fact arrived just as the TAG was voting to move 
the draft to full finding status.  I sent a rather tentative and 
incomplete response around that time at [3], and Bjoern responded to that 
at [4].  Accordingly, I told the TAG that we should hold off on the final 
finding until I had a chance to work through Bjoern's response in more 
detail.  This is my attempt to do that. 

I'm trying to strike a balance here.  On the one hand, I want to be 
responsive where there are important concerns.  On the other, we always 
have to pick a point where the TAG will say "publish", and further 
comments can be considered as input to possible revisions.  So, I've tried 
to respond to Bjoern's comments with some detail and care, but given their 
late arrival I am setting the bar a little higher than I might normally in 
being open to significant redraft of the findings.  I hope the following 
strikes a reasonable balance.

The following quotes are from Bjoern's notes [1,4], each followed by my 
comments.

> * noah_mendelsohn@us.ibm.com wrote:
> >Which begs the questions:  what sorts of information should be in a URI,
> >and who should or shouldn't depend on it being there?  This might be the
> >time to remind everyone that the TAG has been working on just those
> >questions under the banner metadataInURI-31 [1].  We have a draft finding
> >[2], which as of the last TAG F2F is quite close to final.  If this
> >discussion is going to turn to what should or shouldn't be encoded in the
> >text of a URI, I suggest giving the draft a look first.  Thanks!
> 
> >[1] http://www.w3.org/2001/tag/issues.html#metadatainURI-31
> >[2] http://www.w3.org/2001/tag/doc/metaDataInURI-31
> 
> Let's see. The resource locator of the resource is poorly chosen,

This finding is following the same naming policy for drafts as the TAG has 
used for other findings. While I can see that sensitivities are raised 
particularly for a finding on metadata in URIs, I personally think the 
name is fine (if not ideal), and propose not to tackle changes to the W3C 
naming policy in conjunction with publication of this particular finding.

> the document then incorrectly claims to make correct use of RFC2119
> terms

From Bjoern's 2nd response at [4]:

> TAG findings are not normative documents and do not specify con-
> formance; the keywords are to specify requirement levels. The
> concept of having requirements in non-normative documents that
> cannot be complied with does not make sense to me. The specific
> use of "MUST NOT" is highly suspicious, it's entirely unclear
> from the text how interoperability is at stake, or which harm
> is done by ignoring it.

From RFC 2119:

"In many standards track documents several words are used to signify the 
requirements in the specification.  These words are often capitalized. 
This document defines these words as they should be interpreted in IETF 
documents.  Authors who follow these guidelines should incorporate this 
phrase near the beginning of their document:

"The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this 
document are to be interpreted as described in RFC 2119.

"Note that the force of these words is modified by the requirement level 
of the document in which they are used."

I don't see that as forbidding references from documents which are 
non-normative.  In fact, if the scope is limited at all, it's specifically 
to IETF documents, and I think the precedent is well established for 
referencing RFC 2119 from non-IETF specifications.  Furthermore, while TAG 
findings are not Recommendations, they are in some sense normative. 
Continuing from RFC 2119:

"1. MUST   This word, or the terms "REQUIRED" or "SHALL", mean that the 
definition is an absolute requirement of the specification."

I'd say that's true of the use in the draft finding.

"6. Guidance in the use of these Imperatives

"Imperatives of the type defined in this memo must be used with care and 
sparingly.  In particular, they MUST only be used where it is actually 
required for interoperation or to limit behavior which has potential for 
causing harm (e.g., limiting retransmisssions)  For example, they must not 
be used to try to impose a particular method on implementors where the 
method is not required for interoperability."

Well, I think that's true here.  The draft finding says:

"Constraint: Web software MUST NOT depend on the correctness of metadata 
inferred from a URI, except as licensed by applicable standards and 
specifications."

I think that's pretty essential to preserving interoperability of the Web 
and eliminating harm.
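
As an illustrative sketch of that constraint (the example is an assumption 
made here, not text from the draft finding): a client may guess a media 
type from a URI suffix, but where HTTP supplies a Content-Type header, that 
header is the information licensed by the applicable specification.  The 
function name and the example URI are hypothetical.

```python
import mimetypes


def effective_media_type(uri, content_type_header=None):
    """Return (media_type, source); a server-supplied header always wins."""
    if content_type_header:
        # Drop parameters such as "; charset=utf-8" and normalize case.
        media_type = content_type_header.split(";", 1)[0].strip().lower()
        return media_type, "authoritative"
    # No header available: the URI suffix yields only a guess, and the
    # caller bears the risk of acting on it.
    guess, _encoding = mimetypes.guess_type(uri)
    return guess, "guess"


# The suffix suggests HTML, but the header says otherwise and is trusted.
print(effective_media_type("http://example.org/report.html", "image/png"))
# Without a header, the result is explicitly labelled as a guess.
print(effective_media_type("http://example.org/report.html"))
```

Software that treats the second result as fact, rather than as a labelled 
guess, is depending on metadata inferred from the URI.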

> The first good practise is rather odd; it's introduced by claiming that
> the scheme component of a resource identifier is metadata "peeked" from
> the resource identifier, and attempts to suggest making, say, a browser
> that supports only HTTP and therefore only HTTP resource locators is
> somehow bad (which it is not). It then repeats justification for the 
> constraint discussed above. Frankly, I have no idea what's this trying
> to say.

I suspect this is referring to what was section "2.2 Avoid depending on 
metadata".  That entire section has been removed, so I'm guessing that 
concern is resolved.

> The next good practise is obvious again ("Don't do something unless
> willing to accept the consequences") though I like how it's introduced
> by "Web users act on guesses about URIs all the time"; this happens e.g.
> if I call you and ask you to go to "http://example.org/weather/Boston"
> and tell me what's on that page. If you then infer I might be wondering
> about the weather in Boston, you are acting contrary to "Agents making
> use of URIs SHOULD NOT attempt to infer properties of the referenced
> resource."

I agree that it borders on the stupidly obvious, but I didn't come up with 
a better way to make the point that you may guess, but there are 
downsides, and the risk is yours.  Since the TAG at its face to face 
approved this formulation, I'm inclined to leave it, but I take the point. 
We'll be reviewing this response with the rest of the TAG, and I'll 
highlight this as an area of concern (Stuart Williams raised it too.)  If 
we can come up with a better way to make the point, I'll do it.

> Section 2.4 is a bit funny. I am unaware of where the HTML spec makes
> any mention of what is inferred, 

It's only indirectly the HTML spec.  The sequence is:  the resource 
authority sends to a client computer (let's call it my computer) an HTML 
form.  On the form is code that renders a button that, when pressed, 
assembles a URI that is built in a structured way to include data entered 
from fields on the form.  In that sense, the resource authority has 
invited me to submit URIs of that form.  Insofar as the form itself 
includes documentation that warrants what those URIs are for, and the form 
comes from the resource authority, the authority has effectively documented 
the intended use of those URIs.  Crucially, the server at the authority 
can't tell whether I construct those URIs by actually doing the obvious 
thing and presenting the form in a browser, or by some other means.  I 
don't believe that HTML forms have a timeout mechanism that says "don't 
hold this form for a few weeks and then fill it in", so I can equally 
write some other software to send those same URIs a few weeks later (cool 
URIs don't change).  I agree the HTML spec doesn't tell the story in quite 
those terms, but the deployment of the form by the resource authority has 
exactly that implication I think.
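
The sequence described above can be sketched roughly as follows.  This is a 
minimal illustration, assuming a form submitted with the GET method; the 
form action URI and the field names "city" and "units" are hypothetical and 
do not come from the draft finding.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical action URI of a form published by the resource authority.
FORM_ACTION = "http://example.org/weather"


def form_get_uri(action, fields):
    """Assemble the URI a browser would request for a GET form submission."""
    return action + "?" + urlencode(fields)


# Any software can build the same URI without ever rendering the form.
uri = form_get_uri(FORM_ACTION, {"city": "Boston", "units": "metric"})
print(uri)  # http://example.org/weather?city=Boston&units=metric

# The server sees only the URI; it cannot tell how the URI was constructed.
print(parse_qs(urlsplit(uri).query))
```

The point of the sketch is only that the URI produced by a script is 
indistinguishable, at the server, from the one a browser assembles when the 
button is pressed.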

> and "The same HTML Form is also a
> computer program, executable by the browser, ..." well, that's sure a
> good one for the .signature file.

I'm afraid the above comment doesn't at all convince me that what I wrote 
is even slightly wrong.  Is it a computer program?  I'd say so.  It's not 
in a Turing-complete language, but it surely instructs a browser to render 
certain UI, prompt for certain things, and if a given button is pushed to 
submit another Web transaction.  Sounds like a program to me.  I admit 
it's not a program in a general purpose language. 

> Section 2.5 apparently contradicts "Avoid software dependencies on
> metadata in URIs." in that it suggests "Even if it does not document
> this policy publicly, example.org's own Web servers can safely depend
> on it" implying it's okay to do that.

First of all, that "avoid dependencies" section is indeed gone.  Secondly, 
I think we acknowledge at the top: 

"As these examples show, encoding or not encoding metadata in a URI or 
deciding whether to rely on such metadata is often a tradeoff, involving 
some benefits and some costs. In such cases, choices should be made that 
best meet the needs of particular resource providers and users. "

That's exactly the case here.  The authority and its server managers may 
benefit from a regular assignment of URIs and from writing software, for 
use within their authority, that benefits from their assignment rules. 
True.  That software will be less general than other software, and in that 
respect less valuable.  Also true.  A tradeoff.  The important distinction 
is that someone running similar software without reliable knowledge of the 
assignment rules is breaking the rules:  they are creating an expectation 
that the authority will use URIs in a certain way, when in fact it's the 
authority that gets to make that decision.

> I am unsure what good practise 
> is established here, the text explains an option, it does not make
> any suggestion.

It goes a little further:  it says "Good Practice:  URI assignment 
authorities and the Web servers deployed for them may benefit from an 
orderly mapping from resource metadata into URIs."   I think that's true 
and I think it's stated quite clearly.
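
A rough sketch of what such an orderly mapping might look like on the 
authority's side.  The policy "/weather/<city>/<year>", the base URI and 
both function names are assumptions made for illustration; nothing here is 
prescribed by the draft.

```python
from urllib.parse import quote, unquote, urlsplit

BASE = "http://example.org"  # hypothetical assignment authority


def mint_uri(city, year):
    """Authority-side policy (assumed): /weather/<city>/<year>."""
    return f"{BASE}/weather/{quote(city, safe='')}/{year}"


def route(uri):
    """Inverse of mint_uri; meaningful only to code that knows the policy."""
    parts = urlsplit(uri).path.split("/")
    # Expected shape: ['', 'weather', '<city>', '<year>']
    if len(parts) != 4 or parts[1] != "weather":
        raise ValueError("URI was not assigned under the weather policy")
    return {"city": unquote(parts[2]), "year": int(parts[3])}


uri = mint_uri("New York", 2006)
print(uri)         # http://example.org/weather/New%20York/2006
print(route(uri))  # {'city': 'New York', 'year': 2006}
```

The authority's own servers can safely run something like route() because 
they also control mint_uri().  A third party running the same parsing code 
without reliable knowledge of the policy is only guessing.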

> Section 2.6 is paradox, http://example.org/123Hx67v4gZ5234Bq5rZ is
> obviously not intended for direct use by people and therefore the
> good practise does not apply.

I understand your point, I just don't think the section as drafted suffers 
from the problem that you imply it does.  To make a point clearly, it 
effectively says:  if the authority were dumb enough to think that a user 
would easily remember and type "http://example.org/123Hx67v4gZ5234Bq5rZ", 
they would obviously be sadly mistaken.  Having established that fact...

> To make it meaningful you would have
> to root the good practise as usability concern and base it upon user
> goals; in other words, URIs that people want to make direct use of
> are to be made usable by people (which again is obvious).

Well, I agree with one important exception:  it's often the case that the 
authority decides which URIs it intends to be conveniently usable by 
users, and which it expects will be used internally to databases, web page 
links, etc.  The finding says:

"Although Web architecture does not require that URIs be easy to 
understand or suggestive of the resource named, it's handy if those 
intended for direct use by people are.

"Good Practice: URIs intended for direct use by people should be easy to 
understand, and should be suggestive of the resource actually named."

I think you're saying: the end users should decide which URIs they want 
convenient, and somehow that turns into the assignment authority having an 
obligation to them.  Of course, many organizations will have commercial or 
other reasons for wanting to make their users happy, but I think it's 
ultimately up to the authority to decide how to balance convenience for 
end users of its resources with other factors affecting URI allocation.  I 
think the finding is good as it stands.

> It seems http://www.w3.org/2001/tag/doc/metaDataInURI-31 is intended
> to be used directly by people, yet it is not easy to understand (why
> "2001"? Why "doc"? Why "31"?) and consequently not suggestive of the
> resource actually named (too much noise to determine the signal).

Well, the choice of this assignment pattern was made years before this 
finding was named, but I don't buy your premise:  I don't think this is to 
be regularly used by people.  Almost everyone I know who uses this, 
including me (and I use it a lot), either clicks on it, copies or pastes 
it from the clipboard, finds it on the TAG's list of findings, clicks on 
links to it in emails etc.  I think this is a simple disagreement between 
you and those who administer the W3C site as to just how easy they want 
these finding IDs to be to type, and perhaps whether you want what appears 
to be a date (2001) to be suggestive of some significant event in the 
authorship of this finding. 

> The good practise in 2.7 is implied by the one in 2.6, it is harmful
> if interpreted incorrectly, and poorly extroduced (resource locators
> locate resources, they do not convey metadata about a resource as 
> claimed in the extroduction).

I didn't say they convey it.  I said that at some authorities there is an 
internally established policy of >synchronizing< metadata with the 
assignment of the URI.  That's not necessarily about conveying it to 
anyone outside.  If you establish such a policy, and if that metadata 
changes, you're going to face a difficult choice:  change the URI for an 
existing resource, or not observe your own policy.  I think that's a fact. 
I think it's only indirectly related to "conveying" metadata in the URI.
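
To make the dilemma concrete, here is a small sketch under an assumed 
policy in which an article's author name is synchronized into its URI.  The 
policy, the function name and the sample values are all hypothetical.

```python
def policy_uri(slug, author):
    """Hypothetical internal policy: the author's name is part of the URI."""
    return f"http://example.org/articles/{author}/{slug}"


# URI assigned when the article is first published.
published = policy_uri("uri-metadata", "alice")

# Later the metadata changes: a different author takes over the article.
current = policy_uri("uri-metadata", "bob")

# The policy now yields a URI different from the one already in circulation.
print(published == current)  # False
```

At that point the authority must either change the URI of an existing 
resource or stop observing its own policy, whether or not anyone outside 
ever relied on the author name being there.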

> I also note that there is little if any
> consensus about this; the principle builds upon the assumption that
> it's a bad idea to change resource locators as the resource changes;

Yes.  Cool URIs don't change.

> you can equally well say that resources should never be changed,

Um, well, then it makes it really difficult to have links to anything like 
a clock, as the name of the clock would change every time it ticked. 
That's an extreme case, but the same is true for things like news 
articles.  Sometimes you want a link that refers to exactly the 4PM 
revision, and sometimes you want a link that gets you the live version. If 
you put the author's name in that link, and suddenly a new author adds a 
paragraph at 5PM, you've got a problem.  See also the evolving work on TAG 
issue Generic Resources [5] and the early drafts of a finding that Raman 
is working on [6].  This work is exploring the balance between having 
stable URIs and URIs that change when there are differing or time varying 
representations of what otherwise feels like the same resource.

> It seems the References section violates the "Consistent URI usage"
> good practise and the document its own "Resource metadata that will
> change SHOULD NOT be encoded in a URI."

Sorry, I'm not getting this comment.

> "URIs" have been obsoleted many, many years ago, only a few 
> confused people want them to stay.

Huh?  Sorry, I'm again not getting this.

> So in conclusion I am unsure why we should look at the draft?

Well, at the risk of stating the obvious, there's a lot of confusion about 
when to put metadata into URIs and when to depend on it.  While I'm sorry 
that you're obviously not enthusiastic about a lot that's in this, I would 
have thought that "why to look at it" wasn't in question.

> There is extraordinarily broad agreement that W3C's so-called 
> "namespace policy" makes no sense whatsoever, I don't think 
> there is anything to be discussed here really. Many acceptable schemes

This makes me nervous that you're mixing two things that are only 
tangentially related:  the merits of this finding, vs. the W3C's 
particular assignment policy for URIs such as 
http://www.w3.org/2001/tag/doc/amazingFinding.html.  While one would 
certainly hope that W3C's choices would be broadly consistent with the 
advice in the finding, I think this discussion needs to be about the 
finding itself.

In conclusion:  I hope the above represents at least a careful look at the 
issues raised.  There are several which I believe are already resolved, or 
which I propose (as noted above) to run by the rest of the TAG.  I will 
also be working through a related set of comments from Stuart Williams. 
While I certainly wouldn't look for final concurrence until a revised 
draft is complete, I hope this is an acceptable path forward.  Bjoern: 
thank you again for your care in commenting on the draft.

Noah

[1] http://lists.w3.org/Archives/Public/www-tag/2006Jun/0152.html
[2] http://www.w3.org/2001/tag/doc/metaDataInURI-31-20060609.html
[3] http://lists.w3.org/Archives/Public/www-tag/2006Jun/0156.html
[4] http://lists.w3.org/Archives/Public/www-tag/2006Jun/0157.html
[5] http://www.w3.org/2001/tag/issues.html?type=1#genericResources-53
[6] http://www.w3.org/2001/tag/doc/alternatives-discovery.html

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Tuesday, 15 August 2006 16:32:26 UTC