Proposed disposition of Stuart Williams' comments on Metadata in URI 31

At its June 2006 F2F meeting in Amherst, MA, the TAG voted "to accept 
http://www.w3.org/2001/tag/doc/metaDataInURI-31-20060609 contingent on 
Noah finishing his TODO list to the satisfaction of Ed." [1]  Before I 
could wrap up the finding, I received two sets of additional comments, and 
I informed the TAG that I would delay publication until I had reviewed and 
suggested dispositions for that new input.  In August, I summarized my 
proposed responses to the comments received from Bjoern Hoehrmann [2]. The 
purpose of this note is to describe the proposed dispositions for the 
comments received from Stuart Williams [3,4].

As in my responses to Bjoern, I'm trying to strike a balance.  On the one 
hand, I want to be responsive where there are important concerns.  On the 
other, we always have to pick a point where the TAG will say "publish", and 
further comments can be considered as input to possible revisions.  So, 
I've tried to respond to Stuart's comments with some detail and care, but 
given their late arrival I am setting the bar a little higher than I might 
normally in being open to a significant redraft of the finding.  I hope the 
following strikes a reasonable balance.

Each quote below is taken from Stuart's comments (the notes in [4]) and 
is followed by my response.  In a moment I will send another note 
announcing the posting of a new draft of the finding.  Any changes proposed 
below are in that draft (which is in fact already on the Web at the usual 
URIs).

Stuart's comment:

> The concept of authority wrt to URI is one 
> which some have pushed back against. They have argued that the 
> URI scheme itself is what states
> what a given URI identifies. Generally this is presented as an 
> operationalised notion of what it means to ‘identify’ a 
> resource. This view would likely also argue
> that RFC2616 ‘creates’ all possible HTTP URIs.

The term "authority" is used at least occasionally in the WWW Architecture document, 
and in some sense it's locally defined in the metaDataInURI-31 draft:

"The authority that creates a URI is responsible for assuring that it is 
associated with the intended resource, "

I don't recall any other TAG members raising this in earlier reviews.  If 
other members of the TAG think it's worth the effort to come up with a 
different approach, I'd be willing, but my vote is to leave this as is.  I 
do see your point, but I'm just not convinced there's a problem.

-------

Commenting on:

> "Many URI schemes offer a flexible structure that can also be used 
> to carry additional information, called metadata, about the resource."

Stuart's comment:

> Do you have an example of such a scheme.
> I can’t think of any!!!

Sure, the http scheme for example.  I can encode into URIs in that scheme 
creation dates, directory hierarchies, file types, and all sorts of 
things.  It doesn't provide a standard representation for any one of 
those, but that's not the point:  it's a scheme that "can be used" to 
carry such information.  Indeed, the subject of the finding is when it 
should be used in that way, and when consumers of URIs should depend on it 
having been used that way.
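
To make that concrete, here is a minimal sketch in Python of an authority 
choosing to carry a date, a hierarchy, and a file type in an http URI.  The 
path layout, host, and function name are hypothetical; nothing in the http 
scheme standardizes them, which is exactly the point.

```python
from datetime import date
from urllib.parse import quote


def build_report_uri(host: str, published: date, section: str,
                     slug: str, suffix: str) -> str:
    """Build an http URI whose path carries a date, a hierarchy and a suffix.

    The layout is a hypothetical assignment policy chosen by the authority,
    not a representation defined by the http scheme.
    """
    path = "/".join([
        published.strftime("%Y/%m/%d"),      # creation date
        quote(section, safe=""),             # directory hierarchy
        quote(slug, safe="") + "." + suffix  # name plus apparent file type
    ])
    return f"http://{host}/{path}"


print(build_report_uri("example.org", date(2006, 9, 18),
                       "weather", "chicago", "html"))
```

Whether a consumer may rely on any of those path segments is a separate 
question, and it is the one the finding is about.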

-------

Commenting on:

> "The first question is focused on people and software acting in the 
> role of or on behalf of a URI assignment authority (authorities) for
> URI assignments within the scope of that authority. The other 
> questions are focused on people and software making use of URIs 
> assigned outside of their own authority (observers)."

Stuart's comment:

> Whilst I’m conscious that this is either text that I wrote or 
> similar, it is again couched in terms of authority,  which I 
> know some rejects. That said I think that there may be a 
> crossing of layers here in that an operationalised view of what
> a  given URI identifies has nothing to say about what a 
> resource signifies.

As I said above, I'm OK with speaking of authorities.  On your 2nd point, 
I don't see the finding text speaking in those terms, but even if it did, 
I think there is a connection, insofar as the definition of a URI scheme 
creates the (means by which an authority expresses an) association between 
any particular URI and a resource.  If the "operational" results aren't 
consistent with or reflective of that, then I would say the system is 
misconfigured. For example, if I have in hand an https URI, and I 
dereference it over a network that someone has misconfigured to ignore all 
the integrity guarantees implied for the association of an https URI with 
its resource, I may operationally get a result that is not really for the 
resource.  That's true, but it's because the system is not configured in a 
manner that reflects the requirements of the URI scheme that it's 
supporting.  When things work right, I think the operational results are 
reflective of the underlying resource, at least in whatever sense that the 
URI scheme establishes such an association.

I do have some concern with that paragraph, but mainly editorially.  I 
think it's a bit clunky, but it seems to have been in the finding since 
before I was involved, and since it was saying things with which I 
basically agree, I left it. 

Proposed resolution:  To deal with the clunkiness, I have reworded as: 
"The first question is primarily of concern to URI assignment authorities, 
who must choose a suitable URI for each resource that they control. The 
other questions are focused on people and software making use of URIs, 
whether at the resource authority or elsewhere.   Of course the questions 
are related, insofar as one reason for an authority to encode metadata is 
for the benefit of resource users."

Stuart's comment:

> FWIW IIRC Roy on the other hand supported the notion of 
> delegated authority passed on downward from the URI spec to 
> scheme specs, to ‘owners’ of DNS names and so forth.

I'm comfortable with Roy's position.

-------

Commenting on:

> In this example, there is no normative specification that provides for
> determination of a media-type from URI suffixes, and the assignment
> authority has provided no documentation to license an inference of
> media-type from the URI. Martin's browser is in error, because it relies
> on URI metadata that is not covered by normative specifications and has
> not been documented by the assignment authority. A correctly written
> browser would have shown the faulty XML as text, or might conceivably
> have shown a warning about the apparent mismatch between the type
> inferred from the URI and the returned Content-Type. (Martin's browser
> is also ignoring TAG finding "Authoritative Metadata" [AUTHMETA], which
> mandates that the Content-Type HTTP header takes precedence even if type
> information had somehow been reliably encoded in the URI.)

Stuart's comment:

> Comment [skw4]: It is in error because it construes that there is 
> metadata intentionally placed in the URI when there is not.

Hmm.  You seem to be saying that we know conclusively that there is no 
metadata in that URI, and I don't think that's the case.  In fact, there 
may well have been metadata, even in the .xml suffix in question.  The 
authority may have decided to use .xml as a suffix for anything that was 
originally intended as xml, and in this case has extended that convention 
to some buggy XML that is in fact not well formed.  I think the draft of 
the finding is correct as it stands:  there may or may not be metadata in 
the URI, but the point is we can't know whether it's there or how to 
interpret it unless there are normative specifications or documentation 
from the assignment authority.  I'm afraid I'm not convinced on this one.
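
A minimal sketch of the behavior the quoted paragraph asks of a client, in 
Python.  The function name and the example URI are hypothetical, and I am 
assuming for illustration that the server labeled the response text/plain.  
The suffix is deliberately never consulted; only the Content-Type header 
decides.

```python
from typing import Optional


def effective_media_type(uri: str,
                         content_type_header: Optional[str]) -> Optional[str]:
    """Return the media type a client should act on.

    Only the Content-Type header is used. The `uri` argument is accepted
    but intentionally ignored: without a normative specification or
    documentation from the assignment authority, a suffix such as ".xml"
    licenses no conclusion about the media type.
    """
    if not content_type_header:
        return None  # nothing authoritative to go on
    # Drop parameters such as "; charset=utf-8" and normalize case.
    return content_type_header.split(";")[0].strip().lower()


# Hypothetical URI ending in .xml, served as plain text.
print(effective_media_type("http://example.org/doc.xml",
                           "text/plain; charset=utf-8"))
```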

-------

Stuart's comment:

> typo: reaons -> reasons

Fixed, thank you!

-------

Commenting on:

> There is certain metadata that Martin or his browser can reliably
> determine from the URI. For example, the URI conveys that the http scheme
> has been used, and that attempts to access the resource should be
> directed to the IP address returned from the DNS resolution of the string
> "example.org". These conclusions are licensed by normative
> specifications such as [URI] and [HTTP].

Stuart's comment:

> Comment [skw5]: Hmmmm I
> have always found this tricky. Wrt
> to say FTP URI scheme, the
> scheme tells you (in an operational
> style) what resource is identified –
> it is the resource that would
> provide the resulting
> representation *if* you did a
> particular bunch of things. The
> HTTP spec is the same. However,
> neither is a statement about HOW
> the resource should be accessed,
> only a statement of WHAT
> resource is identified. Ok. Yes,
> typically HTTP: would imply that
> access using http ought to be
> possible.

I've found it tricky too; witness my so far unsuccessful attempts to tell 
just this story in the drafts on schemeProtocols.  The question here is: 
does the paragraph as quoted above need fixing?   I certainly think it's 
right that "the http scheme has been used", as that's covered by normative 
specs.   I'm a little less clear on whether I've quite correctly told the 
story in saying "that attempts to access the resource should be directed 
to the IP address returned from the DNS resolution of the string 
"example.org". These conclusions are licensed by normative specifications 
such as [URI] and [HTTP]." 

I'll ask other TAG members for their opinions, though I really don't want 
to back into the whole schemeProtocols discussion.  If necessary, I'll 
delete the offending parts of that paragraph.  Unless other TAG members 
agree there's a problem, I propose to leave it.
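
For what it's worth, the uncontroversial part of that paragraph can be 
sketched in a few lines of Python.  This only shows which pieces the generic 
URI syntax lets a client pull out; the function name and the path are 
hypothetical, and the sketch takes no position on whether the scheme also 
dictates how the resource must be accessed.

```python
from urllib.parse import urlsplit


def scheme_and_host(uri: str) -> tuple:
    """Extract the two components the generic URI syntax reliably exposes."""
    parts = urlsplit(uri)
    return parts.scheme, parts.hostname


scheme, host = scheme_and_host("http://example.org/some/path.xml")
print(scheme, host)

# A client that chooses to dereference over HTTP would typically hand `host`
# to a resolver next, e.g. socket.getaddrinfo(host, 80). That step is left
# out so the sketch runs without network access.
```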


-------

Commenting on:

> Good Practice: Avoid software dependencies on metadata in URIs.

Stuart's comment:

> Comment [skw6]: The tone of
> this seems to me to have a
> presumption that metadata *is*
> embedded in URIs, as opposed to
> “in some cases there happens to be
> metadata embedded in URIs”.

The section in which this suggestion was made has been dropped.

> I find myself not wanting to allow
> that the things being cited here as
> metadata are infact metadata. I see
> them mostly as ‘distinguishing’
> characteristics which have been
> encoded into URIs 

That seems like metadata to me, except in the case where the information 
in the URI happens to duplicate what's in the content, in which case it's 
arguably "data" not "metadata". 

> principally for
> the purpose of generating unique,
> transcribable URIs, rather than
> with the intent that metadata be
> recoverable from the URI.

I'm not convinced that the motivations of the authority are what's 
important.  The metadata is often there.  When it is, or when it appears 
to be there (see sections on guessing), it's tempting for clients to rely 
on it.  This Good Practice Note is saying:  especially in software, don't 
do that.

Anyway, as noted above, the section has been dropped.

-------

Commenting on:

> that is the only one for which the URI authority has taken specific 
> responsibility.

Stuart's comment:

> Hmmm… I might argue that the
> same assignment authority is
> equally *responsible* for both
> URIs, however they have set no
> particular expectation wrt to the
> second URI (at least in the vicinity
> of Chicago – though who knows
> what might happen to be painted
> on the side of busses in Boston).

Good point.  He's responsible for the URI and the resource, he just hasn't 
claimed that it has anything to do with the weather. 

Proposed resolution:

I've reworded that to:  "Bob has seen an advertisement listing just the 
Chicago URI, and that is the only one that the URI authority has
warranted will be a useful weather report."

-------

Commenting on:

> Good Practice: Guess information from URIs only when the 
> consequences of an incorrect guess are acceptable.

Stuart suggests:

> Alternative formulation: “When guessing information from URIs be 
> robust to unexpected results.”

Honestly, I don't like mine, but I'm afraid I don't like yours much 
either.  This part of the finding has always suffered from a certain 
circularity or obviousness, and I haven't found a great way to get to the 
essence, which is:  "Guessing has its downsides, but on balance it's 
something people will do and often have good reasons for doing.  Watch out 
for the obvious pitfalls."  Doesn't have quite the gravitas I'd expect in 
a TAG finding, but I'll give it a little more thought.  Some chance the 
original will survive, in part because I haven't come up with better, in 
part because it was approved by the TAG, and in part because I think it's 
time to ship this and while the above isn't quite up to my standards, it's 
not telling anyone to do anything dangerous.

-------

Commenting on:

> Bob could, with this assurance, write his own software to construct and
> use such URIs to retrieve weather reports.

Stuart writes:

> Ok… but
> Bob’s software is also vulnerable
> to change *if* example.org change
> the way that they organise their
> URI space (modulo or not “Cool
> URIs…”). I think that this risks
> overstating the assurance that Bob
> has.

Well, he could just as well hang onto the form for a week, a month or a 
year, fill it out, and hit the same problem.  You're right that given the 
way browsers work, there's a social expectation that forms are filled in 
promptly, but Cool URIs Don't Change, and I think that applies to the ones 
with query strings too.  Anyway, ole Bob knows the nature of the 
documentation he got (an HTML form), and if he's smart enough to reverse 
engineer it to get the URI assignment policy, I bet he's smart enough to 
make a guess as to whether the form is time sensitive.
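
A minimal sketch of the kind of software Bob might write, in Python.  The 
action URI and the "city" field name are hypothetical stand-ins for whatever 
the HTML form actually documents; the only assurance Bob has is the one the 
form itself gives him.

```python
from urllib.parse import urlencode

# Hypothetical values Bob would read off the form the authority published:
# the form's action URI and the name of its one input field.
FORM_ACTION = "http://example.org/weather"
CITY_FIELD = "city"


def weather_uri(city: str) -> str:
    """Build the URI a browser would produce when submitting the form via GET."""
    return FORM_ACTION + "?" + urlencode({CITY_FIELD: city})


print(weather_uri("Chicago"))
print(weather_uri("New York"))
```

If example.org reorganizes its URI space, this breaks in the same way a 
saved copy of the form would.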

-------

Commenting on:

> Assignment authorities may publish specifications detailing the 
> structure and semantics of the URIs they assign. Other users of 
> those URIs may use such specifications to infer information about 
> resources identified by URI assigned by that authority.

Stuart writes:

> Comment [skw10]:
> I think that the generation of
> unique identifiers is the more
> likely reason for embedding socalled
> metadata in a URI. I suspect
> that in general it is rarely the intent
> that the URI be parsed to extract
> what some construe as embedded
> ‘metadata’.
> I think the uniqueness driver
> should be introduced earlier, where
> sufficient static distinguishing
> characteristics are encoded into a
> URI in order to make it unique.

I suppose I'm less convinced than you that we need to get into the 
motivations of the assignment authorities, but even if we did, I don't 
share your assumptions.  Usually when I see a URI like:  
http://www.cnn.com/2006/WORLD/meast/08/14/carroll/index.html, which 
happens to be an actual news report URI from CNN a few weeks ago, I don't 
think they are just going for uniqueness.  GUIDs would be far easier. 
While they've presumably chosen the assignment for their own reasons, it's 
a good guess as to what metadata they're encoding here, and I can think of 
lots of reasons other than uniqueness that they would have done so.  The 
very existence and widespread use of .htaccess files in Apache suggests 
that metadata is encoded in URIs for reasons other than uniqueness.  That 
being the case, I think it's appropriate that this finding assumes that 
such metadata will often be there, or appear to be there, and that it 
focusses mostly on when to encode it, and whether to trust it.
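
To illustrate what "a good guess" looks like in software, here is a minimal 
Python sketch that reads the CNN URI above as date, section, region, and 
slug.  The field names and the seven-segment layout are my guesses, not 
anything CNN has documented, so the function labels every result as a guess 
and returns None rather than failing when the shape is unexpected.

```python
from typing import Optional
from urllib.parse import urlsplit


def guess_news_fields(uri: str) -> Optional[dict]:
    """Guess metadata from a path shaped like /YYYY/SECTION/region/MM/DD/slug/leaf.

    The layout is an undocumented assumption. Any URI that does not fit it
    yields None instead of a wrong answer.
    """
    segments = [s for s in urlsplit(uri).path.split("/") if s]
    if len(segments) != 7:
        return None
    year, section, region, month, day, slug, leaf = segments
    if not (year.isdigit() and month.isdigit() and day.isdigit()):
        return None
    return {
        "date_guess": f"{year}-{month}-{day}",
        "section_guess": section,
        "region_guess": region,
        "slug_guess": slug,
        "leaf": leaf,
    }


print(guess_news_fields(
    "http://www.cnn.com/2006/WORLD/meast/08/14/carroll/index.html"))
```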

-------

The draft finding says:

> Assignment authorities may publish specifications detailing the
> structure and semantics of the URIs they assign. Other users of
> those URIs may use such specifications to infer information 
> about resources identified by URI assigned by that authority.

Stuart writes:

> I think that given that such specifications may be subject to 
> change, there should be some caution suggested wrt the 
> permanence of any implied commitment on the part of the 
> assignment authority.

As I noted earlier, Cool URIs don't change.   As far as I'm concerned, the 
instant the assignment authority publishes the bindings for a family of 
URIs, good practice is that the associations for those URIs be set 
forever.   So, rather than warning of impermanence, I'd be 
tempted to warn assignment authorities that such documentation does, per 
Cool URIs, represent a perpetual commitment at least in principle.  As I 
mentioned at the start of this note, I'm setting the bar pretty high on 
making changes at this late point, as they are likely to generate more 
debate and more delays.  Since I don't think the draft is "broken" I 
propose to leave it.  Were I convinced to change it after all, my starting 
position would be to add the warning to assignment authorities that the 
commitment is perpetual.  Can you live with this resolution?

Thank you for the care with which you reviewed the latest drafts, and for 
your patience in waiting for this response.  Please review the new draft, 
and let me know whether you are comfortable with the resolutions contained 
therein.   I expect this will be published as a TAG Finding shortly. Thank 
you!

Noah

[1] http://www.w3.org/2001/tag/2006/06/14-minutes.html#item01

[2] http://lists.w3.org/Archives/Public/www-tag/2006Aug/0069.html

[3] http://lists.w3.org/Archives/Public/www-tag/2006Jul/0026.html

[4] http://lists.w3.org/Archives/Public/www-archive/2006Jul/att-0009/metaDataInURI-31-skw-ann.pdf


--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Monday, 18 September 2006 14:34:48 UTC