- From: Schleiff, Marty <marty.schleiff@boeing.com>
- Date: Mon, 18 Sep 2006 09:59:00 -0700
- To: <noah_mendelsohn@us.ibm.com>, "Stuart Williams" <skw@hp.com>
- Cc: <www-tag@w3.org>
Hi All,

I found Metadata in URI pretty interesting. While it seems pretty thorough in exploring metadata about a URI's intended resource, it doesn't seem to address metadata about the URI itself. Helpful metadata about the URI (not about the resource) might include things like claims of persistence vs. one-time pseudonym, normalization & matching rules, ordering rules (e.g., for versions), on-click behavior, resolvable or not, protocol to use, etc.

I think URI schemes do a pretty good job of letting an application know how to process a URI (http processing is different than https; ldap processing is different than ftp; etc.). As the TAG is encouraging use of just a single scheme (i.e., http) for all identifiers, it seems the TAG should also provide direction on how to convey processing instructions to relying applications. The only suggestion I have seen, which I think came from a TAG member, and which I think is not very well thought out, was something like the following:

Don't do this: <newScheme>://<stuff>
Do do this: http://<newSchemeOrganization>.org/<stuff>

Then a clever application, upon recognition of "<newSchemeOrganization>.org", might know to interpret the URI as specified by NewSchemeOrganization.

While I think this idea might be a start, it doesn't go far enough. It doesn't answer the following:

1) It relies too much on tribal knowledge. How's an application supposed to know that "<newSchemeOrganization>.org" is intended to convey scheme-type information, while "<otherOrganization>.org" does not convey scheme-type information? I think this could be resolved by introducing a new DNS top level domain (maybe something like ".scheme" or ".spec") specifically for the purpose of unambiguously indicating that the URI has particular characteristics and meaning (e.g., http://<newSchemeOrganization>.spec/<stuff>).

2) What should be returned by "http://<newSchemeOrganization>.spec"? Hopefully it would be a specification describing the special characteristics of the URIs under this authority.

3) If several companies collaborate on a new specification, but there's no organization representing the collaboration, what should go into the authority section?

4) TAG members frequently justify the use of a single scheme by claiming it's expensive to introduce new schemes, and difficult for applications to be taught how to process new schemes. I claim it would be just as expensive and difficult to teach applications how to recognize the various URI characteristics and semantics with a "<newSchemeOrganization>.spec" approach, and more expensive with a nebulous "<newSchemeOrganization>.org" approach, and even more expensive with no approach at all.

Marty.Schleiff@boeing.com; CISSP
Associate Technical Fellow - Cyber Identity Specialist
Computing Security Infrastructure
(206) 679-5933

-----Original Message-----
From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
Sent: Monday, September 18, 2006 7:34 AM
To: Stuart Williams
Cc: www-tag@w3.org
Subject: Proposed disposition of Stuart Williams' comments on Metadata in URI 31

At its June 2006 F2F meeting in Amherst, MA, the TAG voted "to accept http://www.w3.org/2001/tag/doc/metaDataInURI-31-20060609 contingent on Noah finishing his TODO list to the satisfaction of Ed." [1] Before I could wrap up the finding, I received two sets of additional comments, and I informed the TAG that I would delay publication until I had reviewed and suggested dispositions for that new input. In August, I summarized my proposed responses to the comments received from Bjoern Hoehrmann [2]. The purpose of this note is to describe the proposed dispositions for the comments received from Stuart Williams [3,4].

As in my responses to Bjoern, I'm trying to strike a balance. On the one hand, I want to be responsive where there are important concerns.
On the other, we always have to pick a point where the TAG will say "publish", and further comments can be considered as input to possible revisions. So, I've tried to respond to Stuart's comments with some detail and care, but given their late arrival I am setting the bar a little higher than I might normally in being open to significant redraft of the findings. I hope the following strikes a reasonable balance.

The following quotes are from Stuart's comments (the notes in [4]) followed by my comments. In a moment I will send another note announcing the posting of a new draft of the finding. Any changes proposed below are in that draft (which in fact is already on the Web at the usual URIs.)

Stuart's comment:

> The concept of authority wrt to URI is one which some have pushed back
> against. They have argued that the URI scheme itself is what states
> what a given URI identifies. Generally this is presented as an
> operationalised notion of what it means to 'identify' a resource. This
> view would likely also argue that RFC2616 'creates' all possible HTTP
> URIs.

The term is used at least occasionally in the WWW Architecture document, and in some sense it's locally defined in the metaDataInURI-31 draft: "The authority that creates a URI is responsible for assuring that it is associated with the intended resource," I don't recall any other TAG members raising this in earlier reviews. If other members of the TAG think it's worth the effort to come up with a different approach, I'd be willing, but my vote is to leave this as is. I do see your point, but I'm just not convinced there's a problem.

-------

Commenting on:

> "Many URI schemes offer a flexible structure that can also be used to
> carry additional information, called metadata, about the resource."

Stuart's comment:

> Do you have an example of such a scheme.
> I can't think of any!!!

Sure, the http scheme for example.
I can encode into URIs in that scheme creation dates, directory hierarchies, file types, and all sorts of things. It doesn't provide a standard representation for any one of those, but that's not the point: it's a scheme that "can be used" to carry such information. Indeed, the subject of the finding is when it should be used in that way, and when consumers of URIs should depend on it having been used that way.

-------

Commenting on:

> "The first question is focused on people and software acting in the
> role of or on behalf of a URI assignment authority (authorities) for
> URI assignments within the scope of that authority. The other
> questions are focused on people and software making use of URIs
> assigned outside of their own authority (observers)."

Stuart's comment:

> Whilst I'm conscious that this is either text that I wrote or similar,
> it is again couched in terms of authority, which I know some rejects.
> That said I think that there may be a crossing of layers here in that
> an operationalised view of what a given URI identifies has nothing to
> say about what a resource signifies.

As I said above, I'm OK with speaking of authorities. On your 2nd point, I don't see the finding text speaking in those terms, but even if it did, I think there is a connection, insofar as the definition of a URI scheme creates the (means by which an authority expresses an) association between any particular URI and a resource. If the "operational" results aren't consistent with or reflective of that, then I would say the system is misconfigured. For example, if I have in hand an https URI, and I dereference it over a network that someone has misconfigured to ignore all the integrity guarantees implied for the association of an https URI with its resource, I may operationally get a result that is not really for the resource. That's true, but it's because the system is not configured in a manner that reflects the requirements of the URI scheme that it's supporting.
When things work right, I think the operational results are reflective of the underlying resource, at least in whatever sense that the URI scheme establishes such an association.

I do have some concern with that paragraph, but mainly editorially. I think it's a bit clunky, but it seemed to have been in the finding since before I was involved, and since it was saying things with which I basically agree, I left it.

Proposed resolution: To deal with the clunkiness, I have reworded as:

"The first question is primarily of concern to URI assignment authorities, who must choose a suitable URI for each resource that they control. The other questions are focused on people and software making use of URIs, whether at the resource authority or elsewhere. Of course the questions are related, insofar as one reason for an authority to encode metadata is for the benefit of resource users."

Stuart's comment:

> FWIW IIRC Roy on the other hand supported the notion of delegated
> authority passed on downward from the URI spec to scheme specs, to
> 'owners' of DNS names and so forth.

I'm comfortable with Roy's position.

-------

Commenting on:

> In this example, there is no normative specification that provides for
> determination of a media-type from URI suffixes, and the assignment
> authority has provided no documentation to license an inference of
> media-type from the URI. Martin's browser is in error, because it
> relies on URI metadata that is not covered by normative specifications
> and has not been documented by the assignment authority. A correctly
> written browser would have shown the faulty XML as text, or might
> conceivably have shown a warning about the apparent mismatch between
> the type inferred from the URI and the returned Content-Type. (Martin's
> browser is also ignoring TAG finding "Authoritative Metadata"
> [AUTHMETA], which mandates that the Content-Type HTTP header takes
> precedence even if type information had somehow been reliably encoded
> in the URI.)
Stuart's comment:

> Comment [skw4]: It is in error because it construes that there is
> metadata intentionally placed in the URI when there is not.

Hmm. You seem to be saying that we know conclusively that there is no metadata in that URI, and I don't think that's the case. In fact, there may well have been metadata, even in the .xml suffix in question. The authority may have decided to use .xml as a suffix for anything that was originally intended as xml, and in this case has extended that convention to some buggy XML that is in fact not well formed. I think the draft of the finding is correct as it stands: there may or may not be metadata in the URI, but the point is we can't know whether it's there or how to interpret it unless there are normative specifications or documentation from the assignment authority. I'm afraid I'm not convinced on this one.

-------

Stuart's comment:

> typo: reaons -> reasons

Fixed, thank you!

-------

Commenting on:

> There is certain metadata that Martin or his browser can reliably
> determine from the URI. For example, the URI conveys that the http
> scheme has been used, and that attempts to access the resource should
> be directed to the IP address returned from the DNS resolution of the
> string "example.org". These conclusions are licensed by normative
> specifications such as [URI] and [HTTP].

Stuart's comment:

> Comment [skw5]: Hmmmm I have always found this tricky. Wrt to say FTP
> URI scheme, the scheme tells you (in an operational style) what
> resource is identified - it is the resource that would provide the
> resulting representation *if* you did a particular bunch of things.
> The HTTP spec is the same. However, neither is a statement about HOW
> the resource should be accessed, only a statement of WHAT resource is
> identified. Ok. Yes, typically HTTP: would imply that access using
> http ought to be possible.
I've found it tricky too, witness my so far unsuccessful attempts to tell just this story in the drafts on schemeProtocols. The question here is: does the paragraph as quoted above need fixing? I certainly think it's right that "the http scheme has been used", as that's covered by normative specs. I'm a little less clear on whether I've quite correctly told the story in saying "that attempts to access the resource should be directed to the IP address returned from the DNS resolution of the string "example.org". These conclusions are licensed by normative specifications such as [URI] and [HTTP]." I'll ask other TAG members for their opinions, though I really don't want to back into the whole schemeProtocols discussion. If necessary, I'll delete the offending parts of that paragraph. Unless other TAG members agree there's a problem, I propose to leave it.

-------

Commenting on:

> Good Practice: Avoid software dependencies on metadata in URIs.

Stuart's comment:

> Comment [skw6]: The tone of this seems to me to have a presumption
> that metadata *is* embedded in URIs, as opposed to "in some cases
> there happens to be metadata embedded in URIs".

The section in which this suggestion was made has been dropped.

> I find myself not wanting to allow that the things being cited here as
> metadata are infact metadata. I see them mostly as 'distinguishing'
> characteristics which have been encoded into URIs

That seems like metadata to me, except in the case where the information in the URI happens to duplicate what's in the content, in which case it's arguably "data" not "metadata".

> principally for the purpose of generating unique, transcribable URIs,
> rather than with the intent that metadata be recoverable from the URI.

I'm not convinced that the motivations of the authority are what's important. It's often there. When it is, or when it appears to be there (see sections on guessing), it's tempting for clients to rely on it.
This GPN is saying: especially in software, don't do that. Anyway, as noted above, the section has been dropped.

-------

Commenting on:

> that is the only one for which the URI authority has taken specific
> responsibility.

Stuart's comment:

> Hmmm... I might argue that the same assignment authority is equally
> *responsible* for both URIs, however they have set no particular
> expectation wrt to the second URI (at least in the vicinity of Chicago
> - though who knows what might happen to be painted on the side of
> busses in Boston).

Good point. He's responsible for the URI and the resource, he just hasn't claimed that it has anything to do with the weather.

Proposed resolution: I've reworded that to: "Bob has seen an advertisement listing just the Chicago URI, and that is the only one that the URI authority has warranted will be a useful weather report."

-------

Commenting on:

> Good Practice: Guess information from URIs only when the consequences
> of an incorrect guess are acceptable.

Stuart suggests:

> Alternative formulation: "When guessing information from URIs be
> robust to unexpected results."

Honestly, I don't like mine, but I'm afraid I don't like yours much either. This part of the finding has always suffered from a certain circularity or obviousness, and I haven't found a great way to get to the essence which is: "Guessing has its downsides, but on balance it's something people will do and often have good reasons for doing. Watch out for the obvious pitfalls." Doesn't have quite the gravitas I'd expect in a TAG finding, but I'll give it a little more thought. Some chance the original will survive, in part because I haven't come up with better, in part because it was approved by the TAG, and in part because I think it's time to ship this and while the above isn't quite up to my standards, it's not telling anyone to do anything dangerous.
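
The precedence rule quoted earlier from the draft (the Content-Type header wins over anything inferred from the URI) and the guessing practice discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name effective_media_type and the labels "authoritative", "guess" and "unknown" are hypothetical and come from neither the finding nor any specification. The suffix-based guess uses the standard library's mimetypes.guess_type, whose answers can vary by platform.

```python
import mimetypes
from urllib.parse import urlsplit


def effective_media_type(uri, content_type_header=None):
    """Return (media_type, source) for a retrieved representation.

    Hypothetical helper: the server-supplied Content-Type is treated as
    authoritative; a type inferred from the URI suffix is only a guess.
    """
    if content_type_header:
        # Drop parameters such as "; charset=utf-8" and normalize case.
        media_type = content_type_header.split(";", 1)[0].strip().lower()
        if media_type:
            return media_type, "authoritative"

    # No authoritative metadata: fall back to guessing from the URI path.
    guessed, _encoding = mimetypes.guess_type(urlsplit(uri).path)
    if guessed:
        return guessed, "guess"
    return None, "unknown"


if __name__ == "__main__":
    # The header wins even though the suffix suggests XML.
    print(effective_media_type("http://example.org/doc.xml",
                               "text/plain; charset=utf-8"))
    # With no header, the result is explicitly labeled as a guess.
    print(effective_media_type("http://example.org/doc.xml"))
```

A caller that receives the "guess" label can then decide whether the consequences of a wrong guess are acceptable for its purpose, for example showing a warning rather than silently rendering the content as XML.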
-------

Commenting on:

> Bob could, with this assurance, write his own software to construct
> and use such URIs to retrieve weather reports.

Stuart writes:

> Ok... but Bob's software is also vulnerable to change *if* example.org
> change the way that they organise their URI space (modulo or not "Cool
> URIs..."). I think that this risks overstating the assurance that Bob
> has.

Well, he could just as well hang onto the form for a week, a month or a year, fill it out, and hit the same problem. You're right that given the way browsers work, there's a social expectation that forms are filled in promptly, but Cool URIs Don't Change, and I think that applies to the ones with query strings too. Anyway, ole Bob knows the nature of the documentation he got (an HTML form), and if he's smart enough to reverse engineer it to get the URI assignment policy, I bet he's smart enough to make a guess as to whether the form is time sensitive.

-------

Commenting on:

> Assignment authorities may publish specifications detailing the
> structure and semantics of the URIs they assign. Other users of those
> URIs may use such specifications to infer information about resources
> identified by URI assigned by that authority.

Stuart writes:

> Comment [skw10]: I think that the generation of unique identifiers is
> the more likely reason for embedding so-called metadata in a URI. I
> suspect that in general it is rarely the intent that the URI be parsed
> to extract what some construe as embedded 'metadata'.
> I think the uniqueness driver should be introduced earlier, where
> sufficient static distinguishing characteristics are encoded into a
> URI in order to make it unique.

I suppose I'm less convinced than you that we need to get into the motivations of the assignment authorities, but even if we did, I don't share your assumptions.
Usually when I see a URI like: http://www.cnn.com/2006/WORLD/meast/08/14/carroll/index.html, which happens to be an actual news report URI from CNN a few weeks ago, I don't think they are just going for uniqueness. GUIDs would be far easier. While they've presumably chosen the assignment for their own reasons, it's a good guess as to what metadata they're encoding here, and I can think of lots of reasons other than uniqueness that they would have done so. The very existence and widespread use of .htaccess files in Apache suggests that metadata is encoded in URIs for reasons other than uniqueness. That being the case, I think it's appropriate that this finding assumes that such metadata will often be there, or appear to be there, and that it focusses mostly on when to encode it, and whether to trust it.

-----------------------------

The draft finding says:

> Assignment authorities may publish specifications detailing the
> structure and semantics of the URIs they assign. Other users of those
> URIs may use such specifications to infer information about resources
> identified by URI assigned by that authority.

Stuart writes:

> I think that given that such specifications may be subject to change,
> there should be some caution suggested wrt the permanence of any
> implied commitment on the part of the assignment authority.

As I noted earlier, Cool URIs don't change. As far as I'm concerned, the instant the assignment authority publishes the bindings for a family of URIs, good practice is that the associations for those URIs be set forever. On the contrary, rather than warning of impermanence, I'd be tempted to warn assignment authorities that such documentation does, per Cool URIs, represent a perpetual commitment at least in principle. As I mentioned at the start of this note, I'm setting the bar pretty high on making changes at this late point, as they are likely to generate more debate and more delays.
Since I don't think the draft is "broken" I propose to leave it. Were I convinced to change it after all, my starting position would be to add the warning to assignment authorities that the commitment is perpetual. Can you live with this resolution?

Thank you for the care with which you reviewed the latest drafts, and for your patience in waiting for this response. Please review the new draft, and let me know whether you are comfortable with the resolutions contained therein. I expect this will be published as a TAG Finding shortly. Thank you!

Noah

[1] http://www.w3.org/2001/tag/2006/06/14-minutes.html#item01
[2] http://lists.w3.org/Archives/Public/www-tag/2006Aug/0069.html
[3] http://lists.w3.org/Archives/Public/www-tag/2006Jul/0026.html
[4] http://lists.w3.org/Archives/Public/www-archive/2006Jul/att-0009/metaDataInURI-31-skw-ann.pdf

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Monday, 18 September 2006 16:59:57 UTC