- From: <noah_mendelsohn@us.ibm.com>
- Date: Thu, 16 Jan 2003 13:21:57 -0500
- To: "Mark Nottingham" <mnot@mnot.net>
- Cc: fielding@apache.org, sandro@w3.org, www-tag@w3.org
Mark Nottingham writes:

> > Do answers to these follow from your proposal?
>
> hopefully - let's see ;)
>
> > can/should a client keep histories based on substructure of URIs?
>
> I think this is actually a UI feature, not a URI
> manipulation; they're treating them as a pool of opaque
> (aha!) strings, sorted by alpha and length, and
> returning those that match as you type in a string.

Not sure I agree.  Given previous references to:

        http://example.com/link1
        http://w3.org/linkx
        http://example.com/link2
        http://w3.org/linky

Most history lists say things like:

        Example.com
                Title of link1
                Title of link2
        w3.org
                Title of linkx
                Title of linky

That's much more than a sort on an opaque string.  It depends on
knowing that the DNS name is a distinguished part of an HTTP URI.
Similarly the type-ahead in IE knows about the / separator and fills in
things one token at a time.  Again, in some sense not opaque.  OK per
web architecture or not?  After all, real opacity would mean that it
shouldn't even look at the substructure, except maybe when actually
initiating an operation such as GET.  As best I can tell, IE checks for
the http: scheme and "special-cases" it in its type-ahead.  OK or not?

> > Is it OK for cache proxies to microparse URIs to infer
> > clustering characteristics of the information space?
>
> That is, use a URI as input to a freshness heuristic?

No, I meant as a locality heuristic.  For example, my cache will retain
only content with a URI that matches, e.g.

        http://example.com/*

In other words, my cache will retain representations only of resources
appearing to originate from example.com.  Appropriate use of URI or
not?  How about:

        http://example.com/x/*

a subresource of the hierarchical (in RFC 2396 terms) resource at
example.com?

> It's allowed by HTTP, and sometimes used, but my
> experience is that it's poor practice, and repeatedly
> recommend against it (But I have a general bias against
> heuristics in these situations).
> The caching industry has generally migrated away from
> these solutions, especially those surrounding prefetching
> (although there is a certain fascination with it in
> academic circles).
>
> From a URI perspective, I think it's not OK (and falls
> under my SHOULD NOT).

Does that answer apply to my clarification?

> > Is it (or more correctly "why is it") OK for a client
> > to actually inspect the scheme to determine a
> > retrieval strategy?
>
> Yes, because that's part of the generic dereferencing
> process, which is part of my first paragraph (although
> it should probably be said a bit more clearly that
> dereferencing is a special operation).
>
> > Surely it is appropriate for the server to map the HTTP example
> > above to file system sub-directories should it choose to do so?
> > (Though of course, that's not required or visible from the
> > outside.)
>
> Yes, because it is the authority that minted the URI
> that's doing the mapping.

OK.  On both use of scheme to decide a dereference strategy, and use of
URI substructure at the host supporting the resource, I think we need a
crisp statement that says what's going on.  I'm specifically curious
about the use of an http URI for a resource that is not in fact hosted
by an http-based server.  I suspect the answer is along the lines of:

"A client or other agent MAY determine the scheme from a URI and MAY at
any time for any reason attempt operations defined by the scheme.  For
example, a client MAY attempt a GET or POST on a URI using the http: or
https: scheme.  If such operations succeed, then the client MUST assume
that any retrieved representations or other results were indeed a
result of successful access to the named resource (or redirections of
it.)  Accordingly, it is the responsibility of any person or software
assigning a URI name to a resource to ensure that operations performed
using the scheme used will either successfully access the correct
resource, or will fail.
For example, in using the URI mailto:noah_mendesohn@us.ibm.com to
identify my mail drop, it's not enough for me to ensure that my
employer controls the us.ibm.com domain, and that the URI is otherwise
unused.  I must ensure that if someone actually sends mail using this
URI, it will indeed go to the intended resource (my mail drop), or
that the mail will bounce.  If the mail might go to someone else, we've
got a problem."

I may not have the rule exactly right, but it's exactly the sort of
thing I would expect to see spelled out at this level of detail,
probably in the arch document.  Thanks!

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
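
P.S. To make the two "non-opaque" uses discussed above concrete, here
is a minimal sketch in Python.  It is only an illustration under stated
assumptions, not a description of how IE or any particular cache proxy
actually works: the function names group_history_by_host and
cache_should_retain are hypothetical, and the prefix argument stands in
for the http://example.com/* style pattern with the trailing * dropped.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_history_by_host(uris):
    """Group visited URIs under their host name, as a history list might.

    This relies on the authority component being a distinguished part
    of an http URI, so the URI is not treated as an opaque string.
    """
    groups = defaultdict(list)
    for uri in uris:
        groups[urlsplit(uri).hostname].append(uri)
    return dict(groups)


def cache_should_retain(uri, prefix):
    """Hypothetical locality heuristic: retain only URIs under `prefix`.

    Scheme and host must match, and the path must fall under the
    prefix path on a "/" boundary.
    """
    target, base = urlsplit(uri), urlsplit(prefix)
    if (target.scheme, target.hostname) != (base.scheme, base.hostname):
        return False
    base_path = base.path if base.path.endswith("/") else base.path + "/"
    return target.path.startswith(base_path)


history = [
    "http://example.com/link1",
    "http://w3.org/linkx",
    "http://example.com/link2",
    "http://w3.org/linky",
]
grouped = group_history_by_host(history)
```

Both functions have to parse the URI into scheme, host and path before
they can do anything useful, which is exactly the kind of substructure
inspection the questions above are asking about.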
Received on Thursday, 16 January 2003 13:23:14 UTC