- From: <noah_mendelsohn@us.ibm.com>
- Date: Tue, 30 May 2006 13:20:10 -0400
- To: www-tag@w3.org
- Message-ID: <OFDEFC36E3.98F531B2-ON8525717E.005EC618-8525717E.005F3B07@lotus.com>
With Raman's permission, I am forwarding this to the public TAG list. The original of his comments is included as an attachment to this file. That same text is mostly quoted below, along with my responses. I expect these will be discussed on the TAG call shortly.

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

----- Forwarded by Noah Mendelsohn/Cambridge/IBM on 05/30/2006 01:15 PM -----

Noah Mendelsohn
05/30/2006 11:51 AM

To: "T.V Raman" <raman@google.com>
cc: ed.rice@hp.com, raman@google.com, tag@w3.org, TimBL@w3.org, Vincent.Quint@inrialpes.fr
Subject: Re: Review of "The use of Metadata in URIs"

Raman: Thank you so much for this detailed review. Some initial comments are given below. By the way, I'm responding on tag@w3.org since that's where you posted your comments, but I think this discussion would be better held on the public list. Raman, any problem with my forwarding this note, along with your comments, to www-tag?

Raman wrote:

> ** Introduction
>
> In the following sentence, we probably don't need _thus_.
>
> The authority that creates a URI is responsible for assuring
> that it is associated with the intended resource, and thus that
> the appropriate data is manipulated or returned in response to
> operations that use the URI as a resource designator.

Hmm. I thought that this text came from earlier drafts, but that seems not to be the case. Anyway, I find it somewhat helpful in setting the general framework of the discussion, i.e. reminding readers of the main responsibilities of those who have authority over URIs. That said, I agree that it's peripheral to the finding itself. If the TAG consensus is to drop it or change it, that's OK with me.

[In a later response, Raman made clear that his suggestion was merely to drop the word "thus". NRM]

> I'd also consider rewriting it as follows to make it easier on
> the reader:
>
> The authority that creates a URI is responsible for assuring that:
> - The URI is correctly associated with the intended resource,
> - The identified resource is manipulated or returned in
>   response to operations using the URI.

I'm less comfortable with this suggestion. In text of this sort, I find the bullet form to be a bit jarring. If these were really key points of the finding, I'd be more inclined to give the list the added editorial weight that you suggest. I confess I'm not convinced in this case. I can see dropping the text entirely, or replacing it with some other short reminder of what assignment authorities do, but I think that breaking out the bullets unduly disrupts the flow in this case.

> The next two sentences imply that there is some metadata scheme
> designed into the structure of URIs; and this might actually lie
> at the root cause of some of our back and forth discussion on
> this topic.

Let's see. Those next two sentences say:

"Many URI schemes offer a flexible structure that can also be used to carry additional information, called metadata, about the resource. Such metadata might include the title of a document, the creation date of the resource, the MIME media type that is likely to be returned by an HTTP GET, a digital signature usable to verify the integrity or authorship of the resource content, or hints about URI assignment policies that would allow one to guess the URIs for related resources."

Hmm. When I selected the phrase "flexible structure that can also be used", I intended it to suggest that encoding metadata was not its primary purpose (i.e. except in the limited ways that RFC 3986 is designed to convey scheme, hierarchy, etc.). Obviously, you read it differently than I intended. I'll be curious to hear whether other TAG members also had that misunderstanding, in which case I should try again.
I confess that saying that something "can also be used" for a purpose doesn't seem to me to be saying that it's "designed" for that purpose.

> I think the truth is that the structure of URIs in themselves did
> not design in metadata schemes;

Agree.

> however, URIs (and HTTP URIs in
> particular) were human-readable, and consequently, people have
> cleverly encoded metadata into them; the TAG question: how can
> this be leveraged, and when can it be relied upon?

I don't think it's just that they're human readable. It's that, as the draft suggests, their format is quite flexible and that flexibility facilitates embedding of substructure, whether for human or machine consumers (e.g. HTML forms processors).

> So I'd suggest reworking
> the following extract:
>
> Many URI schemes offer a flexible structure that can also be used
> to carry additional information, called metadata, about the
> resource. Such metadata might include the title of a document,
> the creation date of the resource, the MIME media type that is
> likely to be returned by an HTTP GET, a digital signature usable
> to verify the integrity or authorship of the resource content, or
> hints about URI assignment policies that would allow one to guess
> the URIs for related resources.
>
> Rewrite as:
>
> URIs are flexible in their structure and are often human-readable.
> This structure has been exploited to carry implicit metadata such as:
> - Document Title
> - Date
> - Mime type
> - a digital signature usable to verify content integrity
> - hints about URI assignment policies that enable guessing
>   related URIs.
>
> As an example,
> _http://example.com/2006/web/introduction/chapter-01.html_ hints
> at the document title, the date it was created, the overall
> position of this document within the larger document, and its
> content-type.

I confess that I find the original to flow better, to be shorter, and to be more readable.
I suppose I can live with the mention of human readability, but I think you are putting more emphasis on the difference between human and software users of the Web than I would (see below). Again, if other TAG members prefer the bullet form, I can live with it, but I prefer the flow and feel of the original, perhaps with minor edits if we want to include the human readability point.

> Simplify the following somewhat bureaucratic sentence?
>
> The first question is focused on people and software acting
> in the role of or
> on behalf of a URI assignment authority (authorities) for
> URI assignments
> within the scope of that authority.
>
> Here, (1) is focused on entities acting on behalf of a URI
> assignment authority when creating URIs within the scope of that
> authority.

This traces to versions of the finding [1] that predate my work on it. I find it somewhat helpful, but could be easily convinced to remove or reword it.

> **Encoding And Using Metadata In URIs
>
> *** Reword Constraint in 2.1
>
> I like this example.
> However I'd request a minor re-wording of the identified
> constraint:
>
> Constraint: Users of the Web and Web software MUST NOT attempt
> to draw unverifiable conclusions about a resource or its
> representations by inspection of its URI, except as licensed by
> relevant normative specifications or by URI assignment policies
> published by the relevant URI assignment authority.
>
> Suggestion: Could we limit the above constraint to software,
> and not have it extend to users (where users mean human users)?
> Put differently, I would like Martin's software to be sent to TAG
> jail, but Martin himself should not be punished if he said "that
> is XML" by looking at the URL.
> Let's face it, Martin as a typical intelligent human can:
>
> - Guess the content is XML from the extension,
> - Even guess that it is broken XML,
> - Probably did not get to see the HTTP headers,
> - And would never take the time to go ask _the relevant
>   authority_ if he is correct,
> - And given a longer XML document, would not know *how* it was
>   broken without software assistance.

I really don't think I want to draw such sharp distinctions between people and software. In fact, human Martin >should< be sent to TAG jail if he picks up the phone and tells you "Hey, I found an XML version of that document" just by looking at the URI. He's crossed the line by telling you this non-fact in a tone that suggests he's not accounting for the possibility that he's guessed wrong. You might go on to trust that erroneous information. Similarly, as you say below, a human who has good reason for guessing may indeed write software that helps him act on those guesses, but both the code and the human have to be prepared for the possibility that the guesses are wrong. I think the draft finding has it about right on all of that. I'm just not convinced that telling a detailed story about the differences between people and software is on the mark in this case.

> *** Best Practice in 2.2
> I agree with the conclusion of this section, but don't agree
> entirely with its tone.
> As best practice, could we write it such that authorities
> assigning URIs are encouraged to do the reasonable thing, as
> opposed to putting the entire blame for the failure on Bob?
> In the long run, Darwin will take care of people who create bogus
> URIs that do not meet the end-user's expectations.

Good point. The text currently says:

"Still, the ability to explore the Web informally and experimentally is very valuable, and Web users act on guesses about URIs all the time. Many authorities facilitate such flexible use of the Web by assigning URIs in an orderly and predictable manner.
Nonetheless, in the example above, Bob is responsible for determining whether the information returned is indeed what he needs."

That sentence about authorities facilitating use through orderly assignment was intended to signal that it's good practice, but we could indeed say that a bit more strongly. I'll try and come up with rewording for the next draft.

> *** Possible Erroneous Assumption In 2.3
>
> Assumes that the HTML Form is _authoritative_
> --- note that this is true *if and only if* the HTML form was
> authored by the authority assigning the URI --- and that on the
> Web today, this is not always the norm.

Indeed. That important point was raised by Dan in [2], where he wrote:

"You might note that the action= attribute allows a form to point anywhere in the web, so in fact, HTML forms allows anyone, not just an authority, to make claims about the URI structure of http://example.org/cityweather ."

My suggestion in response was [3]:

"Well, the subtlety seems to me that the claims are authoritative (in the sense the finding discusses) only if the authority sourcing the form and the authority for the ACTION URI are the same. It's cool that I can serve up lots of Web forms with ACTIONs pointing to danconnolly.com, but I expect you shouldn't be held responsible for either the implied structure of your URIs, or for anything I might say about them in the natural language text of the form. I'm thinking it might be worth a sentence in the finding to give that warning, i.e. that 3rd party claims in forms have the same standing or lack thereof as 3rd party claims in books, ordinary web pages, or on the sides of busses: trust claims that (appear to be) made by someone other than the resource authority only at your own risk."

Does that seem like the right way to handle it?

> *** One URI Space Please In 2.5
>
> Implies that there might be two *URI spaces* one for writing on
> backs of buses and another for writing inside HTML hyperlinks.
> I would personally consider that *extremely* bad practice.

I don't see anything there that suggests disjoint URI spaces. It says:

"URIs optimized for use by the assignment authority may sometimes be inconvenient for resource users."

I think that's manifestly true, and we encounter the consequences on the Web quite regularly. It then concludes:

"Good Practice: URIs intended for direct use by people should be easy to understand, and should be suggestive of the resource actually named."

That's presumably what suggested two spaces to you, but it seems appropriate to me. I believe that it's clearly true, and to be encouraged, that of all the URIs in the world, a subset is indeed optimized for convenient direct memorization, typing or manipulation by human users (e.g. http://www.w3.org) and others are optimized more for the benefit of the resource owner (e.g. http://www.w3.org/2002/09/wbs/36693/xmlvaria200604/). I don't think that the draft inappropriately suggests that the distinction is crisp, or that there are two disjoint spaces. The good practice note suggests, correctly in my opinion, that if you are assigning a URI, and if it's a goal that the URI be directly used by people, that it should be both easy to understand (maybe remember?) and suggestive of the resource named.

> * Conclusion
>
> Overall I like the document.
> I'd throughout emphasize that:
>
> - Human users should be encouraged to guess.
> - Software should rely on documented and verifiable metadata.

As I've said above, I'm afraid I'm not convinced this is a good distinction.

> - Humans creating bleeding edge software should not be
>   imprisoned for depending on guesses.

Right, which is among the reasons I don't want to organize the story around the differences between people and software.
It seems that you're actually saying that there's: (1) ordinary software, in which guessing is very bad; (2) people, whom you actually encourage to guess; and (3) software that's written specifically to help people who are guessing, and that software can guess after all.

> - Such guesses should be documented, and where possible
>   communicated to the authority issuing the URI.

Are you saying that Bob should call the Web site and say "Hey, I saw that URI on the side of the bus and I also guessed that you had some others"? In fact, why does Bob have to document anything to anyone? In the privacy of his own home he took a gamble, and as far as he can tell it's paid off. Why does he have to document anything or tell anyone anything?

> - Sometimes, human-authored software that encapsulates guesses
>   made by the developer can prove a useful tool in discovering
>   additional means of using Web resources not originally
>   envisioned by the owner of the resource.

I agree, though I'm not sure where you're going with this thought. Are you saying: therefore people should write such software? Therefore people should call up the resource owner and explain what they've discovered? What would you like the finding to say in this area?

> - Where such *additional* use does not contravene the original
>   terms of use e.g., guessing the URL to someone else's bank
>   account, the Web architecture should encourage these, since
>   it leads to an overall democratization of available services
>   on the Web, with users being able to implicitly ask for the
>   _right_ API.

Thank you for the very detailed comments. They are truly very helpful. I'm sorry that I haven't been more immediately agreeable on your main point, about the differences between people and software, but I'm just not yet convinced on that one. I think a lot of the others are editorial or matters of emphasis, and I'll be glad to try and capture whatever the group as a whole thinks is best based on your input.
I'm also sorry that this response is coming right before the call, but I was mostly out of email contact last week.

Noah

[1] http://www.w3.org/2001/tag/doc/metaDataInURI-31-20030708.html
[2] http://lists.w3.org/Archives/Public/www-tag/2006May/0020.html
[3] http://lists.w3.org/Archives/Public/www-tag/2006May/0028.html

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Attachments
- application/octet-stream attachment: comments-uri-metadata
Received on Tuesday, 30 May 2006 17:20:29 UTC