[metaDataInURI-31] Raman's Review of "The use of Metadata in URIs" from noah_mendelsohn@us.ibm.com on 2006-05-30 (www-tag@w3.org from May 2006)

From: <noah_mendelsohn@us.ibm.com>
Date: Tue, 30 May 2006 13:20:10 -0400
To: www-tag@w3.org
Message-ID: <OFDEFC36E3.98F531B2-ON8525717E.005EC618-8525717E.005F3B07@lotus.com>
With Raman's permission, I am forwarding this to the public TAG list. The 
original of his comments is included as an attachment to this file.  That 
same text is mostly quoted below, along with my responses.  I expect these 
will be discussed on the TAG call shortly.

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------




----- Forwarded by Noah Mendelsohn/Cambridge/IBM on 05/30/2006 01:15 PM -----


Noah Mendelsohn
05/30/2006 11:51 AM

        To:     "T.V Raman" <raman@google.com>
        cc:     ed.rice@hp.com, raman@google.com, tag@w3.org,
                TimBL@w3.org, Vincent.Quint@inrialpes.fr
        Subject:        Re: Review of "The use of Metadata in URIs"



Raman: 

Thank you so much for this detailed review.  Some initial comments are 
given below.  By the way, I'm responding on tag@w3.org since that's where 
you posted your comments, but I think this discussion would be better held 
on the public list.  Raman, any problem with my forwarding this note, 
along with your comments, to www-tag? 

Raman wrote: 

> ** Introduction 
> 
> In the following sentence, we probably don't need _thus_. 
>     
>     The authority that creates a URI is responsible for assuring 
>     that it is associated with the intended resource, and thus that 
>     the appropriate data is manipulated or returned in response to 
>     operations that use the URI as a resource designator. 

Hmm.  I thought that this text came from earlier drafts, but that seems 
not to be the case.  Anyway, I find it somewhat helpful in setting the 
general framework of the discussion, i.e. reminding readers of the main 
responsibilities of those who have authority over URIs.  That said, I 
agree that it's peripheral to the finding itself.  If the TAG consensus is 
to drop it or change it, that's OK with me. 

[In a later response, Raman made clear that his suggestion was merely to 
drop the word "thus".  NRM]

> I'd also consider rewriting it as follows to make it easier on 
> the reader: 
> 
> The authority that creates a URI is responsible for assuring that: 
>   - The URI  is correctly associated with the intended resource, 
>   - The identified resource  is manipulated or returned in 
>     response to 
>     operations using the URI. 

I'm less comfortable with this suggestion.  In text of this sort, I find 
the bullet form to be a bit jarring.  If these were really key points 
of the finding, I'd be more inclined to give the list the added editorial 
weight that you suggest.  I confess I'm not convinced in this case.  I can 
see dropping the text entirely, or replacing it with some other short 
reminder of what assignment authorities do, but I think that breaking out 
the bullets unduly disrupts the flow in this case.     

> The next two sentences imply that there is some metadata scheme 
> designed into the structure of URIs; and this might actually lie 
> at the root cause of some of our back and forth discussion on 
> this topic. 

Let's see.  Those next two sentences say: 

"Many URI schemes offer a flexible structure that can also be used to 
carry additional information, called metadata, about the resource. Such 
metadata might include the title of a document, the creation date of the 
resource, the MIME media type that is likely to be returned by an HTTP 
GET, a digital signature usable to verify the integrity or authorship of 
the resource content, or hints about URI assignment policies that would 
allow one to guess the URIs for related resources." 

Hmm.  When I selected the phrase "flexible structure that can also be 
used", I intended it to suggest that encoding metadata was not its 
primary purpose (i.e. except in the limited ways that RFC 3986 is designed 
to convey scheme, hierarchy, etc.).  Obviously, you read it differently 
than I intended.  I'll be curious to hear whether other TAG members also 
had that misunderstanding, in which case I should try again.  I confess 
that saying that something "can also be used" for a purpose doesn't seem 
to me to be saying that it's "designed" for that purpose. 

> I think the truth is that the structure of URIs in themselves did 
> not design in metadata schemes; 

Agree. 

> however, URIs (and HTTP URIs in 
> particular) were human-readable, and consequently, people have 
> cleverly encoded metadata into them; the TAG question: how can 
> this be leveraged, and when can it be relied upon? 

I don't think it's just that they're human readable.  It's that, as the 
draft suggests, their format is quite flexible and that flexibility 
facilitates embedding of substructure, whether for human or machine 
consumers (e.g. HTML forms processors).   

> So I'd suggest reworking 
> the following extract: 
> 
>     Many URI schemes offer a flexible structure that can also be used 
>     to carry additional information, called metadata, about the 
>     resource. Such metadata might include the title of a document, 
>     the creation date of the resource, the MIME media type that is 
>     likely to be returned by an HTTP GET, a digital signature usable 
>     to verify the integrity or authorship of the resource content, or 
>     hints about URI assignment policies that would allow one to guess 
>     the URIs for related resources. 
> 
> Rewrite as: 
> 
> URIs are flexible in their structure and are often human-readable. 
> This structure has been exploited to carry implicit metadata such as: 
>   - Document Title 
>   - Date 
>   - Mime type 
>   - a digital signature usable to verify content integrity 
>   - hints about URI assignment policies that enable guessing 
>     related URIs. 
> 
> As an example, 
> _http://example.com/2006/web/introduction/chapter-01.html_ hints 
> at the document title, the date it was created, the overall 
> position of this document within the larger document, and its 
> content-type. 


I confess that I find the original to flow better, to be shorter, and to 
be more readable.  I suppose I can live with the mention of human 
readability, but I think you are putting more emphasis on the difference 
between human and software users of the Web than I would.  (see below) 
Again, if other TAG members prefer the bullet form, I can live with it, 
but I prefer the flow and feel of the original, perhaps with minor edits 
if we want to include the human readability point. 
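
To make the kind of hint in that example URI concrete, here is a minimal
sketch of what a reader, or software acting for one, might guess from it.
The function name uri_hints and the specific rules (a four-digit path
segment is read as a year, the suffix after the last dot as a file
extension) are assumptions made for this illustration; nothing in RFC 3986
or the draft finding licenses these readings, so the results are guesses
only.

```python
import re
from urllib.parse import urlsplit


def uri_hints(uri):
    """Return guesses suggested by a URI's path. These are hints, not facts."""
    # Split the path into its non-empty segments.
    segments = [s for s in urlsplit(uri).path.split("/") if s]
    hints = {}

    # Assumption: the first four-digit segment that looks like a year is a date hint.
    for seg in segments:
        if re.fullmatch(r"(19|20)\d{2}", seg):
            hints["year"] = int(seg)
            break

    # Assumption: the last segment hints at a name and, after a dot, a format.
    if segments:
        name, dot, ext = segments[-1].rpartition(".")
        hints["name"] = name if dot else segments[-1]
        if dot:
            hints["extension"] = ext
    return hints


hints = uri_hints("http://example.com/2006/web/introduction/chapter-01.html")
```

For the example URI this yields a year of 2006, a name of "chapter-01" and
an extension of "html", all of which remain unverified until the resource
is actually retrieved.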

> Simplify the following somewhat bureaucratic sentence? 
> 
>     The first question is focused on people and software acting
>     in the role of or on behalf of a URI assignment authority
>     (authorities) for URI assignments within the scope of that
>     authority.
> 
> Here, (1) is focused on entities acting on behalf of a URI 
> assignment authority when creating URIs within the scope of that 
> authority. 

This traces to versions of the finding [1] that predate my work on it.  I 
find it somewhat helpful, but could be easily convinced to remove or 
reword it. 

> **Encoding And Using Metadata In URIs 
> 
> *** Reword Constraint in 2.1 
> 
> I like this example. 
> However I'd request a minor re-wording of the identified 
> constraint: 
> 
>   Constraint: Users of the Web and Web software MUST NOT attempt 
>   to draw unverifiable conclusions about a resource or its 
>   representations by inspection of its URI, except as licensed by 
>   relevant normative specifications or by URI assignment policies 
>   published by the relevant URI assignment authority. 
> 
> Suggestion: Could we limit the above constraint to software, 
> and not have it extend to users (where users mean human users)? 


> Put differently, I would like Martin's software to be sent to TAG 
> jail, but Martin himself should not be punished if he said "that 
> is XML" by looking at the URL. 
> Let's face it, Martin as a typical intelligent human can: 
> 
>   - Guess the content is XML from the extension, 
>   - Even guess that it is broken XML, 
>   - Probably did not get to see the HTTP headers, 
>   - And would never take the time to go ask _the relevant 
>     authority_ if he is correct, 
>   - And given a longer XML document, would not know *how* it was 
>     broken without software assistance. 

I really don't think I want to draw such sharp distinctions between people 
and software.  In fact, human Martin >should< be sent to TAG jail if he 
picks up the phone and tells you "Hey, I found an XML version of that 
document" just by looking at the URI.  He's crossed the line by telling 
you this non-fact in a tone that suggests he's not accounting for the 
possibility that he's guessed wrong.  You might go on to trust that 
erroneous information.  Similarly, as you say below, a human who has good 
reason for guessing may indeed write software that helps him act on those 
guesses, but both the code and the human have to be prepared for the 
possibility that the guesses are wrong. 

I think the draft finding has it about right on all of that.  I'm just not 
convinced that telling a detailed story about the differences between 
people and software is on the mark in this case. 
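
One way to picture "guess, but be prepared to be wrong" in software is the
following minimal sketch.  It assumes the caller already has the
Content-Type header value from an HTTP response (or None if there was
none); the function names guess_media_type and effective_media_type are
hypothetical.  The guess from the URI suffix is kept only as a fallback
hint, and the result is flagged so that callers can tell a guess from what
the server actually said.

```python
import mimetypes
from urllib.parse import urlsplit


def guess_media_type(uri):
    """Guess a media type from the URI path suffix, or return None.

    This is only a hint: nothing in the URI itself licenses the conclusion.
    """
    guessed, _encoding = mimetypes.guess_type(urlsplit(uri).path)
    return guessed


def effective_media_type(uri, content_type_header):
    """Return (media_type, is_authoritative).

    The server's Content-Type wins whenever it is present; the URI-based
    guess is used only when there is no header, and is marked as a guess.
    """
    if content_type_header:
        # Drop parameters such as "; charset=utf-8".
        media_type = content_type_header.split(";", 1)[0].strip().lower()
        return media_type, True
    return guess_media_type(uri), False
```

A URI ending in ".xml" that is served as text/plain therefore comes back
as ("text/plain", True); the suffix never overrides the server.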

> *** Best Practice in 2.2 

> I agree with the conclusion of this section, but don't agree 
> entirely with its tone. 
> As best practice, could we write it such that authorities 
> assigning URIs are encouraged to do the reasonable thing, as 
> opposed to putting the entire blame for the failure on Bob? 
> In the long run, Darwin will take care of people who create bogus 
> URIs that do not meet the end-user's expectations. 

Good point.  The text currently says: 

"Still, the ability to explore the Web informally and experimentally is 
very valuable, and Web users act on guesses about URIs all the time. Many 
authorities facilitate such flexible use of the Web by assigning URIs in 
an orderly and predictable manner. Nonetheless, in the example above, Bob 
is responsible for determining whether the information returned is indeed 
what he needs." 

That sentence about authorities facilitating use through orderly 
assignment was intended to signal that it's good practice, but we could 
indeed say that a bit more strongly.  I'll try and come up with rewording 
for the next draft. 

> *** Possible Erroneous Assumption In 2.3 
> 
> Assumes that the HTML Form is _authoritative_ 
> --- note that this is  true *if and only if* the HTML form was 
> authored by the authority assigning the URI --- and that on  the 
> Web today, this is not always the norm. 

Indeed.  That important point was raised by Dan in [2], where he wrote: 

"You might note that the action= attribute allows a form to point 
anywhere in the web, so in fact, HTML forms allows anyone, 
not just an authority, to make claims about the URI structure 
of http://example.org/cityweather ." 

My suggestion in response was [3]: 

"Well, the subtlety seems to me that the claims are authoritative (in the 
sense the finding discusses) only if the authority sourcing the form and 
the authority for the ACTION URI are the same.  It's cool that I can serve 
up lots of Web forms with ACTIONs pointing to danconnolly.com, but I 
expect you shouldn't be held responsible for either the implied structure 
of your URIs, or for anything I might say about them in the natural 
language text of the form.  I'm thinking it might be worth a sentence in 
the finding to give that warning, i.e. that 3rd party claims in forms have 
the same standing or lack thereof as 3rd party claims in books, ordinary 
web pages, or on the sides of busses: trust claims that (appear to be) 
made by someone other than the resource authority only at your own risk." 

Does that seem like the right way to handle it? 
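
As a rough illustration of the "same authority" test, here is a minimal
sketch that compares the page serving a form with the resolved ACTION URI.
Treating scheme, host and port as a stand-in for "authority" is an
assumption of this sketch (shared hosts and delegated paths would need
more care), and the names authority_of and form_is_first_party are
hypothetical.

```python
from urllib.parse import urljoin, urlsplit

# Default ports, so that "http://example.org" and "http://example.org:80" match.
DEFAULT_PORTS = {"http": 80, "https": 443}


def authority_of(uri):
    """Return (scheme, host, port) as a rough proxy for who controls the URI."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def form_is_first_party(form_page_uri, action):
    """True when the form's ACTION resolves to the authority that served the form."""
    # ACTION may be relative, so resolve it against the page that holds the form.
    action_uri = urljoin(form_page_uri, action)
    return authority_of(form_page_uri) == authority_of(action_uri)
```

Under these assumptions, a form served from http://example.org/ whose
ACTION is /cityweather counts as first party, while a form served from
some other host but pointing at http://example.org/cityweather does not,
and its implied claims about that URI structure carry no authority.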

> *** One URI Space Please In 2.5 
> 
> Implies that there might be two *URI spaces* one for writing on 
> backs of buses and another for writing inside HTML hyperlinks. 
> I would personally consider that *extremely* bad practice. 

I don't see anything there that suggests disjoint URI spaces.  It says: 

"URIs optimized for use by the assignment authority may sometimes be 
inconvenient for resource users. " 

I think that's manifestly true, and we encounter the consequences on the 
Web quite regularly. 

It then concludes: 

"Good Practice: URIs intended for direct use by people should be easy to 
understand, and should be suggestive of the resource actually named." 

That's presumably what suggested two spaces to you, but it seems 
appropriate to me.  I believe that it's clearly true, and to be 
encouraged, that of all the URIs in the world, some are indeed 
optimized for convenient direct memorization, typing or manipulation by 
human users (e.g. http://www.w3.org) and others are optimized more 
for the benefit of the resource owner (e.g. 
http://www.w3.org/2002/09/wbs/36693/xmlvaria200604/). 

I don't think that the draft inappropriately suggests that the distinction 
is crisp, or that there are two disjoint spaces.  The good practice note 
suggests, correctly in my opinion, that if you are assigning a URI, and if 
it's a goal that the URI be directly used by people, that it should be 
both easy to understand (maybe remember?) and suggestive of the resource 
named.   

> * Conclusion 
> 
> Overall I like the document. 
> I'd throughout emphasize that: 
> 
>   - Human users should be encouraged to guess. 
>   - Software should rely on documented and verifiable metadata. 

As I've said above, I'm afraid I'm not convinced this is a good 
distinction. 

>   - Humans creating bleeding edge software should not be 
>     imprisoned for depending on guesses. 

Right, which is among the reasons I don't want to organize the story 
around the differences between people and software.  It seems that you're 
actually saying that there's:  (1) ordinary software, in which guessing is 
very bad; (2) people, whom you actually encourage to guess; and (3) 
software that's written specifically to help people who are guessing, and 
that software can guess after all. 

>   - Such guesses should be documented, and where possible 
>     communicated to the authority issuing the URI. 

Are you saying that Bob should call the Web site and say "Hey, I saw that 
URI on the side of the bus and I also guessed that you had some others"? 
In fact, why does Bob have to document anything to anyone?  In the privacy 
of his own home he took a gamble, and as far as he can tell it's paid off. 
Why does he have to document anything or tell anyone anything? 

>   - Sometimes, human-authored software that encapsulates guesses 
>     made by the developer can prove a useful tool in discovering 
>     additional means of using Web resources not originally 
>     envisioned by the owner of the resource. 

I agree, though I'm not sure where you're going with this thought.  Are 
you saying: therefore people should write such software?  Therefore people 
should call up the resource owner and explain what they've discovered? 
What would you like the finding to say in this area? 

>   -  Where such *additional* use does not contravene the original 
>     terms of use e.g., guessing the URL to someone else's bank 
>     account,  the Web architecture should encourage these, since 
>     it leads to an overall democratization of available services 
>     on the Web, with users being able to implicitly ask for the 
>     _right_ API. 

Thank you for the very detailed comments.  They are truly very helpful. 
I'm sorry that I haven't been more immediately agreeable on your main 
point, about the differences between people and software, but I'm just not 
yet convinced on that one.  I think a lot of the others are editorial or 
matters of emphasis, and I'll be glad to try and capture whatever the 
group as a whole thinks is best based on your input. 

I'm also sorry that this response is coming right before the call, but I 
was mostly out of email contact last week. 


Noah 

[1] http://www.w3.org/2001/tag/doc/metaDataInURI-31-20030708.html 
[2] http://lists.w3.org/Archives/Public/www-tag/2006May/0020.html 
[3] http://lists.w3.org/Archives/Public/www-tag/2006May/0028.html 

Attachments

application/octet-stream attachment: comments-uri-metadata
Received on Tuesday, 30 May 2006 17:20:29 UTC