Re: Question on the boundaries of content negotiation in the context of the Web of Data from ashok malhotra on 2009-02-18 (www-tag@w3.org from February 2009)

From: ashok malhotra <ashok.malhotra@oracle.com>
Date: Wed, 18 Feb 2009 12:40:26 -0800
To: Martin Nally <nally@us.ibm.com>
CC: www-tag@w3.org, www-tag-request@w3.org
Message-ID: <499C723A.2020709@oracle.com>
Hi Martin:
Good to hear from you!

I am interested in this thread from a somewhat different viewpoint, which 
is as follows.
Suppose you have a URI (which may or may not point to a document) that 
has associated with it additional information about the resource. (I 
call this additional information metadata.) How do you find and access
individual pieces of metadata?

Mark Nottingham and Eran Hammer-Lahav have published 3 IETF drafts on 
this subject.
Their methods, though, require 2 round-trips, one to get the URIs for 
all the metadata and the second to get at the specific metadata you 
want.  My thinking is that if you know the type of the metadata you 
want, you can use content negotiation to specify that and you can access 
the metadata in a single round trip.
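
As a rough illustration of the single-round-trip idea (a sketch only, not
taken from the IETF drafts mentioned above), a server could pick the
representation from the Accept header along these lines. The function
names and the REPRESENTATIONS table are hypothetical; the media types and
file names are borrowed from the house example quoted further down. The
matching is deliberately simplified: it honours q-values and wildcards but
ignores the specificity rules and other media-type parameters.

```python
# Hypothetical table: one URI, several representations keyed by media type.
REPRESENTATIONS = {
    "image/png": "house.png",
    "text/turtle": "house.ttl",
}


def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_range, q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def matches(media_range, media_type):
    """Return True if media_type falls within media_range (wildcards allowed)."""
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    mtype, _, msub = media_type.partition("/")
    return rtype == mtype and rsub in ("*", msub)


def negotiate(accept_header, available=REPRESENTATIONS):
    """Return the first available media type the client accepts, or None."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means the client refuses this range
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None
```

A client that knows it wants the Turtle metadata would send
"Accept: text/turtle" and negotiate() would select "text/turtle"; a
request for a type the server does not offer yields None, which a real
server would presumably answer with 406 Not Acceptable.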

When I suggested this some time ago, my hand was slapped and I was told 
that this was not a good use of CN.
Now, it seems that some folks at least are thinking of broadening the 
use of CN and that may sanctify your design and other similar use cases.

Are you at IBM RTP?  I ask because I will be there for a meeting March 
10-12.
This is the week after the TAG f2f and we may well have some interesting 
'progress' to discuss.

All the best, Ashok


Martin Nally wrote:
>
> Hi, Ashok,
>
> I apologize in advance for the length of this email - this is 
> especially rude since I'm new to this forum. You know me personally, 
> of course, (it's been a long time; I hope you are well) and so does 
> Noah Mendelsohn, but for those who don't, I work in IBM as the CTO of 
> the Rational software brand which develops products and services to 
> support our customers' software development needs.
>
> By coincidence I have been corresponding with Noah privately on the 
> same (or a closely related) question. In our example, we have used 
> HTML and RDF as the content types, rather than images and RDF. This is 
> more than a detail of the example for us - the HTML in question is our 
> product's AJAX web UI implementation. Below is a summary of the 
> discussion between me and Noah. You won't find any brilliant insights 
> from me to answer the question, but you will find an explanation of 
> why this question seems really, really important to us right now. You 
> will also find an appeal for help and advice. You may be amused or 
> horrified at my attempt at the bottom of the email to justify the 
> answer we want to hear despite the obvious objections.
>
> We are implementing products where the underlying data are exposed as 
> a web of resources accessed via HTTP. Our clients are implemented 
> using HTML and JavaScript in an AJAX style. Since both our data and 
> our UI are now on the web, we have the problem of how to relate the 
> two. We are aware that others have written about this topic – for 
> example we are aware of this document: 
> http://www.w3.org/TR/2008/NOTE-cooluris-20080331/. Unfortunately, the 
> guidance we are finding is not proving entirely satisfactory or 
> relevant to our situation, which is described below using examples 
> from GMail and YouTube. We're not implementing email or video-sharing 
> products at IBM Rational, but the parallel to our own products is 
> close enough to illustrate the point.
>
> The base URL for Gmail is http://mail.google.com/mail/ which appears 
> to redirect to http://mail.google.com/mail/#inbox. Within your inbox, 
> you can click on an email - if you do, Gmail will open your email and 
> the browser address bar will change to something like this: 
> http://mail.google.com/mail/#inbox/11f804dfae358bd9. An improbable 
> number of POSTs and GETs go on under the covers before this URL 
> appears and none of them would make you expect that this URL would 
> appear, but somehow it does - GMail is not simple. Security will 
> hopefully stop you from following this link to this email, but I can 
> do it. So GMail provides me with URLs for each of my emails of the 
> form http://mail.google.com/mail/#inbox/11f804dfae358bd9, and it makes 
> those URLs appear in the address field, which is where users would 
> expect they would appear. That is fine if I'm a human that wants to 
> interact with GMail, but what if I'm a client that wants to get at the 
> email itself, not the GMail UI for the email? The products we are 
> working on must support both scenarios. One option that GMail could 
> implement is to offer a "link" button like the one in Google Maps that 
> Noah brought to my attention, but instead of putting the "UI url" in 
> there, it could put the "data url". In fact, YouTube does something 
> close to this - look at the content of the "embed" field on a YouTube 
> page - it includes the URL of the video separate from the URL of the 
> YouTube page that embeds the video.
>
> Just for the sake of an example, let's assume we, and GMail, did as 
> YouTube does, and assume the matching "data url" for the email above 
> is http://mail.google.com/11f804dfae358bd9. Am I now in good shape? 
> From one point of view, it's not bad, because I have both URLs for my 
> email, one for a UI for humans using a browser and a second one for 
> other purposes. If I can remember which URL is for which purpose, 
> always use the right one at the right time, always email both of them 
> to others, so they can do the same and so on, then it works. Not only 
> is this a pain, but uncaught mistakes will have negative consequences, 
> like defeating searches if the wrong URL is stored in data. Much 
> simpler would be to have a single URL that just always did the right 
> thing. This is why the pattern documented here is attractive: 
> http://www.w3.org/TR/2008/NOTE-cooluris-20080331/#r303gendocument. If 
> we took this approach, we would only need the “data URL” - 
> http://mail.google.com/11f804dfae358bd9 in the GMail example. If I 
> pasted that into my browser, content negotiation could get back the 
> same HTML that is returned by the real GMail URL. On the other hand, if I 
> gave the URL to some other sort of program that wanted an RDF 
> representation or an XML representation, content-negotiation would 
> again give the right thing. This is a huge improvement in usability of 
> my solution.
>
> So why don't we just implement this design? The objection, pointed out 
> by several of our developers (and me) is that it's a distortion to say 
> that the GMail HTML returned by 
> http://mail.google.com/mail/#inbox/11f804dfae358bd9 is a 
> representation of the email. It's more reasonable to think of it as a 
> JavaScript program that turns around and does a bunch of further GETs 
> and POSTs in whose responses are somewhere buried a representation of 
> the email. I'm guessing this is why the authors of the paper cited 
> above advised against using content negotiation for this case - it 
> seems like a hack that is not in the spirit of the web architecture.
>
> The solution we are considering – and that we’d like some feedback on 
> – is to use content-negotiation despite the objections. This design 
> has by far the best characteristics from a user perspective. If we had 
> less delicate design sensitivities, we’d probably just implement this 
> and not worry about justifying it - perhaps we are blind to problems 
> this will cause later. Rather than pick a different design with worse 
> user characteristics in order to fit the classic model, we choose 
> instead to invent a justification for why it’s ok, as follows.
>
> “HTML started life as a language for representations of web documents. 
> Browsers were user agents that took HTML representations of web 
> documents, displayed them to users and allowed them to navigate the 
> web. This is still the basis of much of the web. Over time, with the 
> addition of forms, JavaScript and AJAX, HTML acquired the capabilities 
> of a full programming language and the browsers acquired the 
> characteristics of a programmable run-time environment. Many modern 
> HTML response documents are no longer representations of anything that 
> is meaningful to users. Instead of being representations of resources 
> that are interpreted by the browser acting as a user agent, these HTML 
> documents are implementations of specialized user agents that execute 
> in the browser as a run-time platform. Given that HTML and the browser 
> now have two distinct meanings and roles – 1) document 
> representations/user agents and 2) implementations of specialized user 
> agents/run-time platforms – we permit our servers to take a more 
> liberal view of the meaning of an HTTP GET when the accept header 
> includes text/html. Our server may either return an HTML 
> representation of the requested document, or it may return the 
> implementation of a specialized user agent implemented in HTML for 
> that resource.”
>
> Please advise us. Is there another technical approach that we should 
> consider that has attractive characteristics for users? Is there a 
> better way of rationalizing the design choice that appears to work 
> best operationally?
>
> Best regards, Martin
>
> Martin Nally, IBM Fellow
> CTO, IBM Rational
> tel: (949)544-4691
>
>
> www-tag-request@w3.org wrote on 02/18/2009 09:48:00 AM:
>
> > Jonathan, you said
> >
> > "I would think that CN is used (and intended to be used) not just for
> > choosing between semantically equivalent entities, but also for
> > semantic subsetting, such as abbreviated representations for mobile
> > devices, low-resolution displays, audio vs. written, etc. Subsetting
> > is certainly *not* equivalence."
> >
> > So, not equivalence but derived from?  I'm wondering how far we can
> > push this.
> > Can CN be used to select, say, between a picture of a house and a text
> > description?
> > I was told NO, but perhaps we are rethinking this.
> >
> > All the best, Ashok
> >
> >
> > Jonathan Rees wrote:
> > > I started to turn this into a request for TAG telecon agendum, and got
> > > stuck on the word "equivalent".
> > >
> > > Just to make sure I understand you - by "equivalent" are you referring
> > > to HTTP 2616 section 13.3.3:
> > >
> > >    Entity tags are normally "strong validators," but the protocol
> > >    provides a mechanism to tag an entity tag as "weak." One can think of
> > >    a strong validator as one that changes whenever the bits of an entity
> > >    changes, while a weak value changes whenever the meaning of an entity
> > >    changes. Alternatively, one can think of a strong validator as part
> > >    of an identifier for a specific entity, while a weak validator is
> > >    part of an identifier for a set of semantically equivalent entities.
> > >
> > > and are you specifically asking about the use of entity tags?  Or were
> > > you really asking the broader question about the use of CN that people
> > > like me were eager to answer? Because I think these are two different
> > > questions.
> > >
> > > If you're asking for advice on "good practice" around the use of
> > > entity tags, the only example given in RFC 2616 is that of hit
> > > counters, which seems quite a long way from "semantic equivalence" of
> > > an image and some RDF. I'd be surprised if anyone would argue in favor
> > > of allowing a cached PNG to be returned when RDF was available and
> > > preferred. On the other hand, the question of under which
> > > circumstances (if any) you are advised to use CN to choose between PNG
> > > and RDF has a very different character. Perhaps some server software
> > > has chosen to assume co-representations are equivalent for caching
> > > purposes, but if this is allowed by RFC 2616 I'd be very interested to
> > > hear the argument.
> > >
> > > I would think that CN is used (and intended to be used) not just for
> > > choosing between semantically equivalent entities, but also for
> > > semantic subsetting, such as abbreviated representations for mobile
> > > devices, low-resolution displays, audio vs. written, etc. Subsetting
> > > is certainly *not* equivalence.
> > >
> > > Obviously there is appeal to a slippery adjective "semantic", which
> > > you're never going to pin down in a manner that is both rigorous and
> > > general, but you could legitimately ask someone to list some positive
> > > and negative examples and situations where differences between
> > > representations might or might not matter to users and/or
> > > applications.
> > >
> > > Jonathan
> > >
> > > On Thu, Feb 12, 2009 at 7:29 AM, Michael Hausenblas
> > > <michael.hausenblas@deri.org> wrote:
> > >  
> > >> Dear TAG members, dear subscribers,
> > >>
> > >> I would like to ask you about your opinion on the following scenario.
> > >> Please note that (1) though I'm a member of the W3C Media Fragments WG
> > >> I speak only for myself, and (2) that all URIs used in the following
> > >> are dereferenceable and made out of 100% recycled electrons.
> > >>
> > >> Given three URIs, namely,
> > >>
> > >> <http://sw-app.org/sandbox/house>
> > >>
> > >> <http://sw-app.org/sandbox/house.png>
> > >>
> > >> <http://sw-app.org/sandbox/house.ttl>
> > >>
> > >> is it 'allowed' (that is, does it break the Web architecture) if one
> > >> does the following:
> > >>
> > >> $ curl -I -H "Accept: image/png" http://sw-app.org/sandbox/house
> > >> HTTP/1.1 200 OK
> > >> Date: Thu, 12 Feb 2009 12:12:39 GMT
> > >> Server: Apache/2.2.3 (CentOS)
> > >> Content-Location: house.png
> > >> Vary: negotiate,accept
> > >> TCN: choice
> > >> Last-Modified: Thu, 12 Feb 2009 11:54:07 GMT
> > >> ETag: "5c0fd-2deb-462b760a7f5c0;462b77ce8a040"
> > >> Accept-Ranges: bytes
> > >> Content-Length: 11755
> > >> Connection: close
> > >> Content-Type: image/png
> > >>
> > >> $ curl -I -H "Accept: text/turtle" http://sw-app.org/sandbox/house
> > >> HTTP/1.1 200 OK
> > >> Date: Thu, 12 Feb 2009 12:13:01 GMT
> > >> Server: Apache/2.2.3 (CentOS)
> > >> Content-Location: house.ttl
> > >> Vary: negotiate,accept
> > >> TCN: choice
> > >> Last-Modified: Thu, 12 Feb 2009 11:54:06 GMT
> > >> ETag: "5c0fc-173-462b76098b380;462b77ce8a040"
> > >> Accept-Ranges: bytes
> > >> Content-Length: 371
> > >> Connection: close
> > >> Content-Type: text/turtle
> > >>
> > >> Please note that I don't ask if this works. It does. Obviously. The
> > >> question, to put it in other words, is: is the PNG *representation*
> > >> derived via conneg from the generic resource
> > >> <http://sw-app.org/sandbox/house> equivalent to the RDF in Turtle?
> > >>
> > >> If not, why not? If it is, can you please point me to a finding, note,
> > >> a specification, etc. that 'normatively' defines what 'equivalency'
> > >> really is?
> > >>
> > >> Cheers,
> > >>      Michael
> > >>
> > >> --
> > >> Dr. Michael Hausenblas
> > >> DERI - Digital Enterprise Research Institute
> > >> National University of Ireland, Lower Dangan,
> > >> Galway, Ireland, Europe
> > >> Tel. +353 91 495730
> > >> http://sw-app.org/about.html
> > >>    
> > >
> > >  
> >
>
Received on Wednesday, 18 February 2009 20:42:23 UTC