Using Content Negotiation to relate "data resources" to AJAX user interfaces


I sent a version of this note 2 days ago, but it seems to have been caught
in the anti-spam filters. My apologies if you end up seeing this twice. I
also apologize in advance for the length of this email - this is especially
rude since I'm new to this forum. Noah and Ashok know me personally, but
for those who don't, I work at IBM as the CTO of the Rational software
brand, which develops products and services to support our customers'
software development needs.

The recent exchange on the boundaries of content negotiation is very near
to a question we are struggling with at IBM. You won't find any brilliant
insights from me to answer the question, but you will find an explanation
of why this question seems really, really important to us right now. You
will also find an appeal for help and advice. You may be amused or
horrified at my attempt at the bottom of the email to justify the answer we
want to hear despite the obvious objections.

We are implementing products where the underlying data are exposed as a web
of resources accessed via HTTP. Our clients are implemented using HTML and
JavaScript in an AJAX style. Since both our data and our UI are now on the
web, we have the problem of how to relate the two. We know that others
have written about this topic – for example, we are aware of this document:
http://www.w3.org/TR/2008/NOTE-cooluris-20080331/. Unfortunately, the
guidance we are finding is not proving entirely satisfactory or relevant to
our situation, which is described below using examples from GMail and
YouTube.  We're not implementing email or video-sharing products at IBM
Rational, but the parallel to our own products is close enough to
illustrate the point.

The base URL for Gmail is http://mail.google.com/mail/ which appears to
redirect to http://mail.google.com/mail/#inbox. Within your inbox, you can
click on an email - if you do, Gmail will open your email and the browser
address bar will change to something like this:
http://mail.google.com/mail/#inbox/11f804dfae358bd9. An improbable number
of POSTs and GETs go on under the covers before this URL appears and none
of them would make you expect that this URL would appear, but somehow it
does - GMail is not simple. Security will hopefully stop you from following
this link to this email, but I can do it. So GMail provides me with URLs
for each of my emails of the form
http://mail.google.com/mail/#inbox/11f804dfae358bd9, and it makes those
URLs appear in the address field, which is where users would expect they
would appear. That is fine if I'm a human who wants to interact with
GMail, but what if I'm a client that wants to get at the email itself, not
the GMail UI for the email? The products we are working on must support
both scenarios. One option that GMail could implement is to offer a "link"
button like the one in Google Maps that Noah brought to my attention, but
instead of putting the "UI URL" in there, it could put the "data URL". In
fact, YouTube does something close to this - look at the content of the
"embed" field on a YouTube page - it includes the URL of the video separate
from the URL of the YouTube page that embeds the video.
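
One detail of the GMail URL is worth making concrete: everything after the
"#" is a fragment identifier, which the browser keeps to itself and never
sends to the server. The rough Python sketch below illustrates this using
only the standard library. The helper name request_target is my own
invention for illustration, not part of any library.

```python
from urllib.parse import urlsplit

def request_target(url):
    """Return the path (plus query, if any) that goes on the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target  # parts.fragment is deliberately left out: it stays in the browser

ui_url = "http://mail.google.com/mail/#inbox/11f804dfae358bd9"
print(request_target(ui_url))     # /mail/
print(urlsplit(ui_url).fragment)  # inbox/11f804dfae358bd9
```

So the server sees the same request for every email, and only the script
running in the browser knows which email the "UI URL" refers to.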

Just for the sake of an example, let's assume that we, and GMail, did what
YouTube does, and that the matching "data URL" for the email above is
http://mail.google.com/11f804dfae358bd9. Am I now in good shape? From one
point of view, it's not bad, because I have both URLs for my email, one for
a UI for humans using a browser and a second one for other purposes. If I
can remember which URL is for which purpose, always use the right one at
the right time, always email both of them to others, so they can do the
same and so on, then it works. Not only is this a pain, but uncaught
mistakes will have negative consequences, like defeating searches if the
wrong URL is stored in data or text. Much simpler would be to have a single
URL that just always did the right thing. This is why the pattern
documented here is attractive:
http://www.w3.org/TR/2008/NOTE-cooluris-20080331/#r303gendocument. If we
took this approach, we would only need the “data URL” -
http://mail.google.com/11f804dfae358bd9 in the GMail example. If I pasted
that into my browser, content negotiation could return the same HTML that
is returned by the real GMail URL. On the other hand, if I gave the URL to
some other sort of program that wanted an RDF representation or an XML
representation, content negotiation would again give the right thing. This
is a huge improvement in the usability of my solution.
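
To illustrate what "a single URL that does the right thing" might involve
on the server side, here is a minimal Python sketch of choosing a media
type from the Accept header. It is only a sketch under assumptions: the
AVAILABLE list and both function names are hypothetical, and the matching
is simplified (it honours q-values and wildcards but ignores the rule that
more specific media ranges take precedence over less specific ones).

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a media range with no q parameter has weight 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so ties keep the order the client listed them in
    return sorted(prefs, key=lambda p: p[1], reverse=True)

# Hypothetical list of representations the server can produce for one email.
AVAILABLE = ["text/html", "application/rdf+xml", "application/xml"]

def choose_representation(accept_header, available=AVAILABLE):
    """Pick the media type to serve, or None if nothing is acceptable."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
    return None  # the caller would answer 406 Not Acceptable

# A browser-style header gets the HTML; an RDF client gets RDF.
print(choose_representation("text/html,application/xml;q=0.9,*/*;q=0.8"))
print(choose_representation("application/rdf+xml"))
```

The same URL, http://mail.google.com/11f804dfae358bd9 in the example above,
would then serve both kinds of client.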

So why don't we just implement this design? The objection, pointed out by
several of our developers (and me), is that it's a distortion to say that
the GMail HTML returned by
http://mail.google.com/mail/#inbox/11f804dfae358bd9 is a representation of
the email. It's more reasonable to think of it as a JavaScript program that
turns around and does a bunch of further GETs and POSTs, and somewhere in
the responses is buried a representation of the email. I'm guessing this is
why the authors of the paper cited above advised against using content
negotiation for this case - it seems like a hack that is not in the spirit
of the web architecture.

The solution we are considering – and that we’d like some feedback on – is
to use content negotiation despite the objections. This design has by far
the best characteristics from a user perspective. If we had less delicate
design sensitivities, we’d probably just implement this and not worry about
justifying it - perhaps we are blind to problems this will cause later.
Rather than pick a different design with worse user characteristics in
order to fit the classic model, we choose instead to invent a justification
for why it’s ok, as follows.

“HTML started life as a language for representations of web documents.
Browsers were user agents that took HTML representations of web documents,
displayed them to users and allowed them to navigate the web. This is still
the basis of much of the web. Over time, with the addition of forms,
JavaScript and AJAX, HTML acquired the capabilities of a full programming
language and the browsers acquired the characteristics of a programmable
run-time environment. Many modern HTML response documents are no longer
representations of anything that is meaningful to users. Instead of being
representations of resources that are interpreted by the browser acting as
a user agent, these HTML documents are implementations of specialized user
agents that execute in the browser as a run-time platform. Given that HTML
and the browser now have two distinct meanings and roles – 1) document
representations/user agents and 2) implementations of specialized user
agents/run-time platforms – we permit our servers to take a more liberal
view of the meaning of an HTTP GET when the Accept header includes
text/html. Our server may either return an HTML representation of the
requested document, or it may return the implementation of a specialized
user agent implemented in HTML for that resource.”
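
As a rough illustration of the second option in that justification, the
Python sketch below answers a GET on the "data URL" with a small HTML shell
that loads a script (the "specialized user agent") when text/html is
requested, and with an XML representation otherwise. Everything named here
is an assumption made for the example: the EMAILS store, the APP_SCRIPT
path, the data-resource attribute and the respond function are all
hypothetical, and the Accept check is a crude substring test rather than
real negotiation.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical in-memory store standing in for the real data resources.
EMAILS = {"/11f804dfae358bd9": {"subject": "Hello", "body": "Lunch on Friday?"}}

# Hypothetical location of the JavaScript that implements the UI.
APP_SCRIPT = "/static/mail-ui.js"

def html_shell(data_url):
    """HTML that carries no email content, only a pointer to the resource
    and the script that will fetch and display it."""
    return (
        "<!DOCTYPE html>\n<html><head><title>Mail</title></head>"
        f'<body data-resource="{escape(data_url, quote=True)}">'
        f'<script src="{APP_SCRIPT}"></script></body></html>'
    )

def xml_representation(email):
    """A plain XML representation of the email itself."""
    root = ET.Element("email")
    for key, value in email.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def respond(path, accept):
    """Return (status, headers, body) for a GET on the given path."""
    email = EMAILS.get(path)
    if email is None:
        return 404, {"Content-Type": "text/plain"}, "Not found"
    # Vary tells caches that the response depends on the Accept header.
    headers = {"Vary": "Accept"}
    accept = accept.lower()
    if "text/html" in accept:  # crude check, an assumption of this sketch
        headers["Content-Type"] = "text/html; charset=utf-8"
        return 200, headers, html_shell(path)
    if "application/xml" in accept:
        headers["Content-Type"] = "application/xml"
        return 200, headers, xml_representation(email)
    return 406, headers, "Not Acceptable"

status, headers, body = respond("/11f804dfae358bd9", "application/xml")
print(status, headers["Content-Type"])
print(body)
```

Note that the HTML branch returns no email content at all, which is exactly
the point of the objection: the script would issue a second GET on the same
URL with a different Accept header to obtain the data.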

Please advise us. Is there another technical approach that we should
consider that has attractive characteristics for users? Is there a better
way of rationalizing the design choice that appears to work best
operationally?

Best regards, Martin

Martin Nally, IBM Fellow
CTO, IBM Rational
tel: (949)544-4691

Received on Friday, 20 February 2009 16:41:02 UTC