What resource does this URL identify? - http://mail.google.com/mail/#inbox/11f804dfae358bd9 from Martin Nally on 2009-02-24 (www-tag@w3.org from February 2009)

From: Martin Nally <nally@us.ibm.com>
Date: Mon, 23 Feb 2009 23:52:16 -0500
To: www-tag@w3.org
Message-ID: <OF2B676D83.2F32C637-ON85257567.0006A603-85257567.001AC217@us.ibm.com>
I was hoping for some guidance from this group on the basic web design
question below. Perhaps what I'm asking for is unfair - you are not a free
web app design service, after all, but I hoped I was asking a question that
would interest you. Perhaps I asked too long a question - let me try a
shorter version.

Lots of people are writing AJAX user interfaces that execute in browsers.
GMail is a well-known one. GMail exposes URLs that look like this:
http://mail.google.com/mail/#inbox/11f804dfae358bd9. Entering this URL in a
browser will open GMail on a particular email in my inbox (if you are
logged on as me). Our experience is that when writing AJAX user interfaces
this pattern of URL becomes common. Taking this URL apart,
http://mail.google.com/mail/ is the URL of the email client, and the
fragment inbox/11f804dfae358bd9 identifies the "view" and the email the
client should initialize on. It seems simple enough, although you might ask
yourself what resource this URL really identifies. Imagine for the purposes
of my example that http://mail.google.com/11f804dfae358bd9 is the URL of
the email - Google doesn't really expose this URL.
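
As a rough illustration of that decomposition, here is a minimal Python
sketch. It is not anything GMail itself does; the function name
split_ui_url is made up for this example. It shows that the part before
the "#" is what the browser requests from the server, while the fragment
stays in the browser and is only seen by the client code.

```python
from urllib.parse import urlsplit


def split_ui_url(ui_url):
    """Split a UI URL into (client URL, view, item id)."""
    parts = urlsplit(ui_url)
    # Everything before '#' is what the browser actually requests.
    client_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # The fragment is not sent in the HTTP request; only script running
    # in the client gets to interpret it.
    view, _, item_id = parts.fragment.partition("/")
    return client_url, view, item_id


client_url, view, item_id = split_ui_url(
    "http://mail.google.com/mail/#inbox/11f804dfae358bd9"
)
print(client_url)  # http://mail.google.com/mail/
print(view)        # inbox
print(item_id)     # 11f804dfae358bd9
```

Because the server never receives the fragment, it cannot tell from the
request alone which email the client is about to open.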

Our problem is that we have lots of "data URLs" like
http://mail.google.com/11f804dfae358bd9, all linked to each other - they
form our "data web". In our case the resources are not emails, they are
resources for our domain. If you get a hold of one of these URLs, you can
always ask it for a representation in the form of RDF, JSON, XML etc. But
if you are a human, what you really want to do is to get back into the
appropriate AJAX client program initialized on the resource. All you have
is the data URL; you don't yet know the media type, so you can't guess the
URL of the right client program. We can see 4 options:

1) Try to make sure that humans never see data URLs like
http://mail.google.com/11f804dfae358bd9 - they only see UI URLs like
http://mail.google.com/mail/#inbox/11f804dfae358bd9. We thought about this
but it seems impossible.
2) Provide humans with some sort of algorithm for converting
http://mail.google.com/11f804dfae358bd9 to
http://mail.google.com/mail/#inbox/11f804dfae358bd9. We thought about this
too, and it seems like a mess.
3) Let content negotiation on http://mail.google.com/11f804dfae358bd9
return the right thing. There seem to be two obvious versions of this:
      a) Use a redirect to
http://mail.google.com/mail/#inbox/11f804dfae358bd9. Now the users are
exposed to both URLs and it is sure and certain that they will use the
wrong URL next time they want to refer to the email, thus messing up the
data web.
      b) Return the email client with some trickery to initialize on email
11f804dfae358bd9. This works BEAUTIFULLY, but it seems a bit of a hack from
a web architecture point of view.
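
To make options 2, 3a and 3b concrete, here is a minimal Python sketch
under stated assumptions. Every name in it (data_url_to_ui_url,
wants_html, respond, the /mail/client.js script, the "inbox" default
view, the JSON stand-in body) is hypothetical, and the 303 status for the
redirect is an assumption borrowed from the usual "see other" pattern
rather than something stated above. It is not how GMail or any real
product works.

```python
import html
import json
from urllib.parse import urlsplit

UI_BASE = "http://mail.google.com/mail/"  # client URL from the example
DEFAULT_VIEW = "inbox"  # assumption: a data URL does not name a view

# Hypothetical client page; /mail/client.js is an invented script name.
CLIENT_PAGE = (
    '<html><body data-view="{view}" data-item="{item}">'
    '<script src="/mail/client.js"></script></body></html>'
)


def data_url_to_ui_url(data_url):
    """Option 2: the conversion a human would otherwise do by hand."""
    item_id = urlsplit(data_url).path.strip("/")
    return f"{UI_BASE}#{DEFAULT_VIEW}/{item_id}"


def wants_html(accept_header):
    """Crude Accept check; q-values are ignored in this sketch."""
    types = [p.split(";")[0].strip().lower() for p in accept_header.split(",")]
    return "text/html" in types


def respond(data_url, accept_header, strategy="3b"):
    """Return (status, headers, body) for a GET on a data URL."""
    item_id = urlsplit(data_url).path.strip("/")
    if not wants_html(accept_header):
        # Non-browser client: return a data representation (JSON stand-in).
        headers = {"Content-Type": "application/json", "Vary": "Accept"}
        return 200, headers, json.dumps({"id": item_id})
    if strategy == "3a":
        # Option 3a: redirect, so the address bar ends up on the UI URL.
        headers = {"Location": data_url_to_ui_url(data_url), "Vary": "Accept"}
        return 303, headers, ""
    # Option 3b: serve the client at the data URL itself, with the target
    # resource embedded so the script can initialize on it.
    body = CLIENT_PAGE.format(view=DEFAULT_VIEW, item=html.escape(item_id))
    return 200, {"Content-Type": "text/html", "Vary": "Accept"}, body
```

With strategy "3a" the browser is sent to the UI URL and the user sees
both URLs; with "3b" the address bar keeps showing the data URL and only
the response body differs by Accept header.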

So the only solution that seems to work looks like a hack. What do you
recommend? The TAG issue is that the standard web model does not seem
adequate to explain or give guidance to data webs with AJAX clients.

If you have patience, here is another thought on this problem. In the
standard web model, a conceptually clean approach would be to write the
clients as browser plug-ins instead of as AJAX clients. Of course, nobody
wants to either write or use plug-ins for this purpose, but you might think
of the AJAX clients as being the moral equivalent of browser plug-ins
except they are implemented in DHTML. The problem is that the standard
model (or browser reality) doesn't give us a clean way of integrating these
"DHTML plug-ins" into the normal request/response flow.

Best regards, Martin

Martin Nally, IBM Fellow
CTO, IBM Rational
tel: (949)544-4691


Martin Nally/Raleigh/IBM wrote on 02/20/2009 11:39:10 AM:

> I sent a version of this note 2 days ago, but it seems to have been
> caught in the anti-spam filters. My apologies if you end up seeing
> this twice. I also apologize in advance for the length of this email
> - this is especially rude since I'm new to this forum. Noah and
> Ashok know me personally, but for those who don't, I work in IBM as
> the CTO of the Rational software brand which develops products and
> services to support our customers' software development needs.
>
> The recent exchange on the boundaries of content negotiation is very
> near to a question we are struggling with in IBM. You won't find any
> brilliant insights from me to answer the question, but you will find
> an explanation of why this question seems really, really important
> to us right now. You will also find an appeal for help and advice.
> You may be amused or horrified at my attempt at the bottom of the
> email to justify the answer we want to hear despite the obvious
> objections.
>
> We are implementing products where the underlying data are exposed
> as a web of resources accessed via HTTP. Our clients are implemented
> using HTML and JavaScript in an AJAX style. Since both our data and
> our UI are now on the web, we have the problem of how to relate the
> two. We are aware that others have written about this topic – for
> example we are aware of this document:
> http://www.w3.org/TR/2008/NOTE-cooluris-20080331/. Unfortunately, the
> guidance we are finding is not proving entirely satisfactory or
> relevant to our
> situation, which is described below using examples from GMail and
> YouTube.  We're not implementing email or video-sharing products at
> IBM Rational, but the parallel to our own products is close enough
> to illustrate the point.
>
> The base URL for Gmail is http://mail.google.com/mail/ which appears
> to redirect to http://mail.google.com/mail/#inbox. Within your
> inbox, you can click on an email - if you do, Gmail will open your
> email and the browser address bar will change to something like this:
> http://mail.google.com/mail/#inbox/11f804dfae358bd9. An improbable
> number of POSTs and GETs go on under the covers before this URL
> appears and none of them would make you expect that this URL would
> appear, but somehow it does - GMail is not simple. Security will
> hopefully stop you from following this link to this email, but I can
> do it. So GMail provides me with URLs for each of my emails of the form
> http://mail.google.com/mail/#inbox/11f804dfae358bd9, and it makes
> those URLs appear in the address field, which is where users would
> expect they would appear. That is fine if I'm a human that wants to
> interact with GMail, but what if I'm a client that wants to get at
> the email itself, not the GMail UI for the email? The products we
> are working on must support both scenarios. One option that GMail
> could implement is to offer a "link" button like the one in Google
> Maps that Noah brought to my attention, but instead of putting the
> "UI url" in there, it could put the "data url". In fact, YouTube
> does something close to this - look at the content of the "embed"
> field on a YouTube page - it includes the URL of the video separate
> from the URL of the YouTube page that embeds the video.
>
> Just for the sake of an example, let's assume we, and GMail, did as
> YouTube does, and assume the matching "data url" for the email above is
> http://mail.google.com/11f804dfae358bd9. Am I now in good shape?
> From one point of view, it's not bad, because I have both URLs for
> my email, one for a UI for humans using a browser and a second one
> for other purposes. If I can remember which URL is for which
> purpose, always use the right one at the right time, always email
> both of them to others, so they can do the same and so on, then it
> works. Not only is this a pain, but uncaught mistakes will have
> negative consequences, like defeating searches if the wrong URL is
> stored in data or text. Much simpler would be to have a single URL
> that just always did the right thing. This is why the pattern
> documented here is attractive:
> http://www.w3.org/TR/2008/NOTE-cooluris-20080331/#r303gendocument.
> If we took this approach, we would only need the “data URL” -
> http://mail.google.com/11f804dfae358bd9 in the GMail example. If I
> pasted that into my browser, content
> negotiation could get back the same HTML that is returned by the real
> GMail URL. On the other hand, if I gave the URL to some other sort of
> program that wanted an RDF representation or an XML representation,
> content-negotiation would again give the right thing. This is a huge
> improvement in usability of my solution.
>
> So why don't we just implement this design? The objection, pointed
> out by several of our developers (and me), is that it's a distortion
> to say that the GMail HTML returned by
> http://mail.google.com/mail/#inbox/11f804dfae358bd9 is a
> representation of the email.
> It's more reasonable to think of it as a JavaScript program that
> turns around and does a bunch of further GETs and POSTs, somewhere in
> whose responses a representation of the email is buried. I'm
> guessing this is why the authors of the paper cited above advised
> against using content negotiation for this case - it seems like a
> hack that is not in the spirit of the web architecture.
>
> The solution we are considering – and that we’d like some feedback
> on – is to use content-negotiation despite the objections. This
> design has by far the best characteristics from a user perspective.
> If we had less delicate design sensitivities, we’d probably just
> implement this and not worry about justifying it - perhaps we are
> blind to problems this will cause later. Rather than pick a
> different design with worse user characteristics in order to fit the
> classic model, we choose instead to invent a justification for why
> it’s ok, as follows.
>
> “HTML started life as a language for representations of web
> documents. Browsers were user agents that took HTML representations
> of web documents, displayed them to users and allowed them to
> navigate the web. This is still the basis of much of the web. Over
> time, with the addition of forms, JavaScript and AJAX, HTML acquired
> the capabilities of a full programming language and the browsers
> acquired the characteristics of a programmable run-time environment.
> Many modern HTML response documents are no longer representations of
> anything that is meaningful to users. Instead of being
> representations of resources that are interpreted by the browser
> acting as a user agent, these HTML documents are implementations of
> specialized user agents that execute in the browser as a run-time
> platform. Given that HTML and the browser now have two distinct
> meanings and roles – 1) document representations/user agents and 2)
> implementations of specialized user agents/run-time platforms – we
> permit our servers to take a more liberal view of the meaning of an
> HTTP GET when the Accept header includes text/html. Our server may
> either return an HTML representation of the requested document, or
> it may return the implementation of a specialized user agent
> implemented in HTML for that resource.”
>
> Please advise us. Is there another technical approach that we should
> consider that has attractive characteristics for users? Is there a
> better way of rationalizing the design choice that appears to work
> best operationally?
>
> Best regards, Martin
>
> Martin Nally, IBM Fellow
> CTO, IBM Rational
> tel: (949)544-4691
Received on Tuesday, 24 February 2009 14:12:26 UTC