Scope of the Web: what protocols are allowed? from noah_mendelsohn@us.ibm.com on 2003-11-05 (www-tag@w3.org from November 2003)

From: <noah_mendelsohn@us.ibm.com>
Date: Wed, 5 Nov 2003 13:55:54 -0500
To: www-tag@w3.org
Message-ID: <OF2119936B.3BB2D09A-ON85256DC2.007EB732@lotus.com>
I would like to congratulate the Tag on the progress that's been made on 
the Architecture Document [1].  There are some issues that have been on my 
mind for a while, and although recent versions have answered some of my
concerns, this may be the time to raise a few key questions that remain.

I assume it's reasonable to expect the Architecture document to answer the 
question:  "Is protocol X at least broadly compatible with the Web 
Architecture?"  For example, if I describe the implementation of some 
distributed protocol, one should be able to say either "Yes, you're using 
URIs in the right way, you've created or used schemes in the appropriate 
manner, etc., so your new protocol can be viewed as part of the Web."  vs. 
"No, you don't even try to name resources with URIs, which is the most 
basic requirement of the Web architecture, so as described your system is 
clearly not part of the Web and does not in at least some important 
respects conform to Web architecture." 

The concern I have is that I'm not sure the current document answers these 
questions with respect to some use cases that have occurred to me.  So, 
here are some thought experiments in protocol and Web design:  I'd be 
curious to hear (a) the degree to which the following should be viewed as 
conforming to Web architecture and, whatever the answer, (b) whether the 
current Web Architecture document gives the necessary guidance to lead to 
the intended answer.

Example 1:  Representations visible at the API but not on the wire
==================================================================

The purpose of this example is to explore the degree to which Web 
architecture refers to wire formats vs. application models.

I'll call this mythical system "spread spectrum peer to peer".  In rough 
outline, I document and implement a new scheme I'll call SPREADP2P:.  This 
is a distributed store implementing a hierarchical data space, with URIs 
naming resources using the new SPREADP2P: scheme.  The protocol 
documentation indicates that the system is implemented as a peer-to-peer 
distributed system, with the active nodes conspiring to implement the 
store in a distributed manner.  As seen through the APIs at the nodes, the 
system implements the familiar GET, PUT, POST and DELETE operations, with 
semantics similar to HTTP. 
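The short Python sketch below shows what an application might see at a single node. It is illustrative only, because the system is mythical. The class name `SpreadP2PNode` is an assumption, and so are the in-memory dict standing in for the distributed store and the choice to have POST create a numbered subordinate resource. None of this comes from any real SPREADP2P: specification.

```python
from typing import Dict
from urllib.parse import urlsplit


class SpreadP2PNode:
    """Hypothetical node-local API for the mythical SPREADP2P: store.

    Applications see only a REST-style interface keyed by URI. How state is
    actually spread across peers is hidden behind these four methods; a
    plain dict stands in for the whole distributed store here.
    """

    def __init__(self) -> None:
        self._state: Dict[str, bytes] = {}
        self._next_id = 0

    @staticmethod
    def _check(uri: str) -> str:
        # urlsplit lower-cases the scheme, so "SPREADP2P:" is accepted too.
        if urlsplit(uri).scheme != "spreadp2p":
            raise ValueError(f"not a SPREADP2P: URI: {uri}")
        return uri

    def get(self, uri: str) -> bytes:
        # Return a representation of the resource's current state.
        return self._state[self._check(uri)]

    def put(self, uri: str, representation: bytes) -> None:
        # Replace the resource's state with the supplied representation.
        self._state[self._check(uri)] = representation

    def post(self, uri: str, representation: bytes) -> str:
        # Create a subordinate resource under `uri` and return its new URI.
        self._next_id += 1
        child = f"{self._check(uri).rstrip('/')}/{self._next_id}"
        self._state[child] = representation
        return child

    def delete(self, uri: str) -> None:
        # Remove the resource; a later GET raises KeyError.
        del self._state[self._check(uri)]
```

For example, on a fresh node `post("SPREADP2P:/docs", b"note")` returns `"SPREADP2P:/docs/1"`. Nothing in this interface reveals how the peers actually move data.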

On the wire, the system looks very different.  Without going into details 
(this is a mythical system), the system uses techniques closely related to 
spread spectrum.  No particular transmitted packet or on-disk fragment at 
the peers directly corresponds to an identifiable representation of any 
particular resource.  The spread spectrum techniques exchange packets that 
essentially multiplex fragments of the state of multiple representations 
of multiple resources in each packet.  Informally, the data from various 
resources is hashed together and spread around in ways that make it very 
difficult at any one place to reconstruct any particular part of a 
resource or representation, but suitable queries can fan out through the 
network gathering information that with high probability will be 
sufficient to reconstruct any desired representation.
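The spreading idea can be caricatured in a few lines of Python. This toy sketch is not spread spectrum. It makes four simplifications:

- It cuts each representation into small fragments.
- It tags each fragment with an opaque hash rather than the resource's URI.
- It shuffles the fragments of all resources together and packs them into packets that can mix resources.
- It lets `reconstruct` stand in for the fan-out query.

The fragment size, packet size and tagging scheme are hypothetical. Because the toy has no redundancy and loses nothing, reconstruction is deterministic rather than merely highly probable.

```python
import hashlib
import random
from typing import Dict, List, Tuple

FRAGMENT_SIZE = 4  # bytes per fragment (toy value)
PER_PACKET = 3     # fragments multiplexed into each packet (toy value)
Packet = List[Tuple[str, bytes]]


def fragment_tag(uri: str, index: int) -> str:
    # Packets carry only an opaque hash of (URI, fragment index), never the URI.
    return hashlib.sha256(f"{uri}|{index}".encode("utf-8")).hexdigest()[:16]


def spread(reps: Dict[str, bytes], seed: int = 0) -> List[Packet]:
    """Cut every representation into fragments, shuffle the fragments of all
    resources together, and pack them into packets that can mix resources."""
    fragments = []
    for uri, data in reps.items():
        for i in range(0, len(data), FRAGMENT_SIZE):
            chunk = data[i:i + FRAGMENT_SIZE]
            fragments.append((fragment_tag(uri, i // FRAGMENT_SIZE), chunk))
    random.Random(seed).shuffle(fragments)
    return [fragments[i:i + PER_PACKET] for i in range(0, len(fragments), PER_PACKET)]


def reconstruct(uri: str, packets: List[Packet]) -> bytes:
    """Stand-in for a fan-out query: gather fragments from every packet, then
    reassemble, in order, those whose tags belong to `uri`."""
    gathered = {tag: chunk for packet in packets for tag, chunk in packet}
    parts, index = [], 0
    while (tag := fragment_tag(uri, index)) in gathered:
        parts.append(gathered[tag])
        index += 1
    return b"".join(parts)
```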

Let's not get into the design details, as only a few characteristics are 
important for this thought experiment.  The point is to describe a system 
in which the application model is clearly REST, but unlike HTTP, the 
on-the-wire traffic is only very indirectly related to the transfer of 
representations of any particular resources.   I'm not asking whether this 
is a good way to build a system.  I'm merely asking what the architecture 
document would say about it if I did. 

I'll say that I hope the answer is that the Web architecture is scalable 
to include systems of this sort, because the usage scenarios are 
compelling.  If I add a handler for this new scheme to my browser, then I 
can transparently browse through resources managed by HTTP, interlinked 
with resources stored in the new system.  Stated another way, 
I think that the architecture document should be a little more careful 
about dealing with possible new protocols and schemes, and in particular 
should separate concerns relating to abstract models from concerns 
relating to the architecture of on-the-wire exchanges. 
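The browser scenario amounts to dispatching on the URI scheme. In the minimal Python sketch below, the names `register` and `dereference` and both stub handlers are hypothetical. The browser asks only for "a representation of this URI". Whether the handler speaks HTTP or a peer-to-peer protocol underneath is invisible to it.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit

# Browser-level abstraction: "give me a representation of this URI".
Fetcher = Callable[[str], bytes]
HANDLERS: Dict[str, Fetcher] = {}


def register(scheme: str, fetch: Fetcher) -> None:
    # Scheme names are case-insensitive, so store them lower-cased.
    HANDLERS[scheme.lower()] = fetch


def dereference(uri: str) -> bytes:
    # Dispatch on the URI scheme alone; callers never see the wire protocol.
    scheme = urlsplit(uri).scheme
    if scheme not in HANDLERS:
        raise LookupError(f"no handler registered for scheme {scheme!r}")
    return HANDLERS[scheme](uri)


# Stub handlers for illustration only: a real HTTP handler would issue an
# HTTP GET, and a SPREADP2P: handler would run the peer-to-peer gathering.
register("http", lambda uri: b'<a href="spreadp2p:/docs/a">a SPREADP2P: link</a>')
register("SPREADP2P", lambda uri: b"representation of " + uri.encode("ascii"))
```

The abstract model (URIs in, representations out) lives in `dereference`. Each handler owns its own wire exchange.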

Example 2: Less RESTful models
==============================

This example is probably less architecturally radical.   I invent a new 
protocol for controlled transmission of video streams, and in an attempt 
to integrate with the Web I assign the new VIDSTREAM: scheme.  Indeed, I 
use some mechanisms of the Web in a first-class manner:  I assign a URI in 
the new scheme not only to each stream (e.g. movie) but in fact to each 
frame of the video, a separate one for the Spanish-language audio, etc. 
For better or worse, however, I part company with traditional REST in 
other details of the protocol.  Indeed, the protocol consists of a single 
full duplex TCP stream in which the receiver is continually (or at least 
asynchronously) streaming control commands to the server while multiple 
streams comprising the movie, its audio, subtitles, and other control 
information are streaming back in interleaved form.  Examples of control 
information include requests to reduce the resolution of subsequent 
frames, to seek to a new frame (identified by its own URI or perhaps by a 
fragID in the base URI of the movie...I'll punt on media type issues to 
keep this simple),  commands to start sending audio in Spanish, etc.  No 
doubt this system could in principle be built in a more traditional REST 
manner, perhaps even on HTTP.   I could define additional resources for 
the video controllers and view the requests to speed up or change 
resolution as POSTs to those resources, but for better or worse this 
mythical protocol doesn't work that way:  retrieval requests are not 
obviously RESTful, but are encoded as control packets that sometimes but 
not always reference URIs. 
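The Python sketch below gives one hypothetical encoding of such upstream control packets. Each packet is a one-byte opcode and a two-byte length, followed by a payload. For SEEK and SELECT_AUDIO the payload is a URI; for SET_RESOLUTION it is a pair of integers. The opcodes, the framing and the `#frame=` fragment syntax mentioned below are invented for illustration. The point is only that some packets reference URIs and some do not, and none is a request for a representation in the REST sense.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical opcodes for the mythical VIDSTREAM: upstream control channel.
SEEK, SET_RESOLUTION, SELECT_AUDIO = 1, 2, 3
HEADER = struct.Struct("!BH")      # opcode (1 byte), payload length (2 bytes)
RESOLUTION = struct.Struct("!HH")  # width, height in pixels


@dataclass
class Control:
    opcode: int
    uri: Optional[str] = None               # SEEK and SELECT_AUDIO carry a URI
    size: Optional[Tuple[int, int]] = None  # SET_RESOLUTION carries no URI


def encode(cmd: Control) -> bytes:
    if cmd.opcode == SET_RESOLUTION:
        payload = RESOLUTION.pack(*cmd.size)
    else:
        payload = cmd.uri.encode("utf-8")
    return HEADER.pack(cmd.opcode, len(payload)) + payload


def decode_all(buf: bytes) -> List[Control]:
    """Split a buffer of back-to-back control packets into commands."""
    commands, off = [], 0
    while off < len(buf):
        opcode, length = HEADER.unpack_from(buf, off)
        off += HEADER.size
        payload = buf[off:off + length]
        off += length
        if opcode == SET_RESOLUTION:
            commands.append(Control(opcode, size=RESOLUTION.unpack(payload)))
        else:
            commands.append(Control(opcode, uri=payload.decode("utf-8")))
    return commands
```

Take a SEEK to the hypothetical frame URI `vidstream://example.org/movie#frame=1200`. It is opcode 1 followed by that URI's UTF-8 bytes. By contrast, `Control(SET_RESOLUTION, size=(640, 360))` carries only the two integers and names no resource at all.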

This example is a little different from the first one.  In some sense, it 
seems more traditional, since at the coarsest level I could imagine 
clicking on a URI link and seeing what is roughly a request for a 
representation of the resource go out from my client.   On the other hand, 
particularly with respect to the upstream traffic but to some degree in 
both directions, the system is not documented in terms of transfer of 
representations.  It's documented as a set of control packets, 
interspersed frames representing video and audio (and as noted the audio 
may be a separate resource with its own URI), and so on.
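The downstream direction can be sketched in Python the same way. Frames from several streams share one connection. Each frame is prefixed with a channel number and a length. A channel table records which resource URI each channel belongs to. The framing and the table are hypothetical. Nothing on the wire is labeled "a representation of resource R"; that attribution exists only in the table the receiver keeps.

```python
import struct
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical downstream framing: channel id (1 byte) + payload length (4 bytes).
FRAME = struct.Struct("!BI")


def mux(frames: List[Tuple[int, bytes]]) -> bytes:
    """Interleave (channel, payload) frames onto one byte stream."""
    return b"".join(FRAME.pack(ch, len(p)) + p for ch, p in frames)


def demux(wire: bytes, channels: Dict[int, str]) -> Dict[str, List[bytes]]:
    """Split the interleaved stream back out, attributing each frame to the
    resource URI that the channel table says it belongs to."""
    by_resource: Dict[str, List[bytes]] = defaultdict(list)
    off = 0
    while off < len(wire):
        ch, length = FRAME.unpack_from(wire, off)
        off += FRAME.size
        by_resource[channels[ch]].append(wire[off:off + length])
        off += length
    return dict(by_resource)
```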

As above I ask:  what does the Web Arch document say about this mythical 
example?  Is it within the scope of the definition of the Web as provided 
in the Architecture document?

======End of Examples=========

Please forgive the somewhat contrived examples.  While I know that each is 
somewhere between toy and artificial in important respects, I think they 
do represent in schematic form the sorts of things people will want to do 
beyond what HTTP does today.  Since the Architecture Document sets out the 
scope of the Web in terms of concepts like transfer of representations, I 
think it is useful to understand what it would say about these examples. 
My reading of the current editors' draft is that it comes close to 
allowing for them, but there are also statements such as [2]:

        "Web agents communicate complete or 
         partial information about the state 
         of a resource through 
         representations. "

...that could be taken to preclude either or both of the examples above.

Please also accept my apologies for raising these examples relatively late 
in your cycle of work on the Arch. document.  I have for months intended 
to get to these and never managed to.  If time pressures dictate that 
these cannot be considered in detail at this late date, I will completely 
understand.  I do think they are crucially important in principle, and 
that in any case the Architecture document should at least be clear on the 
degree to which they fit Web architecture.

FWIW, those who attended last year's tech plenary may remember that I 
presented some of my own ideas on how these concerns might be tackled, 
with a more layered notion of what it means to be "on the Web" [3].  I 
doubt the Tag will want to buy into such layering at this point, but it 
does seem a sensible approach to me.  Perhaps it would be worth at least 
giving some thought to which of the layers described in my presentation 
comes closest to capturing the scope of the Web as the Tag intends to 
describe it. 

Again, my congratulations to you on the wonderful progress you've made on 
the Architecture Document so far.  Any attention to these questions will 
be much appreciated. Thank you.

Noah

[1] http://www.w3.org/2001/tag/webarch/
[2] http://www.w3.org/2001/tag/webarch/#intro
[3] http://www.w3.org/2003/Talks/techplen-ws/w3cplenaryhowmanywebs.htm

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Wednesday, 5 November 2003 13:56:53 UTC