Re: Intermediaries and XML Protocol from Mark Nottingham on 2001-02-09 (xml-dist-app@w3.org from February 2001)

From: Mark Nottingham <mnot@akamai.com>
Date: Thu, 8 Feb 2001 17:53:50 -0800
To: Mark Baker <mark.baker@canada.sun.com>
Cc: XML Distributed Applications List <xml-dist-app@w3.org>
Message-ID: <20010208175342.A28901@akamai.com>
Mark,

Thanks for the comments, I very much appreciate it. Responses inline.


On Thu, Feb 08, 2001 at 02:35:46PM -0500, Mark Baker wrote:
> 
> (BTW, the attachment was text/plain even though the content was
> XHTML.)

Yes; have (hopefully) fixed that now. Mutt wasn't picking up the
system's mime.types for some reason...


> >On the other hand, intermediaries were retrofitted into HTTP to
> >allow the Web to scale more efficiently. Originally, the protocol
> >required clients to contact servers directly to satisfy each
> >request. When the Web experienced unprecedented growth, servers
> >and the network infrastructure could not scale quickly enough to
> >satisfy demand. As a result, HTTP/1.0[XX] introduced
> >intermediaries (proxies and gateways), which could take advantage
> >of locality in requests to cache the responses. Although more
> >intermediary-related functionality was added in HTTP/1.1[XX], the
> >continued growth of the Web sparked the development of further
> >measures: surrogates[XX] (informally known as "reverse proxies"),
> >often deployed in "Content Delivery Networks."[XX]
> 
> I would suggest that it's hard to claim that any feature could be
> retrofitted into a 1.0 version of anything.  Versions before 1.0
> are generally considered incomplete.

I agree, but by the time 1.0 was published, the Web had sufficiently
grown to rule out a 'fresh start'.


> >The contrast between these examples bears examination. Because an
> >intermediary model was designed into SMTP, its intermediaries
> >perform in a well-defined manner, and are easy to interpose into
> >the message path. On the other hand, HTTP has ongoing problems
> >caused by the interposition of intermediaries; intermediaries do
> >not always have a clear understanding of message semantics[XX],
> 
> How so?  I can understand this for RPC-over-HTTP solutions, but not
> for normal uses of HTTP.  What is that missing reference?

IIRC, I was thinking of the people who are working on extensible
proxies, and their desire to be able to determine the appropriate
actions to take based on message semantics. See:
  http://www.extproxy.org/
  http://www.i-cap.org/

I probably need to expand this to include other problems, such as
those enumerated in
  http://www.wrec.org/Drafts/draft-ietf-wrec-known-prob-03.txt


> >and location of an appropriate intermediary is problematic[XX].
> 
> Right, though I don't believe that's such a big deal.

I'd dispute that; in many networks (especially enterprise networks),
locating the proper intermediary is non-trivial, considering network
layout, the intermediaries available, origin server location, and
nature of the content. This is highlighted in the IETF WREC WG's
output, and is the basis of one of the work items in the WEBI WG.


> >Although the most successful intermediary models tend to be in
> >application-specific protocols (such as DNS, NTP, etc.), it is
> >possible to do so in a transfer protocol as well.
> 
> I'm unclear what you mean here.  HTTP is both an application-specific
> protocol and a transfer protocol.

I had difficulty finding references on the nature of transport
protocols, so I'm trying to figure out the best ways to differentiate
application-layer protocols.

That having been said, I see HTTP as a transfer protocol; its
application is "the Web", which is used in a variety of ways.
Although the name includes "hypertext", HTTP is also used for file
transfer, for RPC (with or without the formalized semantics of
something like XML Protocol), and for streaming media (for better or
worse). Client and server implementations vary widely in interface,
purpose and so forth. The abstract of HTTP/1.1 seems to back this
view up.


> >While protocols often define semantics to allow limited processing
> >by intermediaries (for such things as message caching,
> >timestamping and routing), they generally are either very
> >application-specific and well-defined behaviors (SMTP routing), or
> >weak, advisory controls (HTTP caching). Recently, there has been
> >work to retrofit a more capable processing model into HTTP
> >[XX][XX]. Unfortunately, it faces a number of problems due to the
> >fact that HTTP is already widely deployed.
> >
> >Message processing by intermediaries that do not act on behalf of
> >either the message sender or receiver may introduce privacy,
> >security and integrity concerns, as they are capable of examining
> >and modifying the message without the knowledge of either party.
> 
> Right, but that's the point.  HTTP's trust model is one where
> explicit trust is necessary for composing the chain.  The term
> "weak" above appears to suggest that this is necessarily something
> that needs improving, but I don't believe that's the case.  

In HTTP's case, most people I know consider caching nearly useless at
best, and a serious impediment to a site's function at worst,
because there is no trust model; content providers are unwilling to
trust access providers, and access providers don't have the best
interests of either the end users or content providers in mind when
they deploy caches. As a result, content providers either don't give
cacheability information at all, or they make their objects
uncacheable. Because HTTP allows caches to use an unspecified
heuristic to decide how long a response stays fresh when no explicit
expiry is given, each cache vendor's product behaves differently,
causing more loss of trust.
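As an illustration of the heuristic problem, here is a minimal Python sketch of how an HTTP/1.1 cache might compute a response's freshness lifetime. It uses the precedence RFC 2616 (section 13.2.4) describes: max-age first, then Expires minus Date, then a heuristic based on Last-Modified. The 10% default is an assumption taken from the "typical setting" the RFC mentions. Each cache is free to pick its own fraction, and that freedom is where products diverge. Directives such as s-maxage and no-cache are ignored for brevity.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, heuristic_fraction=0.1):
    """Return a response's freshness lifetime in seconds (0 if none).

    Precedence follows RFC 2616 section 13.2.4: Cache-Control max-age,
    then Expires minus Date, then a heuristic fraction of the time since
    Last-Modified. heuristic_fraction is each cache's own choice; 0.1 is
    only the "typical setting" the RFC mentions. Assumes a Date header
    is present whenever max-age is absent.
    """
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)

    date = parsedate_to_datetime(headers["Date"])
    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        return max(0, int((expires - date).total_seconds()))

    if "Last-Modified" in headers:
        last_modified = parsedate_to_datetime(headers["Last-Modified"])
        age_when_served = (date - last_modified).total_seconds()
        return max(0, int(age_when_served * heuristic_fraction))

    return 0  # no information: this sketch treats it as immediately stale


# The same response, last modified a week before it was served,
# judged by two caches that picked different heuristic fractions.
response = {
    "Date": "Thu, 08 Feb 2001 12:00:00 GMT",
    "Last-Modified": "Thu, 01 Feb 2001 12:00:00 GMT",
}
print(freshness_lifetime(response))        # 60480 (about 17 hours)
print(freshness_lifetime(response, 0.2))   # 120960 (about 34 hours)
```

Neither cache is wrong by the letter of the specification, yet they serve the same object for very different lengths of time.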


> Or more precisely, I don't think it needs improving to solve the
> problems HTTP was designed to solve.

Regarding a trust model, I very much agree. Unfortunately, many
people are trying to make HTTP do more; see the proposed OPES WG
in the IETF <http://www.extproxy.org/>


> >XML Protocol is also somewhat unique in that it is an explicitly
> >layered solution, and may either be an application-layer protocol
> >in itself, or may be used in conjunction with another transfer
> >protocol. For example, the default binding is HTTP;
> 
> I disagree that XML Protocol may be an application-layer protocol. 
> We are chartered not to define any application semantics of our
> own, and though an out exists for us to do so, I don't think it
> should be assumed that we will take it.  For the RPC use of XP, it
> too defines no application semantics; that can only be done by
> agreeing on some APIs or some parameters.

I was speaking in the context of the OSI stack; apologies, will
clarify. Time to add more layers ;)


> >For the purposes of XML Protocol, it may be most useful to
> >disregard the extremes; exclusively low-level (such as physical
> >and network transport) and high-level (such as business logic)
> >intermediaries do not add substantial meaning to the XML Protocol
> >model.
> 
> I'm unclear what you mean here, wrt the "business logic" comment. 
> Can XP intermediaries not be used for composing processors of
> business logic?

I think Gudge's discussion re: intermediaries and addressability
is relevant here. A device which accepts XML Protocol messages,
performs some processing and, as a part of that, makes new XML
Protocol requests to other services is certainly an intermediary, but
"above" the XP layer. There is processing done, but this is triggered
by something outside the message, in the intermediary itself.

Hopefully, solidifying the terminology for the different layers and
their functions will help clarify this. I'll need to rewrite the
document to reflect this, as I didn't appreciate the distinction at
the time.


> >The ability to target XML Protocol Modules to specific
> >intermediaries brings about the need to find a way to nominate
> >them. Additionally, the status and error reporting requirements
> >need a mechanism to identify the intermediary which generates such
> >a message. There is no URI scheme specified for identifying an
> >intermediary; schemes such as HTTP are meant to identify
> >endpoints.
> 
> I agree, but it's a bit early to say that.

Sorry, what's too early here?
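For illustration, the Python sketch below shows one way a message could nominate intermediaries. It is modeled on the actor attribute of SOAP 1.1:

- a header block names its target with a URI;
- blocks with no actor are left for the ultimate recipient;
- the special "next" URI targets whichever node receives the message.

The intermediary URI and the urn:example namespaces are hypothetical. A real SOAP 1.1 intermediary would also remove the blocks it processes before forwarding; this sketch only sorts them. Note that the sketch has to borrow an http URI to name the intermediary, which is exactly the gap the draft text describes.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace and its special "next" actor URI.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ACTOR_NEXT = "http://schemas.xmlsoap.org/soap/actor/next"
ACTOR_ATTR = "{%s}actor" % SOAP_ENV


def blocks_for(envelope_xml, my_uri):
    """Split header blocks into those targeted at my_uri and the rest."""
    root = ET.fromstring(envelope_xml)
    header = root.find("{%s}Header" % SOAP_ENV)
    targeted, passed_on = [], []
    if header is None:
        return targeted, passed_on
    for block in header:
        actor = block.get(ACTOR_ATTR)
        # No actor means the block is for the ultimate recipient.
        if actor in (my_uri, ACTOR_NEXT):
            targeted.append(block)
        else:
            passed_on.append(block)
    return targeted, passed_on


# Hypothetical message: one block for a named cache, one for whichever
# node is next, and one (no actor) for the ultimate recipient.
MESSAGE = """<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Header>
    <c:Cache xmlns:c="urn:example:cache" env:actor="http://cache.example.org/"/>
    <l:Log xmlns:l="urn:example:log"
           env:actor="http://schemas.xmlsoap.org/soap/actor/next"/>
    <a:Audit xmlns:a="urn:example:audit"/>
  </env:Header>
  <env:Body/>
</env:Envelope>"""

mine, others = blocks_for(MESSAGE, "http://cache.example.org/")
print([b.tag for b in mine])    # ['{urn:example:cache}Cache', '{urn:example:log}Log']
print([b.tag for b in others])  # ['{urn:example:audit}Audit']
```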


> >XML Protocol's Modules offer an excellent opportunity to
> >standardize common intermediary functions.
> 
> I suggest that a far more suitable place would be in the
> application protocol that XP will be used on top of; that's why we
> call them application protocols 8-).
> 
> We really need to focus on reusing the established application
> semantics that exist in deployed application protocols today.  For
> new semantics, people will have a choice; extend the application
> protocol via its documented extension mechanisms, or use the XML
> envelope of XP.  Each has its pros and cons.

I agree there will be both, but I wonder how successfully we'll be
able to map functions like routing, caching, encryption,
authentication etc. from disparately developed protocols into an XML
envelope, and make it beneficial to do so re: overhead, etc. There
are standard frameworks for these things (SASL pops to mind), but
they aren't really used evenly across protocols that may be
interesting to us (HTTP, SMTP, BEEP, etc).

Additionally, services like caching can be much more powerful if
they're in the envelope, rather than relegated to the transport.


> >Some XML Protocol applications may wish to make caching possible
> >for latency, bandwidth use or other gains in efficiency. To enable
> >this, it should be possible to assign cacheability in a variety of
> >circumstances.
> 
> How can a cache model be defined at this layer when there are no
> application semantics?  How messages are cached depends entirely on
> how they are transferred.  I'd like to be shown otherwise, but I
> cannot see how a useful caching model could be created
> independently of any specific application protocol.
> 
> I believe the furthest that we could and should go in this space,
> is to perhaps define some metadata that can be used to describe the
> application-neutral cacheability properties of XP messages.  Then,
> in the protocol bindings, we could define how they bind to the
> cache models of the application protocols.

I assume the 'we' you speak of is the W3C, rather than this WG, which
has a very specific and limited charter.

Why is cacheability application-specific? Certainly its description
is, but I believe that very useful, generic cacheability semantics
can be defined for messages, or message elements, which can be
described in blocks or elsewhere.
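Mark Baker's suggestion is to put neutral metadata in the message and let each protocol binding map it onto its own cache model. One way to picture this is sketched below in Python. The Cacheability element, its maxAge and scope attributes, and the urn:example:xp-cache namespace are all hypothetical; nothing like them is defined by XML Protocol or SOAP. The function shows only the HTTP side of such a binding. It defaults to "private" as a conservative choice when no scope is given.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for application-neutral cache metadata; neither
# XML Protocol nor SOAP defines anything like it.
XP_CACHE = "urn:example:xp-cache"


def http_cache_control(block_xml):
    """Map a hypothetical <Cacheability> header block onto HTTP Cache-Control.

    Attributes (all hypothetical): scope is "shared", "private" or
    "none"; maxAge is a freshness lifetime in seconds.
    """
    block = ET.fromstring(block_xml)
    if block.tag != "{%s}Cacheability" % XP_CACHE:
        raise ValueError("not a Cacheability block: %s" % block.tag)
    scope = block.get("scope", "private")
    if scope == "none":
        return "no-store"
    visibility = "public" if scope == "shared" else "private"
    max_age = int(block.get("maxAge", "0"))
    return "%s, max-age=%d" % (visibility, max_age)


print(http_cache_control(
    '<c:Cacheability xmlns:c="urn:example:xp-cache" maxAge="600" scope="shared"/>'
))  # public, max-age=600
```

A binding to a protocol without a cache model of its own, such as SMTP, would need to define separately what, if anything, the block means there.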


> [re message integrity]
> >While these mechanisms have been discussed as necessary extensions
> >to define in XML Protocol, the possibility of modification of any
> >XML Protocol message brings the need to use them for all messages,
> >not only those which contain sensitive content.
> 
> I don't see the need to use them for all messages.  Plus, some
> application protocols that have identified a need for maintaining
> message integrity, support it already.  So I'm not sure what
> requiring it at the XP level would achieve (though I could see
> integrity at the body level being important, and not the envelope).

"All" is perhaps hyperbole, but for some transports, particularly
HTTP, there is a very real possibility of the message being
intercepted, whether with malicious intent or not. Most of the
scenarios I'm thinking of are ones where an HTTP intermediary tries to
'helpfully' provide a 'value-added' service, which corrupts (bad) or
changes the semantics of (worse) the message.
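Here is a minimal Python sketch of an end-to-end integrity check that would catch such 'helpful' changes. The sender attaches a keyed digest (HMAC-SHA256) of the serialized body, and the receiver recomputes it. The pre-shared key and the way the tag travels with the message are assumptions for illustration. A keyed digest is used because an intermediary could simply recompute an unkeyed hash after modifying the body. Because the check is byte-for-byte, even harmless reserialization (for example, changed whitespace) makes it fail. The W3C XML Signature work addresses this with canonicalization.

```python
import hashlib
import hmac


def body_tag(body: bytes, key: bytes) -> str:
    """Keyed digest (HMAC-SHA256) of the serialized message body."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def body_intact(body: bytes, key: bytes, tag: str) -> bool:
    """True if the body still matches the tag the sender computed."""
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(body_tag(body, key), tag)


key = b"example-shared-secret"  # hypothetical pre-shared key
sent = b"<Body><price currency='USD'>10.00</price></Body>"
tag = body_tag(sent, key)

# An intermediary "helpfully" normalizes the price on the way through.
received = sent.replace(b"10.00", b"10")

print(body_intact(sent, key, tag))      # True
print(body_intact(received, key, tag))  # False
```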

Cheers, 


-- 
Mark Nottingham, Research Scientist
Akamai Technologies (San Mateo, CA)
Received on Thursday, 8 February 2001 20:53:58 UTC