Re: Reliable HTTP comments by Mark Nottingham

In-Reply-To: <20010716150054.A2439@akamai.com>
References: <20010716150054.A2439@akamai.com>

Mark,

I hope you won't mind if I jump in here and make a few comments in
response to your note.  Please forgive my editing of your note; to keep
the discussion clear, I will remove those parts to which I am not
responding in detail, although let me admit that some parts of the
specification are confusing and I appreciate your taking the time to
identify problems you had with it.

> Date: Mon, 16 Jul 2001 15:01:04 -0700
> From: Mark Nottingham <mnot@akamai.com>
> Message-ID: <20010716150054.A2439@akamai.com>
> Subject: Re: Reliable HTTP
>
> Hi John,
>
> Generally, I question the choice of HTTP as a base for reliable
> messaging. You seem to spend a fair amount of effort avoiding HTTP
> mechanisms (by defining a separate set of headers, your own chunking
> mechanism, etc.), while not really getting too many benefits from it.
> BEEP seems like a much more natural choice in this respect.
>

One reason for deciding to layer HTTPR on top of HTTP is the way servers
and firewalls are commonly configured on the Internet today.  By
defining HTTPR as it is, messaging traffic can be enabled without
changes to existing firewall configurations that only allow port 80 and
port 443 traffic to enter their domains.  Once they reach one of the
approved servers, HTTPR requests will almost certainly be received by a
web server that only accepts HTTP requests.  HTTPR requests are valid
HTTP POST requests, so the web server can hand them to a server module
(in Apache) or a servlet (in Tomcat), based on the URI in the POST and
the corresponding configuration of the web server.  The fact that the
HTTP payload is not HTML is of no concern to the web server, which just
ensures that the request is valid HTTP, nor to the server module or
servlet, as the case may be, which understands the nature of HTTPR
requests.
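
To make the dispatch concrete, here is a rough sketch of the servlet
side, in Java.  The class name, URI mapping, and helper method are
hypothetical and not taken from the specification; the point is only
that the web server validates the HTTP while the servlet interprets
the payload.

    import java.io.IOException;
    import java.io.InputStream;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical servlet mapped (e.g. in web.xml) to the URI used
    // for HTTPR traffic.  The body of the POST is the HTTPR batch.
    public class HttprServlet extends HttpServlet {
        protected void doPost(HttpServletRequest req,
                              HttpServletResponse resp)
                throws ServletException, IOException {
            InputStream batch = req.getInputStream();
            byte[] reply = processHttprBatch(batch);  // hypothetical handler

            // The HTTPR reply goes back as an ordinary HTTP response body.
            resp.setContentType("application/octet-stream");
            resp.setContentLength(reply.length);
            resp.getOutputStream().write(reply);
        }

        private byte[] processHttprBatch(InputStream in) throws IOException {
            return new byte[0];  // placeholder for the HTTPR service layer
        }
    }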

The reason for the duplicate chunking is that chunking serves two
different purposes.  HTTP requires chunked transfer coding when the
size of the HTTP payload is not known up front.  When a batch of
messages is being transmitted, the sender may not decide what messages
to include before sending out the first message.  It may, for example,
take messages off of a queue of some kind, and it may not be efficient to
take all of them off and thereby learn their sizes before starting to
send any of them.  HTTP/1.1 chunking is very convenient, allowing the
sender to build the batch at will until it decides it's time to stop.
However, each message in an HTTPR message batch includes descriptive
information, including its size.  This information may not be available
if, for example, the HTTPR service layer being used by a message sender
provides an API such that it can be handed a stream object of some kind
that will create the message on the fly as the data is streamed out over
the network.  (The nature and design of such a service layer are beyond
the scope of our specification, but we would not want to preclude such a
feature.)  Since the HTTP payload is already being chunked, that would be
sufficient to get the data over there, but we would still need some kind
of marker showing where the stream of message data ended.  It was
expedient to simply allow another layer of chunking, using the
higher-level chunking to mark the pieces, and the end, of that one
message.
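
As an illustration only, the following Java sketch shows the idea of
that second layer: a message whose total size is not known is written
as a series of length-prefixed pieces ending in a zero-length marker,
while HTTP/1.1 chunked transfer coding carries the bytes of the whole
batch underneath.  The framing here is hypothetical and is not the
encoding defined in the specification.

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    public class MessageChunker {
        // Writes one message of unknown length into the HTTP request
        // body as length-prefixed pieces followed by an end marker.
        public static void writeMessage(InputStream message,
                                        OutputStream httpBody)
                throws IOException {
            DataOutputStream out = new DataOutputStream(httpBody);
            byte[] buffer = new byte[8192];
            int n;
            while ((n = message.read(buffer)) > 0) {
                out.writeInt(n);          // size of this piece
                out.write(buffer, 0, n);  // the piece itself
            }
            out.writeInt(0);              // marks the end of this message
            out.flush();
        }
    }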

>
> From the primer:
>
>   To this end we recommend that a protocol layer be added to HTTP
>   that we call HTTPR and similarly to HTTPS that we call HTTPSR.
>
> Most HTTP implementations (e.g., intermediaries) won't know how to
> dispatch these new protocol schemes. If the protocol conforms to the
> requirements of HTTP, I'd suggest keeping HTTP and signalling the
> presence of reliability semantics through another mechanism (perhaps
> content-type?). Also, IIRC it's not permissable to have two protocol
> schemes (HTTP and HTTPR) that both use the same default port (80).
>

Since HTTPR is valid HTTP, all intermediaries should pass it along just
fine.  They will see a POST with a payload, whose contents they ignore
other than to pass it along to the server, and in the response they will
see a payload, whose contents they will also ignore other than to pass
it along to the client.
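
Seen from a client, the whole exchange is just a POST through whatever
proxies are in the way.  Here is a small sketch, with hypothetical host
names, URI, and batch encoder, of a client sending a batch through an
HTTP proxy; the proxy forwards it as an opaque POST body.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.InetSocketAddress;
    import java.net.Proxy;
    import java.net.URL;

    public class HttprClient {
        public static void main(String[] args) throws Exception {
            // Hypothetical proxy and target; the proxy sees only an
            // ordinary HTTP POST with an opaque body.
            Proxy proxy = new Proxy(Proxy.Type.HTTP,
                    new InetSocketAddress("proxy.example.com", 8080));
            URL url = new URL("http://messaging.example.com/httpr");
            HttpURLConnection conn =
                    (HttpURLConnection) url.openConnection(proxy);
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setChunkedStreamingMode(0);  // HTTP/1.1 chunked body

            OutputStream body = conn.getOutputStream();
            body.write(buildBatch());         // hypothetical batch encoder
            body.close();

            // The HTTPR acknowledgement comes back in the response body.
            System.out.println("HTTP status: " + conn.getResponseCode());
        }

        private static byte[] buildBatch() {
            return new byte[0];  // placeholder
        }
    }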

> From the spec;
>
>   In one transmission of a batch of messages from one agent to
>   another over httpr, the sink agent will indeed see the messages
>   arrive in the same order as the source agent sent them because they
>   are flowing over a single TCP/IP connection and are therefore
>   reliably ordered.
>
> If you're speaking about HTTP messages, this assumption cannot be
> made; HTTP intermediaries are allowed to use multiple upstream
> connections and keep one downstream connection open. Additionally,
> those multiple upstream connections may be to different parent
> proxies.
>
> HTTP is explicitly a stateless protocol; you cannot make any
> assumptions about the ordering of messages on the wire.
>

Messages within a single HTTPR batch will arrive in order because they
are in a single HTTP payload.  In the absence of HTTPR pipelining,
messages on a single channel will arrive in order because a second
batch may not be sent until the first has been acknowledged.  When
doing HTTPR pipelining, messages will be in order because the
pipelining is done on a single persistent HTTP connection that must
obey RFC 2616 as regards the order of requests and responses (RFC 2616
section 8.1.2.2, Pipelining).
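
The non-pipelined case can be sketched as a simple loop, again with
hypothetical helper names: one HTTP POST per batch, and the next batch
is not put on the channel until the acknowledgement in the previous
response has been read, so the sink sees batches in the order the
source sent them.

    import java.io.IOException;
    import java.util.Queue;

    interface HttprTransport {
        // Hypothetical: performs one HTTP POST carrying a batch and
        // blocks until the response body (the acknowledgement) arrives.
        byte[] post(byte[] batch) throws IOException;
    }

    public class ChannelSender {
        private final HttprTransport transport;

        public ChannelSender(HttprTransport transport) {
            this.transport = transport;
        }

        public void drain(Queue<byte[]> batches) throws IOException {
            while (!batches.isEmpty()) {
                byte[] batch = batches.poll();
                byte[] ack = transport.post(batch);
                if (ack == null || ack.length == 0) {   // placeholder check
                    throw new IOException(
                        "batch not acknowledged; recover per the spec");
                }
                // Only now may the next batch go out, so batches cannot
                // overtake one another on the channel.
            }
        }
    }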

> --
> Mark Nottingham, Research Scientist
> Akamai Technologies (San Mateo, CA USA)
>

I hope this has made things a little clearer.

Richard P. King, programmer
IBM TJ Watson Research Center

Received on Tuesday, 17 July 2001 18:03:17 UTC