Re: XML Protocol and Axioms of Web Architecture from Hugo Haas on 2001-08-21 (www-archive@w3.org from August 2001)

Forwarded message 1

From: Mark Baker <distobj@acm.org>
Date: Mon, 20 Aug 2001 21:01:31 -0400 (EDT)
Subject: Fw: Re: XML Protocol and Axioms of Web Architecture (fwd)
To: hugo@w3.org
Message-Id: <200108210101.VAA09768@markbaker.ca>
Hey Hugo.  I'll just respond for now since you asked for this in a timely
fashion.  Feel free to forward/archive as you need.

> First, and to answer your question, I relayed this concern[10] and
> this fell off my radar screen before it made it to the issues list or
> was discussed enough.
> 
> I did a bunch of reading from the FoRK archives[2], xml-dist-app[3]
> and a very interesting text by Mark Baker[4].
> 
> Note that the discussion below is based on the current HTTP binding of
> the SOAP Version 1.2 specification[5], i.e. an XML document as the
> entity carried by an HTTP POST request.
> 
> Abusing the Web architecture is easy and frequent. The same way Web
> designers use HTTP POST instead of GET in HTML forms for everything
> (e.g. what is the most played song on radio stations in Dallas, TX
> over the last week), you can use SOAP in the POST request to do the
> same.
> 
> The drawbacks are two-fold: first, POST is used whereas there is no
> side-effect to the query (I am not committing or changing any
> information), and second you are likely to talk to a generic processor (e.g.
> <http://example.com/mySoapProcessor>) which has nothing to do with a
> resource representing the list of most played songs on Texan radio
> stations.
> 
> In simple cases, such as the ones described by Mark (i.e. direct
> communication between the client and the server), SOAP only makes
> sense over POST, using the semantics of POST, and I think that it is
> up to programmers and system designers to use the Web as it was meant
> to be used.
> 
> However, there is a gain in using a SOAP request (in an HTTP POST
> request, as currently stands) instead of a simple GET: you get a
> processing model, including the fact that you can use intermediaries;
> e.g. a third party (key escrow) could authenticate my request in some
> way and add some information along with my request and send it to the
> final recipient afterwards.

You also get this with HTTP.  It has a processing model.  SOAP/HTTP
just extends it in some useful ways.

> The problem with this is that such an HTTP binding forbids HTTP
> intermediary caching.

Just to be clear, because it may not be obvious from the context of this
statement: this is only a problem if you're not obeying POST
semantics.  If you are, then no caching is expected.
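
To make the point concrete, here is a minimal Python sketch of the
method-based rule a cache applies before anything else.  It is a
simplification that assumes only the request method matters; real
caches also look at Cache-Control, Expires and so on.  The names
CACHEABLE_METHODS and may_serve_from_cache are made up for this
illustration.

```python
# Simplified sketch: methods whose responses a cache may reuse by default.
# Assumption: no explicit freshness or no-cache directives are involved.
CACHEABLE_METHODS = {"GET", "HEAD"}


def may_serve_from_cache(method: str) -> bool:
    """Return True if a stored response may, by default, answer this method."""
    return method.upper() in CACHEABLE_METHODS


# A side-effect-free query sent as GET is a candidate for caching;
# the same query wrapped in a SOAP envelope and sent as POST is not.
get_is_cacheable = may_serve_from_cache("GET")
post_is_cacheable = may_serve_from_cache("POST")
```

If the request really has POST semantics, the second result is exactly
what you want, so nothing is lost.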

> HTTP POST requests are not cacheable. Moreover, it does indeed
> challenge the architecture of the Web[8]. I originally thought that
> this meant that SOAP Version 1.2 did not meet requirement R803[7],
> but I don't think that this is the case.

On the other hand, HTTP itself includes features that allow for its
graceful demise (e.g. no-cache, upgrade) via tunneling a new protocol
over it.

Really, when you're tunneling over HTTP POST, you've already given
up on Web architecture - you're doing something entirely different.
So from that POV, it doesn't really matter so much as long as that
use can be identified and isolated from other traffic.

> HTTP GET requests do not have an entity body. POST requests do.
> Nevertheless, you could encode your SOAP request in some way in your
> GET request, by using, say, a 20k-long URI; [6] suggests however that
> current HTTP implementations will not really like that.

This is my biggest issue with SOAP.  Defining headers in XML is fine
and dandy, but many of those headers will be useful over GET as well.
How do we represent those without a body?

> Moreover, if you want to use an intermediary, you would still have to
> use URIs in a weird way for routing (i.e. the client will not send
> a request to the final recipient):
> 
>    client
>    -> (first hop to the key escrow)
>    http://key.escrow.example/key?user=hh&target=<put_encoded_soap_request_here>
>    -> (second hop to the server hosting the list of top songs)
>    http://example.com/top%20songs?request=<put_encoded_soap_request_here>
>    -> (response sent to the client)

Not necessarily.  The first hop could be an HTTP proxy which could
understand some HTTP headers that specify a route.
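
As a rough sketch of what I mean, the client could send an ordinary GET
for the real resource URI through a proxy and name the intermediary in a
header.  The header name X-Example-Route is purely hypothetical (no such
header is defined anywhere), as are the helper name build_routed_get and
the URIs below.

```python
from typing import Sequence
from urllib.parse import urlsplit


def build_routed_get(target_uri: str, via: Sequence[str]) -> str:
    """Build the text of a GET request, as sent to a proxy, for target_uri.

    The request line uses the absolute URI of the final resource, and the
    route is carried in a hypothetical X-Example-Route header.
    """
    host = urlsplit(target_uri).netloc
    lines = [
        f"GET {target_uri} HTTP/1.1",
        f"Host: {host}",
        # Hypothetical header: lists intermediaries the request should visit.
        f"X-Example-Route: {', '.join(via)}",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


request_text = build_routed_get(
    "http://example.com/top%20songs", ["key.escrow.example"]
)
```

The URI still identifies the list of top songs, and the routing
information stays out of it.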

> In the end, I wonder if this "tunnelling" of SOAP in HTTP is really
> worse than using GET requests (see also Mark's proposal of two
> bindings[9]).

Without a doubt, yes, it's worse.

> I think that the advent of SOAP makes use of the Web in
> a new way, which might be fine as long as 1) we are aware of it 2)
> this is the way to go 3) we clearly document it.

And 4) make sure it's identifiable on the network so that we can
route/manage/filter it.

> [ I will stop here for tonight, because I don't think that I will
>   solve the problem tonight... ]
> 
> I am proposing (in a separate email, sent to xml-dist-app) to open two
> issues referring to this email:
> 
> 1/ The HTTP binding of the SOAP Version 1.2 specification[5] does
>    preclude caching of information at the HTTP level in case of
>    requests having the semantics of an HTTP GET request.

I'm not sure of the value of bringing that up.  I expect the response
you'd get from the tunneling people would be "ok, so what?". 8-)

In the end, we're not going to stop people who want to tunnel from
tunneling.  All we can do, as mentioned, is identify it.  That's what
my two-binding approach proposed (and Henrik's proposal did the
same too, in an unbounded # of bindings kinda way).

> 2/ SOAP Version 1.2 over HTTP[5], in its current form, misuses the Web
>    architecture. There is a risk of abuse of the HTTP POST method over
>    a single URI (e.g.
>    <http://example.com/IWillDoAnyProcessingYouNeed>).

I disagree with that statement, because there exists at least one use
of SOAP that *doesn't* abuse the architecture of the Web.  But for RPC
and for other tunnel uses of HTTP, it is true.

I believe that the best we can do here is:

- make sure that there exists a canonical and authoritative means
of identifying a tunnelled use of HTTP
- document that this use of HTTP abuses the architecture of the Web
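
To illustrate the first point, here is a minimal Python sketch of how an
intermediary might pick out tunnelled traffic.  It assumes the marker is
the SOAPAction request header from SOAP 1.1; whether that header (or
something else) ends up being the canonical means is exactly what is
open.  The function name looks_like_soap_tunnel is made up.

```python
from typing import Mapping


def looks_like_soap_tunnel(method: str, headers: Mapping[str, str]) -> bool:
    """Guess whether a request is SOAP tunnelled over HTTP POST.

    Assumption: the sender marks the request with a SOAPAction header.
    HTTP header names are case-insensitive, so compare in lower case.
    """
    header_names = {name.lower() for name in headers}
    return method.upper() == "POST" and "soapaction" in header_names


tunnelled = looks_like_soap_tunnel(
    "POST", {"Content-Type": "text/xml", "SOAPAction": '""'}
)
plain_get = looks_like_soap_tunnel("GET", {"Accept": "text/html"})
```

A firewall or proxy with a check like this can route, manage or filter
the tunnelled requests separately from ordinary Web traffic.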

> I would appreciate some input before I do so, so that we don't open
> unnecessary issues.

Sorry for the delay.

MB