RE: update to Issue #82 from Mark Jones on 2001-04-26 (xml-dist-app@w3.org from April 2001)

From: Mark Jones <jones@research.att.com>
Date: Wed, 25 Apr 2001 23:09:50 -0400 (EDT)
To: frystyk@microsoft.com
Cc: xml-dist-app@w3.org
Message-Id: <200104260309.XAA07404@glad.research.att.com>
	> From frystyk@microsoft.com Wed Apr 25 21:54 EDT 2001
	> From: "Henrik Frystyk Nielsen" <frystyk@microsoft.com>
	> To: "'Mark Jones'" <jones@research.att.com>
	> Cc: <xml-dist-app@w3.org>
	> Subject: RE: update to Issue #82
	> Date: Wed, 25 Apr 2001 18:54:15 -0700


	> >Requirements R307 and R308 call for simplicity, which would 
	> >argue for eliminating a distinction between header and body 
	> >blocks if one is not warranted.  This is particularly true if 
	> >that distinction poses a problem for resource-constrained 
	> >devices, which requirement R309 reminds us to consider and 
	> >which are implicated in usage scenario S21. Furthermore, the 
	> >Abstract Model currently makes no semantic distinction between 
	> >header and body blocks.

	> I think here it is important to ask the question of simplicity with
	> respect to what? The act of removing things does not necessarily make
	> things simpler - removing traffic lane separators does not tend to make
	> traffic conditions simpler.

Simpler, I think, refers primarily to semantic distinctions.  If
intermediaries had processing semantics significantly different from
those of final destinations, then it would make less sense to attempt
to unify things.  The processing algorithms, however, are basically
identical, with the final destination in some sense being the
degenerate case since it doesn't insert blocks and forward again;
otherwise, it locates and processes blocks targeted at itself much
like any intermediary.  This also seems appropriate since one man's
intermediary may be another man's final destination, so to speak.
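A minimal sketch of the point that intermediaries and final destinations can share one processing loop. The `Block` type, its `target` field (standing in loosely for the SOAP actor), the `handle` callback and `process_message` are all hypothetical names for illustration; the only thing the final destination does differently is decline to forward.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Block:
    # Hypothetical block model: `target` stands in for the actor URI.
    target: str
    name: str
    payload: str


def process_message(
    node: str,
    blocks: List[Block],
    handle: Callable[[Block], Optional[List[Block]]],
    is_final: bool,
) -> Optional[List[Block]]:
    """Process blocks targeted at `node` and build the forwarded message.

    The same loop serves intermediaries and the final destination; the
    final destination is the degenerate case that forwards nothing.
    """
    forwarded: List[Block] = []
    for block in blocks:
        if block.target == node:
            # Targeted at us: the handler may return blocks to insert.
            forwarded.extend(handle(block) or [])
        else:
            # Not targeted at us: pass it through unchanged.
            forwarded.append(block)
    return None if is_final else forwarded
```

For example, a node `urn:a` that handles a logging block and inserts an audit block for `urn:b` forwards the audit block followed by the untouched order block; run at `urn:b` with `is_final=True`, the same function returns `None`.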

The counterweight to simplicity is efficiency.  It looks like we have
a couple of (competing?) considerations.

1) You'd like memory-limited devices to be able to generate various
   blocks without having to do buffering.  Actually, in the general
   case, you would like all of the handlers on the sender that
   contribute blocks to do so without undue worry about buffering
   issues.  If messages could be incrementally constructed (and transported)
   without regard for the order in which independent handlers inserted
   blocks, this would be a big win.

2) You'd like intermediaries to be able to find, parse and process
   blocks intended for them as efficiently as possible. (And to
   avoid buffering, etc. insofar as possible.)

Syntactic simplicity is further down my list.  Eliminating the
header/body distinction does have syntactic simplicity on its side and
it happens to resonate with efficiency (1), but not necessarily
efficiency (2).  Maybe there is another syntax, better than either
strict header/body or mixed header/body blocks, that facilitates both
efficiency (1) and (2).  In the design phase, we should think about the
issue and see what we can come up with.

	> The scenario that was brought forward in [1] and which is to be added to
	> issue 82 is very similar (signing a block) to that of issue 25 [3] which
	> was the reason for my question. This is a specific scenario that I think
	> is useful to address - we had the same problem in HTTP and had to
	> introduce trailers for supporting this. 

It sounds like trailers are semantically another kind of header,
simply positioned after the body.  Why would this be preferable to
simply having a sequence of blocks?

	> >The primary support for maintaining a distinction comes from 
	> >requirement R802 and existing practice in SOAP 1.1.  A workaround 
	> >exists for memory-limited handlers/processors that need to produce 
	> >intermingled header and body blocks. The body blocks can 
	> >instead be produced as header blocks and then referenced in a 
	> >later small body block.  This has the drawback that the large 
	> >body blocks in the header may not be easily skipped by 
	> >intermediaries, which re-raises the R802 problem. 

	> Note that removing the separation between header and body in fact
	> doesn't solve the problem presented in the scenario - in order for it to
	> be useful the receiver has to know up front that something is following
	> at the end.

Removing the separation simply allows the memory-limited device to
output the body block and then output some other (header) block that
functionally depends on the body (e.g., checksum or some other
arbitrary computation) without buffering.  It would otherwise have
to buffer the body block, compute and write the header block and then
write the body block.  This is efficiency (1) above.
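To make efficiency (1) concrete, here is a sketch of a sender that writes the body block chunk by chunk and then appends a checksum block computed over the same bytes, never holding the whole body in memory. The `Envelope`/`Body`/`Checksum` element names, the flat namespace-free layout and the SHA-1 digest are illustrative assumptions, not anything SOAP 1.1 defines.

```python
import hashlib
import io
from typing import BinaryIO, Iterable


def write_envelope(out: BinaryIO, body_chunks: Iterable[bytes]) -> None:
    """Stream the body, then emit a block that depends on it.

    Assumes each chunk is already well-formed, escaped XML content.
    Element names and the digest are illustrative, not from SOAP 1.1.
    """
    digest = hashlib.sha1()
    out.write(b"<Envelope><Body>")
    for chunk in body_chunks:
        digest.update(chunk)  # fold this chunk into the running checksum
        out.write(chunk)      # and send it on immediately, no buffering
    out.write(b"</Body>")
    # The dependent block is written after the data it describes.
    out.write(b"<Checksum alg='sha1'>")
    out.write(digest.hexdigest().encode("ascii"))
    out.write(b"</Checksum></Envelope>")
```

Only one chunk and a fixed-size digest state are held at any time; with a strict header-before-body rule, the sender would instead have to buffer the whole body before writing the checksum header.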

The problem it does not address is efficiency (2) above.  Suppose an
intermediary just wants to get at the checksum block.  How does it
efficiently skip the body block if it syntactically follows it?  (The
same thing applies to your proposed forward-reference mechanism
below.)
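The cost on the efficiency (2) side can be seen in a streaming reader. This sketch uses the standard library's `xml.etree.ElementTree.iterparse`; the namespace-free `Checksum` element name is a hypothetical convention. Memory stays bounded because each element's contents are discarded once it closes, but every element of the body must still be read and tokenized before the trailing block is reached.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Optional, Tuple


def find_trailing_checksum(stream: BinaryIO) -> Tuple[Optional[str], int]:
    """Return (checksum text, number of elements passed to reach it).

    The `Checksum` element name is a hypothetical convention.
    """
    skipped = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "Checksum":
            return elem.text, skipped
        skipped += 1
        elem.clear()  # discard contents of the element we just walked past
    return None, skipped
```

For an envelope whose body holds two items, the reader reports three elements passed (the two items and the `Body` itself) before it sees the checksum.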

It should also be noted that nothing in the framework guarantees that
body blocks are always bigger than header blocks.  I can also imagine
situations in which you have two large independent header blocks, each
processed by a different intermediary.  No fixed ordering of the
blocks will be efficient for all routing paths.
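A small cost model illustrates the point. It assumes, hypothetically, two large independent header blocks `A` and `B`, each consumed by a different intermediary, and measures cost as the bytes a streaming reader must pass over before its own block starts. The function names are illustrative only.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def bytes_before(order: List[str], sizes: Dict[str, int], target: str) -> int:
    """Bytes a streaming reader passes over before `target` begins."""
    total = 0
    for name in order:
        if name == target:
            return total
        total += sizes[name]
    raise KeyError(target)


def ordering_costs(sizes: Dict[str, int]) -> Dict[Tuple[str, ...], Dict[str, int]]:
    """For every block ordering, how far each block's consumer must scan."""
    return {
        order: {b: bytes_before(list(order), sizes, b) for b in sizes}
        for order in permutations(sizes)
    }
```

With `A` at 50,000 bytes and `B` at 40,000, the order `A, B` costs `B`'s intermediary 50,000 bytes of scanning, while `B, A` costs `A`'s intermediary 40,000. Whichever block comes second pays for the other, so the cheaper order depends on the route.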

	> This is why I think issue 25 and 82 are in fact quite closely related.

	> The discussion in issue 25 notes that it is possible to have
	> things after the body in SOAP but that SOAP is currently silent on how
	> to use them, other than saying that they do not take part in the
	> processing model [4].

	> I can think of a mechanism where we have a module with a block that
	> points to another block and says what is in the referenced block. The
	> referenced block can then be at the end. This seems to work within the
	> current model and allows us to support scenario [1]. Are there any
	> downsides to this?

An intermediary interested only in this final block would still have
to parse through the body block in the envelope to get to the final
block.  How would this be any better than just permitting the header
block to follow the body block (without the forward reference hack)?
I guess it would allow intermediaries that weren't interested in
following any forward references to quit when they hit the body, but
it doesn't always help and it introduces some additional complication.
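The forward-reference idea and the objection to it can be sketched together. The `TrailerRef` element, its `ref`/`kind` attributes and the `id` matching rule are hypothetical, and namespaces are omitted; parsing uses the standard library's `xml.etree.ElementTree.iterparse`. A reader that has seen no reference by the start of the body can stop there, but a reader that has seen one must still read through the whole body.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict


def collect_trailers(stream: BinaryIO) -> Dict[str, str]:
    """Resolve header forward references to blocks placed after the Body.

    Hypothetical format: a header child <TrailerRef ref="..." kind="..."/>
    announces an element with a matching `id` attribute after the Body.
    """
    expected: Dict[str, str] = {}  # announced id -> announced kind
    found: Dict[str, str] = {}     # id -> text of the referenced block
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start" and elem.tag == "Body" and not expected:
            break  # nothing announced: no reason to read past the Body
        if event == "end" and elem.tag == "TrailerRef":
            expected[elem.get("ref")] = elem.get("kind")
        elif event == "end" and elem.get("id") in expected:
            found[elem.get("id")] = elem.text or ""
    return found
```

The early `break` is the saving available to readers that ignore forward references; when a reference is present, the loop still consumes every body event before reaching the trailer.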

Something that I have never quite understood is how burdensome it is
for an intermediary to find targeted blocks and to construct the
forwarded message.  It seems to me that a block that is not targeted
at the current node can be streamed on through as a part of the
forwarded message; the processor basically just has to find the
matching end tag that terminates the block.  This shouldn't be
hard even for a memory-limited device.  A block that is targeted at
the current node might likewise be processed incrementally by a
memory-limited processor as it is parsed, without keeping all of it in
memory.  So with the right implementation, I'm not sure why
skipping parts of the message is important anyway -- as an
intermediary you have to forward blocks not targeted at yourself, so
you still have to handle all of the message bytes one way or the other.
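The pass-through argument can be sketched with the standard library's SAX interface (`xml.sax` and `xml.sax.saxutils.XMLGenerator`). The conventions are assumptions for illustration: blocks are the children of a namespace-free `Header` or `Body`, and an `actor` attribute (borrowed loosely from SOAP 1.1) names the targeted node. Untargeted blocks are copied out event by event while only a nesting-depth counter is kept, so memory does not grow with their size; every byte is still read, as noted above.

```python
import io
import xml.sax
from typing import List, Optional, Tuple
from xml.sax.saxutils import XMLGenerator


class PassThroughIntermediary(xml.sax.ContentHandler):
    """Forward blocks not targeted at `me`; consume the ones that are.

    Hypothetical conventions: blocks sit at depth 3
    (Envelope > Header/Body > block) and `actor` names their target.
    """

    def __init__(self, me: str, out: io.StringIO) -> None:
        super().__init__()
        self.me = me
        self.gen = XMLGenerator(out, encoding="utf-8")
        self.depth = 0                          # current nesting depth
        self.consume_from: Optional[int] = None  # depth of targeted block
        self.consumed: List[str] = []            # text of our blocks

    def startElement(self, name, attrs):
        self.depth += 1
        if (self.consume_from is None and self.depth == 3
                and attrs.get("actor") == self.me):
            self.consume_from = self.depth  # start of a block for us
            self.consumed.append("")
        if self.consume_from is None:
            self.gen.startElement(name, attrs)  # stream it onward

    def endElement(self, name):
        if self.consume_from is None:
            self.gen.endElement(name)
        elif self.depth == self.consume_from:
            self.consume_from = None  # matching end tag: block finished
        self.depth -= 1

    def characters(self, content):
        if self.consume_from is None:
            self.gen.characters(content)
        else:
            self.consumed[-1] += content  # process our block incrementally


def forward(message: str, me: str) -> Tuple[str, List[str]]:
    """Return (forwarded message, text of blocks processed at `me`)."""
    out = io.StringIO()
    handler = PassThroughIntermediary(me, out)
    xml.sax.parseString(message.encode("utf-8"), handler)
    return out.getvalue(), handler.consumed
```

Here the targeted block is simply removed and its text collected; a real handler would act on the content as it streams in and might insert new blocks.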

--mark

	> Henrik

	> [1] http://lists.w3.org/Archives/Member/w3c-xml-protocol-wg/2001Apr/0160.html
	> [2] http://www.w3.org/2000/xp/Group/xmlp-issues#x82
	> [3] http://www.w3.org/2000/xp/Group/xmlp-issues#x25
	> [4] http://www.w3.org/TR/SOAP/#_Toc478383494
Received on Wednesday, 25 April 2001 23:10:04 UTC