Re: XMLP-UC-6 reformulation - simple streaming use case from noah_mendelsohn@us.ibm.com on 2003-09-23 (xml-dist-app@w3.org from September 2003)

From: <noah_mendelsohn@us.ibm.com>
Date: Tue, 23 Sep 2003 17:30:46 -0400
To: "John J. Barton" <John_Barton@hpl.hp.com>
Cc: Jacek Kopecky <jacek.kopecky@systinet.com>, Mark Nottingham <mark.nottingham@bea.com>, XMLP Dist App <xml-dist-app@w3.org>
Message-ID: <OF55E378B0.72E34907-ON85256DAA.00625D68@lotus.com>
John, let me try and respond to the various sections of your note:

John Barton writes:

> Noah,
> 
> Unfortunately I am once again confused by the use of
> the word "streaming".  Maybe I missed a clarification
> sometime back?  Mark's formulation might be incomplete
> but at least I understand its terms ;-).

I use "streaming"  to refer to the broad range of scenarios in which a 
sender and/or receiver needs to prepare or process the message 
incrementally.  In other words, any alternative to the situation in which 
the entire message can be buffered both before sending and prior to start 
of processing following receipt.  In the general case for large messages, 
such streaming allows for overlap of sender and receiver processing of the 
same message, though such overlap is not required and may only be achieved 
in some cases.  While there are probably more formal definitions out 
there, I think this is consistent with general usage in the industry.
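
A minimal Python sketch of the distinction, for illustration only.  The 
function names and the "chunk-N" payloads are hypothetical; the point is 
just the ordering of send and receive steps when the message is handled 
incrementally versus fully buffered first.

```python
from typing import Iterable, Iterator, List


def produce(log: List[str], n: int) -> Iterator[bytes]:
    """Sender: emit the message one chunk at a time, as it is prepared."""
    for i in range(n):
        log.append(f"send {i}")
        yield f"chunk-{i}".encode()


def consume(chunks: Iterable[bytes], log: List[str]) -> int:
    """Receiver: process each chunk on arrival; return total bytes seen."""
    total = 0
    for i, chunk in enumerate(chunks):
        log.append(f"recv {i}")
        total += len(chunk)
    return total


def run_streamed(n: int) -> List[str]:
    """Incremental handling: sender and receiver steps overlap."""
    log: List[str] = []
    consume(produce(log, n), log)
    return log


def run_buffered(n: int) -> List[str]:
    """Non-streaming case: the whole message is buffered before processing."""
    log: List[str] = []
    whole_message = list(produce(log, n))
    consume(whole_message, log)
    return log
```

In the streamed run the log alternates between send and receive steps; in 
the buffered run every send step precedes the first receive step.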

So, my use of the word involves a potentially broad range of use cases 
including but not limited to situations in which the XML SOAP envelope 
itself is very large, some sort of attachment is very large, where there 
is more than one large attachment (e.g. a video and an audio stream to be 
sent in parallel as generated, though I am not pushing hard on issues of 
isochrony here), situations such as satellite transmission in which there 
is value in overlapping processing at sender and receiver, etc.  I believe 
that analogs of each of these scenarios have proven crucial at one time or 
another with earlier messaging systems.

> If I look around eg W3C almost all the uses of the term
> "streaming" are for audio and video.  I did see this
> however: 
> 
> > SteveS: not having to download the whole package
> before unpacking part of it--streaming.  Is that the
> meaning of "streaming" in this context?  If so, then it
> is exactly what we need to make some of the use cases
> feasible.

I was not referring to any particular W3C characterization of streaming, 
but to the broad range of behaviors that people may at least think they 
want to see for SOAP in one context or another.  We don't have to support 
them all, but I think we have to consider many and choose carefully.

 
> I also found your second paragraph confusing. Let me
> try to pick this apart:
> 
>  >* The HTTP binding provided with MTOM either 
>  > (a) need not be optimized for
>  >streaming
> 
> This reads like a non-requirement to me: why list the
> things the binding is not optimized for? Well maybe the
> OR case is the one I want...

Mea culpa, it is of course a non-requirement.  What I meant was:  let me 
offer two alternative formulations for consideration by the workgroup.

(a) While we may agree on the desirability of having an abstract model 
that facilitates streaming when the binding wishes to do so, let's keep 
our initial MTOM HTTP binding simple.  It's not clear to me that we 
understand the requirements well enough for streaming to choose well, so 
let's keep it simple, as was done for SOAP 1.2.  In other words, let's not 
require ourselves to produce a streaming binding in association with this 
version of MTOM.  Of course, MTOM like SOAP allows you to create your own 
bindings, and those might indeed facilitate streaming.

That's option (a) for consideration.  The alternative I proposed was (b):
 
>  > or (b) SHOULD provide for accessibility to 
>  > non-optimized envelope information ahead 
>  > of the serializations of large binary objects
> 
> Well I think I understand this one: you are going to
> tell me the size of stuff before you send it: I like
> it.

No, that's not what it said, though that is indeed an interesting design 
point for yet another set of use cases.  What this one said is:  make sure 
that the non-optimized >envelope< comes first.  I.e. MTOM allows you to 
optimize parts of the envelope by taking them out of line and replacing 
them with xbinc:include.  I was informally referring to the result of that 
as the "unoptimized" (part of) the envelope.  In other words, you get the 
complete <soap:envelope> and all its children before any of the binary 
parts.  That represents a form of streaming, insofar as it allows both 
sender and receiver to deal with the envelope before sending/receiving the 
so-called attachments.
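
A minimal Python sketch of the "envelope first" ordering, under loud 
assumptions: the (label, bytes) framing, the un-namespaced element names 
and the "cid:" reference are simplified stand-ins chosen for illustration, 
not the MTOM wire format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Tuple

# Hypothetical, namespace-free stand-in for an envelope whose binary
# content has been taken out of line and replaced by an include reference.
ENVELOPE = (
    "<Envelope><Body><photo>"
    '<Include href="cid:xray-1"/>'
    "</photo></Body></Envelope>"
)


def serialize(envelope: str, parts: Dict[str, bytes]) -> Iterator[Tuple[str, bytes]]:
    """Emit the complete envelope before any of the binary parts."""
    yield ("envelope", envelope.encode("utf-8"))
    for cid, data in parts.items():
        yield (cid, data)


def receive(stream: Iterator[Tuple[str, bytes]]) -> List[str]:
    """Parse the envelope first, then take the binary parts as they arrive."""
    events: List[str] = []
    label, payload = next(stream)
    if label != "envelope":
        raise ValueError("envelope must be the first segment")
    root = ET.fromstring(payload)
    expected = [inc.get("href")[len("cid:"):] for inc in root.iter("Include")]
    events.append(f"envelope parsed, expecting {expected}")
    for cid, data in stream:
        events.append(f"part {cid}: {len(data)} bytes")
    return events
```

The receiver knows which parts to expect, and can act on the envelope, 
before the first byte of any binary part has been read.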

FWIW: requiring a length at the head of a message segment tends to move 
streaming headaches from the receiver to the sender, at least in the case 
where the sender itself does not know the length of the data in advance. I 
think there are 2 or 3 use cases hidden in this area:  you want to make 
life easy for the receiver, and the sender happens to know the length; you 
want to make life easy for the receiver even if the sender has to buffer a 
gigabyte to determine the length;  you want to make life easy for the 
sender, so you make no requirement to send a length ahead of the data. 
Again, I think that all of these are legitimate design points for one use 
case or another.  Indeed, it's the range of such requirements that 
suggests to me that we should go slow on adding streaming features.
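
The trade-off can be sketched in Python.  Both framings below are 
hypothetical (an 8-byte total length up front, versus 4-byte per-chunk 
lengths with a zero-length terminator) and are not taken from any SOAP or 
MTOM binding; they only show where the buffering burden lands.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator


def send_length_prefixed(chunks: Iterable[bytes], out: BinaryIO) -> None:
    """Easy for the receiver; the sender must buffer to learn the length."""
    data = b"".join(chunks)  # the whole payload is held in memory here
    out.write(struct.pack(">Q", len(data)))
    out.write(data)


def read_length_prefixed(inp: BinaryIO) -> bytes:
    (length,) = struct.unpack(">Q", inp.read(8))
    return inp.read(length)


def send_chunked(chunks: Iterable[bytes], out: BinaryIO) -> None:
    """Easy for the sender; no total length is ever needed."""
    for chunk in chunks:
        if chunk:  # a zero length is reserved as the terminator
            out.write(struct.pack(">I", len(chunk)))
            out.write(chunk)
    out.write(struct.pack(">I", 0))


def read_chunked(inp: BinaryIO) -> Iterator[bytes]:
    """The receiver learns the total size only when the terminator arrives."""
    while True:
        (length,) = struct.unpack(">I", inp.read(4))
        if length == 0:
            return
        yield inp.read(length)
```

With the first framing the receiver can allocate or refuse up front; with 
the second the sender can start transmitting before the data is complete.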
 
>  >and SHOULD
>  >further  provide for streaming in the case that only one large object has
>  >been optimized
> 
> Huh? Why one?  and anyway what is streaming? 

Well, this was an attempt to find an 80/20 point for those who have, say, 
a large XRay file as a GIF or JPEG, and want to stream that as well as the 
envelope.  By stream I mean, be able to send out some of the bytes of the 
XRay before all of them are available at the sender and/or to be able to 
begin processing of the first few raster lines at the receiver before the 
whole thing is received (and perhaps before the sender has even sent the 
tail.)  Consider, for example, the case where some scanning sensor is 
sending out the raster lines for the XRay as they become available, and we 
are sending them out in a SOAP message in parallel with the scanning of 
additional lines.

Why one object only?  Because I can see straightforward implementations of 
that.  If there are two xrays streaming in parallel off two scanners 
(stereo image?),  and you don't want to wait for all of the first one 
before you can make progress on the second, then you are in the business 
of interleaving them.  That's going to be important for some use cases 
someday, but I was making the suggestion that interleaving might not make 
an 80/20 cut for a SOAP binding in the next few months.

>If you
> tell me enough information ahead of the bits, then
> either I can accept your TCP/IP packets or refuse them.
> Given that we are in HTTP these are the only two things
> I can do right?  I'd rather read something like:

I think it depends on the level you're thinking about.  At some level, all 
of TCP/IP streams (in the sense I mean) because it comes in one packet at 
a time, and you can always try to finish with one before accepting (or 
sending) the next.  The question is whether that's realistic at the next 
level up.  To be perfectly rigorous, you can't for example process the 
start of a SOAP envelope without seeing the end, because you don't even 
know whether it's well-formed until you see the end tag for 
</soap:envelope>.  If that doesn't show up in the right place, you've got 
no Infoset, and no Envelope, therefore "no SOAP" (pun intended.)  XML 
doesn't stream, in this sense, and SOAP uses XML (modulo the permission to 
use optimistic concurrency and roll back all side effects once you 
discover that the envelope is not well-formed.)  Of course, many 
implementations will start work early, and will indeed roll back when the 
message proves to be not well formed.  Still, I think you'd be making a 
mistake to do a database commit based on a SOAP message until you'd seen 
the end tags.
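
A minimal Python sketch of the "start early, roll back if needed" pattern, 
using the standard library's incremental XMLPullParser.  The element name 
"item" and the staging list are hypothetical; a real implementation would 
stage whatever side effects the message calls for.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List


def process(chunks: Iterable[bytes]) -> List[str]:
    """Do work as elements arrive, but commit only if the document closes.

    Returns the committed results, or an empty list if the input turns
    out not to be well-formed (everything staged so far is discarded).
    """
    parser = ET.XMLPullParser(events=("end",))
    staged: List[str] = []
    try:
        for chunk in chunks:
            parser.feed(chunk)
            for _event, elem in parser.read_events():
                if elem.tag == "item":
                    staged.append(elem.text or "")  # tentative work only
        parser.close()  # raises ParseError if the root never closed
    except ET.ParseError:
        return []  # roll back: nothing is committed
    return staged  # the end tag was seen, so committing is safe
```

A message truncated before its final end tag yields no committed results, 
even though every element seen up to that point parsed cleanly.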
 
Similarly, if I want SOAP to be robust enough to make progress on 2 or 3 
large streaming attachments to the same message in parallel, then I can't 
just argue at the IP level.  I've got to look to Multipart MIME, DIME, or 
some level that will allow me to express the interleaving of those 
streams.  I think that's a very important use case for someday, but I'm 
proposing we not "go there" for now.
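
For a sense of what expressing the interleaving involves, here is a 
minimal Python sketch.  The (stream id, chunk) framing and the round-robin 
schedule are assumptions made for illustration; they are not how Multipart 
MIME or DIME actually frame their parts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def interleave(streams: Dict[str, Iterable[bytes]]) -> Iterator[Tuple[str, bytes]]:
    """Round-robin the chunks of several streams, tagging each with its id."""
    iterators = {sid: iter(chunks) for sid, chunks in streams.items()}
    while iterators:
        for sid in list(iterators):
            try:
                chunk = next(iterators[sid])
            except StopIteration:
                del iterators[sid]  # this stream is finished
                continue
            yield (sid, chunk)


def demux(frames: Iterable[Tuple[str, bytes]]) -> Dict[str, bytes]:
    """Receiver side: reassemble each stream from the tagged frames."""
    buffers: Dict[str, bytearray] = defaultdict(bytearray)
    for sid, chunk in frames:
        buffers[sid] += chunk
    return {sid: bytes(data) for sid, data in buffers.items()}
```

Neither stream has to finish before the other makes progress, but every 
frame now has to carry an identifier that the IP level knows nothing about.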

> 
> 
> ______________________________________________________
> John J. Barton          email:  John_Barton@hpl.hp.com
> http://www.hpl.hp.com/personal/John_Barton/index.htm
> MS 1U-17  Hewlett-Packard Labs
> 1501 Page Mill Road              phone: (650)-236-2888
> Palo Alto CA  94304-1126         FAX:   (650)-857-5100

Thanks for your patience.  Hope this is helpful.

Noah

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Tuesday, 23 September 2003 17:38:04 UTC