Re: XMLP-UC-6 reformulation - simple streaming use case from noah_mendelsohn@us.ibm.com on 2003-09-28 (xml-dist-app@w3.org from September 2003)

From: <noah_mendelsohn@us.ibm.com>
Date: Sun, 28 Sep 2003 12:57:53 -0400
To: John_Barton@hpl.hp.com
Cc: Jacek Kopecky <jacek.kopecky@systinet.com>, Mark Nottingham <mark.nottingham@bea.com>, XMLP Dist App <xml-dist-app@w3.org>, xml-dist-app-request@w3.org
Message-ID: <OF84F7520A.61AA09D0-ON85256DAD.0079BA0D@lotus.com>
John Barton writes:

>>  Thanks for your detailed and thoughtful reply. 
>> I'll rearrange what you said and add some 
>> stuff...hopefully it will help ;-)

Thank you.  Yes, I think it does help.  While I don't necessarily agree 
with (or in a few cases understand) every nuance of what you've written 
below, I think it's overall consistent with the sort of analysis I think 
we have to do to justify any support of streaming...and indeed, that was 
my main point.  There are lots of potentially important use cases, but 
plenty of users ready to say "surely this is simple:  if you just bake in 
support for my use case we'll be all set."  I think we should either skip 
streaming in this round as not making an 80/20 cut, or we should put some 
energy into getting consensus on the range of use cases likely to be of 
interest over time.  I think your note below very much contributes to that 
discussion, as I hope mine did. 
Having done such a use case analysis, I think we can decide how much if 
any support to put into each of the three layers of MTOM.  As I said 
earlier, there may be value in making sure that the abstract model does as 
little as practical to preclude streaming of various sorts, even if we 
decide that our initial binding supports a smaller set of scenarios (if 
any). 

Thank you!

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

"John J. Barton" <John_Barton@hpl.hp.com>
Sent by: xml-dist-app-request@w3.org
09/25/2003 12:54 PM

 
        To:     Noah Mendelsohn/Cambridge/IBM@Lotus
        cc:     Jacek Kopecky <jacek.kopecky@systinet.com>, Mark Nottingham 
<mark.nottingham@bea.com>, XMLP Dist App <xml-dist-app@w3.org>
        Subject:        Re: XMLP-UC-6 reformulation - simple streaming use case



Noah,

   Thanks for your detailed and thoughtful reply.  I'll rearrange what
you said and add some stuff...hopefully it will help ;-)

   There seem to be four related issues:
      1) senders that can, can't, or won't count bytes.
      2) 0, 1, or more than one binary attachment.
      3) incremental vs batch processing of the message
      4) spatial relationship between SOAP and the attachments or
         among the latter.
Can we understand how these interact?

Let's look at counting bytes and number of attachments by example:
   PrintAPhoto: 1 binary data, can count bytes.
   StereoXRay: Multiple binary data, can count bytes.
   LazyPrintServer: Multiple binary data, won't count bytes.
   Internet Radio: Exactly one binary data, can't count bytes.
   Internet Multimedia: >1 binary data, can't count bytes.

Then let's ask "Can we process these incrementally"?
   PrintAPhoto: yes, if I know before I get the image bits where they
        need to be rendered.
   StereoXRay: yes, if I know before I get the image bits which database
        will receive them.
   LazyPrintServer: no I cannot decide if the job is possible until it
        dies.
   Internet Radio: yes, if I know before I get the audio that I am going
       to decode frames and emit them.
   Internet Multimedia: yes, as for radio if the binary is interleaved.

 From these examples we observe that incremental processing
depends on message structure: if we put the processing commands
and sizes in front and allow the server to interleave content, we can
cover a lot of ground.  I believe that the first bit is what Noah means
by putting the envelope first. Once we do that, interleaving content is
easy.
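
A minimal sketch of the layout described above, under stated assumptions:
the frame format (a 1-byte part id and a 4-byte length ahead of each
payload), the names write_frame and read_frames, and the sample payloads
are all hypothetical and are not taken from MTOM or any binding. It puts
the envelope in the first frame and then interleaves chunks of two
attachments.

```python
import io
import struct

# Hypothetical frame layout (an assumption for illustration, not MTOM):
#   1-byte part id, 4-byte big-endian payload length, payload bytes.
# Part id 0 carries the envelope and is written first.
ENVELOPE = 0
HEADER = struct.Struct(">BI")

def write_frame(out, part_id, payload):
    """Write one frame: header (part id + length) followed by the payload."""
    out.write(HEADER.pack(part_id, len(payload)))
    out.write(payload)

def read_frames(src):
    """Yield (part_id, payload) pairs one frame at a time."""
    while True:
        head = src.read(HEADER.size)
        if not head:
            return
        if len(head) < HEADER.size:
            raise ValueError("truncated frame header")
        part_id, length = HEADER.unpack(head)
        payload = src.read(length)
        if len(payload) < length:
            raise ValueError("truncated frame payload")
        yield part_id, payload

def demo():
    wire = io.BytesIO()
    # Processing commands go out ahead of any binary data.
    write_frame(wire, ENVELOPE, b"<env>left to db1, right to db2</env>")
    # Two attachments interleaved chunk by chunk (e.g. a stereo X-ray).
    write_frame(wire, 1, b"L-row0")
    write_frame(wire, 2, b"R-row0")
    write_frame(wire, 1, b"L-row1")
    write_frame(wire, 2, b"R-row1")
    wire.seek(0)

    order, parts = [], {}
    for part_id, payload in read_frames(wire):
        order.append(part_id)
        parts.setdefault(part_id, []).append(payload)
    return order, parts
```

Because every frame names its part, the receiver can route each chunk as
soon as it has read the envelope frame, without waiting for either
attachment to finish.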

There are two more issues that complicate this picture:
    5) Digital signatures,
    6) embedded XML + validation.
Obviously any operation that must be performed over the entire
message before processing prevents incremental processing.
In searching out the 80/20 spot, I believe we should avoid solutions
that insist on whole-message preprocessing.

John.


At 05:30 PM 9/23/2003 -0400, noah_mendelsohn@us.ibm.com wrote:

>John, let me try and respond to the various sections of your note:
>
>John Barton writes:
>
> > Noah,
> >
> > Unfortunately I am once again confused by the use of
> > the word "streaming".  Maybe I missed a clarification
> > sometime back?  Mark's formulation might be incomplete
> > but at least I understand its terms ;-).
>
>I use "streaming"  to refer to the broad range of scenarios in which a
>sender and/or receiver needs to prepare or process the message
>incrementally.  In other words, any alternative to the situation in which
>the entire message can be buffered both before sending and prior to start
>of processing following receipt.  In the general case for large messages,
>such streaming allows for overlap of sender and receiver processing of the
>same message, though such overlap is not required and may only be achieved
>in some cases.  While there are probably more formal definitions out
>there, I think this is consistent with general usage in the industry.
>
>So, my use of the word involves a potentially broad range of use cases
>including but not limited to situations in which the XML SOAP envelope
>itself is very large, some sort of attachment is very large, where there
>is more than one large attachment (e.g. a video and an audio stream to be
>sent in parallel as generated, though I am not pushing hard on issues of
>isochrony here), situations such as satellite transmission in which there
>is value in overlapping processing at sender and receiver, etc.  I believe
>that analogs of each of these scenarios have proven crucial at one time or
>another with earlier messaging systems.
>
> > If I look around eg W3C almost all the uses of the term
> > "streaming" are for audio and video.  I did see this
> > however:
> >
> > > SteveS: not having to download the whole package
> > before unpacking part of it--streaming.  Is that the
> > meaning of "streaming" in this context?  If so, then it
> > is exactly what we need to make some of the use cases
> > feasible.
>
>I was not referring to any particular W3C characterization of streaming,
>but to the broad range of behaviors that people may at least think they
>want to see for SOAP in one context or another.  We don't have to support
>them all, but I think we have to consider many and choose carefully.
>
>
> > I also found your second paragraph confusing. Let me
> > try to pick this apart:
> >
> >  >* The HTTP binding provided with MTOM either
> >  > (a) need not be optimized for
> >  >streaming
> >
> > This reads like a non-requirement to me: why list the
> > things the binding is not optimized for? Well maybe the
> > OR case is the one I want...
>
>Mea culpa, it is of course a non-requirement.  What I meant was:  let me
>offer two alternative formulations for consideration by the workgroup.
>
>(a) While we may agree on the desirability of having an abstract model
>that facilitates streaming when the binding wishes to do so, let's keep
>our initial MTOM HTTP binding simple.  It's not clear to me that we
>understand the requirements well enough for streaming to choose well, so
>let's keep it simple, as was done for SOAP 1.2.  In other words, let's not
>require ourselves to produce a streaming binding in association with this
>version of MTOM.  Of course, MTOM like SOAP allows you to create your own
>bindings, and those might indeed facilitate streaming.
>
>That's option (a) for consideration.  The alternative I proposed was (b):
>
> >  > or (b) SHOULD provide for accessibility to
> >  > non-optimized envelope information ahead
> >  > of the serializations of large binary objects
> >
> > Well I think I understand this one: you are going to
> > tell me the size of stuff before you send it: I like
> > it.
>
>No, that's not what it said, though that is indeed an interesting design
>point for yet another set of use cases.  What this one said is:  make sure
>that the non-optimized >envelope< comes first.  I.e. MTOM allows you to
>optimize parts of the envelope by taking them out of line and replacing
>them with xbinc:include.  I was informally referring to the result of that
>as the "unoptimized" (part of) the envelope.  In other words, you get the
>complete <soap:envelope> and all its children before any of the binary
>parts.  That represents a form of streaming, insofar as it allows both
>sender and receiver to deal with the envelope before sending/receiving the
>so-called attachments.
>
>FWIW: requiring a length at the head of a message segment tends to move
>streaming headaches from the receiver to the sender, at least in the case
>where the sender itself does not know the length of the data in advance.  I
>think there are 2 or 3 use cases hidden in this area:  you want to make
>life easy for the receiver, and the sender happens to know the length; you
>want to make life easy for the receiver even if the sender has to buffer a
>gigabyte to determine the length;  you want to make life easy for the
>sender, so you make no requirement to send a length ahead of the data.
>Again, I think that all of these are legitimate design points for one use
>case or another.  Indeed, it's the range of such requirements that
>suggests to me that we should go slow on adding streaming features.
>
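
The trade-off between the length-first and no-length design points can be
sketched as follows. The wire formats and function names here are
hypothetical, chosen only for illustration; they are not part of MTOM or
of any SOAP binding.

```python
def send_length_first(chunks):
    """Receiver-friendly: total length up front, so the sender must buffer."""
    body = b"".join(chunks)           # forces the whole object into memory
    yield str(len(body)).encode() + b"\n"
    yield body

def send_chunked(chunks):
    """Sender-friendly: each chunk carries its own length; 0 marks the end."""
    for chunk in chunks:
        if chunk:                     # an empty chunk would look like the end
            yield str(len(chunk)).encode() + b"\n" + chunk
    yield b"0\n"

def receive_chunked(data):
    """Parse the chunked form back into the original bytes."""
    out, pos = [], 0
    while True:
        nl = data.index(b"\n", pos)
        size = int(data[pos:nl])
        if size == 0:
            return b"".join(out)
        out.append(data[nl + 1 : nl + 1 + size])
        pos = nl + 1 + size
```

In the first form the receiver learns the total size before any data
arrives, but the sender cannot emit a byte until it has all of them. In
the second the sender can forward each raster line as it is scanned, and
the receiver only discovers the total at the end.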
> >  >and SHOULD
> >  >further provide for streaming in the case that only one large object
> >  >has been optimized
> >
> > Huh? Why one?  and anyway what is streaming?
>
>Well, this was an attempt to find an 80/20 point for those who have, say,
>a large XRay file as a GIF or JPEG, and want to stream that as well as the
>envelope.  By stream I mean, be able to send out some of the bytes of the
>XRay before all of them are available at the sender and/or to be able to
>begin processing of the first few raster lines at the receiver before the
>whole thing is received (and perhaps before the sender has even sent the
>tail.)  Consider, for example, the case where some scanning sensor is
>sending out the raster lines for the XRay as they become available, and we
>are sending them out in a SOAP message in parallel with the scanning of
>additional lines.
>
>Why one object only?  Because I can see straightforward implementations of
>that.  If there are two xrays streaming in parallel off two scanners
>(stereo image?),  and you don't want to wait for all of the first one
>before you can make progress on the second, then you are in the business
>of interleaving them.  That's going to be important for some use cases
>someday, but I was making the suggestion that interleaving might not make
>an 80/20 cut for a SOAP binding in the next few months.
>
> >If you
> > tell me enough information ahead of the bits, then
> > either I can accept your TCP/IP packets or refuse them.
> > Given that we are in HTTP these are the only two things
> > I can do right?  I'd rather read something like:
>
>I think it depends on the level you're thinking about.  At some level, all
>of TCP/IP streams (in the sense I mean) because it comes in one packet at
>a time, and you can always try to finish with one before accepting (or
>sending) the next.  The question is whether that's realistic at the next
>level up.  To be perfectly rigorous, you can't for example process the
>start of a SOAP envelope without seeing the end, because you don't even
>know whether it's well-formed until you see the end tag for
></soap:envelope>.  If that doesn't show up in the right place, you've got
>no Infoset, and no Envelope, therefore "no SOAP" (pun intended.)  XML
>doesn't stream, in this sense, and SOAP uses XML (modulo the permission to
>use optimistic concurrency and roll back all side effects once you
>discover that the envelope is poorly formed.)  Of course, many
>implementations will start work early, and will indeed roll back when the
>message proves to be not well formed.  Still, I think you'd be making a
>mistake to do a database commit based on a SOAP message until you'd seen
>the end tags.
>
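
One way to picture "start work early, roll back if the message proves not
well formed" is the sketch below. It uses XMLPullParser from Python's
standard xml.etree.ElementTree module; the element names env and item and
the function process_optimistically are made up for illustration, and a
real SOAP processor would have far more to undo than a list.

```python
import xml.etree.ElementTree as ET

def process_optimistically(chunks):
    """Act on elements as they close; commit only if the envelope is well formed.

    Returns the committed item texts, or [] after a rollback.
    """
    parser = ET.XMLPullParser(events=("end",))
    pending = []                      # tentative side effects, not yet committed
    try:
        for chunk in chunks:
            parser.feed(chunk)
            # Parse errors found during feed() surface here, in read_events().
            for _event, elem in parser.read_events():
                if elem.tag == "item":
                    pending.append(elem.text)
        parser.close()                # raises ParseError if the input ended early
    except ET.ParseError:
        return []                     # roll back: discard everything tentative
    return pending                    # commit only after the final end tag
```

The tentative work happens while data is still arriving, but nothing is
treated as committed until the parser has accepted the closing tag.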
>Similarly, if I want SOAP to be robust enough to make progress on 2 or 3
>large streaming attachments to the same message in parallel, then I can't
>just argue at the IP level.  I've got to look to Multipart MIME, DIME, or
>some level that will allow me to express the interleaving of those
>streams.  I think that's a very important use case for someday, but I'm
>proposing we not "go there" for now.
>
> >
> >
> > ______________________________________________________
> > John J. Barton          email:  John_Barton@hpl.hp.com
> > http://www.hpl.hp.com/personal/John_Barton/index.htm
> > MS 1U-17  Hewlett-Packard Labs
> > 1501 Page Mill Road              phone: (650)-236-2888
> > Palo Alto CA  94304-1126         FAX:   (650)-857-5100
>
>Thanks for your patience.  Hope this is helpful.
>
>Noah
>
>------------------------------------------------------------------
>Noah Mendelsohn                              Voice: 1-617-693-4036
>IBM Corporation                                Fax: 1-617-693-8676
>One Rogers Street
>Cambridge, MA 02142
>------------------------------------------------------------------

______________________________________________________
John J. Barton          email:  John_Barton@hpl.hp.com
http://www.hpl.hp.com/personal/John_Barton/index.htm
MS 1U-17  Hewlett-Packard Labs
1501 Page Mill Road              phone: (650)-236-2888
Palo Alto CA  94304-1126         FAX:   (650)-857-5100
Received on Sunday, 28 September 2003 13:03:52 UTC