- From: <noah_mendelsohn@us.ibm.com>
- Date: Tue, 23 Sep 2003 17:30:46 -0400
- To: "John J. Barton" <John_Barton@hpl.hp.com>
- Cc: Jacek Kopecky <jacek.kopecky@systinet.com>, Mark Nottingham <mark.nottingham@bea.com>, XMLP Dist App <xml-dist-app@w3.org>
John, let me try and respond to the various sections of your note:

John Barton writes:

> Noah,
>
> Unfortunately I am once again confused by the use of
> the word "streaming". Maybe I missed a clarification
> sometime back? Mark's formulation might be incomplete
> but at least I understand its terms ;-).

I use "streaming" to refer to the broad range of scenarios in which a sender and/or receiver needs to prepare or process the message incrementally. In other words, any alternative to the situation in which the entire message can be buffered both before sending and prior to start of processing following receipt. In the general case for large messages, such streaming allows for overlap of sender and receiver processing of the same message, though such overlap is not required and may only be achieved in some cases. While there are probably more formal definitions out there, I think this is consistent with general usage in the industry.

So, my use of the word involves a potentially broad range of use cases including but not limited to situations in which the XML SOAP envelope itself is very large, some sort of attachment is very large, where there is more than one large attachment (e.g. a video and an audio stream to be sent in parallel as generated, though I am not pushing hard on issues of isochrony here), situations such as satellite transmission in which there is value in overlapping processing at sender and receiver, etc. I believe that analogs of each of these scenarios have proven crucial at one time or another with earlier messaging systems.

> If I look around eg W3C almost all the uses of the term
> "streaming" are for audio and video. I did see this
> however:
>
> > SteveS: not having to download the whole package
> before unpacking part of it--streaming. Is that the
> meaning of "streaming" in this context? If so, then it
> is exactly what we need to make some of the use cases
> feasible.

I was not referring to any particular W3C characterization of streaming, but to the broad range of behaviors that people may at least think they want to see for SOAP in one context or another. We don't have to support them all, but I think we have to consider many and choose carefully.

> I also found your second paragraph confusing. Let me
> try to pick this apart:
>
> >* The HTTP binding provided with MTOM either
> > (a) need not be optimized for
> >streaming
>
> This reads like a non-requirement to me: why list the
> things the binding is not optimized for? Well maybe the
> OR case is the one I want...

Mea culpa, it is of course a non-requirement. What I meant was: let me offer two alternative formulations for consideration by the workgroup.

(a) While we may agree on the desirability of having an abstract model that facilitates streaming when the binding wishes to do so, let's keep our initial MTOM HTTP binding simple. It's not clear to me that we understand the requirements well enough for streaming to choose well, so let's keep it simple, as was done for SOAP 1.2. In other words, let's not require ourselves to produce a streaming binding in association with this version of MTOM. Of course, MTOM like SOAP allows you to create your own bindings, and those might indeed facilitate streaming.

That's option (a) for consideration. The alternative I proposed was (b):

> > or ( b) SHOULD provide for accessibity to
> > non-optimzed envelope information ahead
> > of the serializations of large binary objects
>
> Well I think I understand this one: you are going to
> tell me the size of stuff before you send it: I like
> it.

No, that's not what it said, though that is indeed an interesting design point for yet another set of use cases. What this one said is: make sure that the non-optimized >envelope< comes first. I.e. MTOM allows you to optimize parts of the envelope by taking them out of line and replacing them with xbinc:include.
I was informally referring to the result of that as the "unoptimized" (part of) the envelope. In other words, you get the complete <soap:envelope> and all its children before any of the binary parts. That represents a form of streaming, insofar as it allows both sender and receiver to deal with the envelope before sending/receiving the so-called attachments.

FWIW: requiring a length at the head of a message segment tends to move streaming headaches from the receiver to the sender, at least in the case where the sender itself does not know the length of the data in advance. I think there are 2 or 3 use cases hidden in this area: you want to make life easy for the receiver, and the sender happens to know the length; you want to make life easy for the receiver even if the sender has to buffer a gigabyte to determine the length; you want to make life easy for the sender, so you make no requirement to send a length ahead of the data. Again, I think that all of these are legitimate design points for one use case or another. Indeed, it's the range of such requirements that suggests to me that we should go slow on adding streaming features.

> >and SHOULD
> >further provide for streaming in the case that only one large object has
> >been optimized
>
> Huh? Why one? and anyway what is streaming?

Well, this was an attempt to find an 80/20 point for those who have, say, a large XRay file as a GIF or JPEG, and want to stream that as well as the envelope. By stream I mean, be able to send out some of the bytes of the XRay before all of them are available at the sender and/or to be able to begin processing of the first few raster lines at the receiver before the whole thing is received (and perhaps before the sender has even sent the tail.) Consider, for example, the case where some scanning sensor is sending out the raster lines for the XRay as they become available, and we are sending them out in a SOAP message in parallel with the scanning of additional lines.
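
To make the length-ahead-of-data trade-off and the scanner case concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the framing (a 4-byte big-endian length, with a zero-length frame as terminator) and every name in it (length_prefixed, chunk_framed, read_chunk_framed, raster_lines) are hypothetical, and none of it is part of MTOM, SOAP, or any HTTP binding.

```python
import struct
from typing import Iterable, Iterator


def raster_lines(count: int, width: int = 4) -> Iterator[bytes]:
    """Hypothetical scanner: yields one raster line at a time as it is 'scanned'."""
    for row in range(count):
        yield bytes([row % 256]) * width


def length_prefixed(chunks: Iterable[bytes]) -> bytes:
    """Total length first: easy for the receiver, but the sender must
    buffer every chunk before it can emit the first byte."""
    body = b"".join(chunks)
    return struct.pack(">I", len(body)) + body


def chunk_framed(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Per-chunk lengths: the sender can emit each piece as soon as it exists.
    A zero-length frame marks the end of the stream (an assumed convention)."""
    for chunk in chunks:
        if chunk:  # skip empty chunks so they are not mistaken for the terminator
            yield struct.pack(">I", len(chunk)) + chunk
    yield struct.pack(">I", 0)


def read_chunk_framed(data: bytes) -> Iterator[bytes]:
    """Receiver side: hands back each chunk without knowing the total size."""
    offset = 0
    while True:
        (size,) = struct.unpack_from(">I", data, offset)
        offset += 4
        if size == 0:
            return
        yield data[offset:offset + size]
        offset += size


if __name__ == "__main__":
    frames = chunk_framed(raster_lines(3))
    first = next(frames)  # available after only one line has been scanned
    wire = first + b"".join(frames)
    print(list(read_chunk_framed(wire)))
```

In this sketch chunk_framed corresponds to "make life easy for the sender", while length_prefixed corresponds to the case where the sender either knows the length or buffers everything to find it out.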

Why one object only? Because I can see straightforward implementations of that. If there are two xrays streaming in parallel off two scanners (stereo image?), and you don't want to wait for all of the first one before you can make progress on the second, then you are in the business of interleaving them. That's going to be important for some use cases someday, but I was making the suggestion that interleaving might not make an 80/20 cut for a SOAP binding in the next few months.

>If you
> tell me enough information ahead of the bits, then
> either I can accept your TCP/IP packets or refuse them.
> Given that we are in HTTP these are the only two things
> I can do right? I'd rather read something like:

I think it depends on the level you're thinking about. At some level, all of TCP/IP streams (in the sense I mean) because it comes in one packet at a time, and you can always try to finish with one before accepting (or sending) the next. The question is whether that's realistic at the next level up.

To be perfectly rigorous, you can't for example process the start of a SOAP envelope without seeing the end, because you don't even know whether it's well-formed until you see the end tag for </soap:envelope>. If that doesn't show up in the right place, you've got no Infoset, and no Envelope, therefore "no SOAP" (pun intended.) XML doesn't stream, in this sense, and SOAP uses XML (modulo the permission to use optimistic concurrency and roll back all side effects once you discover that the envelope is poorly formed.) Of course, many implementations will start work early, and will indeed roll back when the message proves to be not well formed. Still, I think you'd be making a mistake to do a database commit based on a SOAP message until you'd seen the end tags.

Similarly, if I want SOAP to be robust enough to make progress on 2 or 3 large streaming attachments to the same message in parallel, then I can't just argue at the IP level.
I've got to look to Multipart MIME, DIME, or some level that will allow me to express the interleaving of those streams. I think that's a very important use case for someday, but I'm proposing we not "go there" for now.

>
>
> ______________________________________________________
> John J. Barton email: John_Barton@hpl.hp.com
> http://www.hpl.hp.com/personal/John_Barton/index.htm
> MS 1U-17 Hewlett-Packard Labs
> 1501 Page Mill Road phone: (650)-236-2888
> Palo Alto CA 94304-1126 FAX: (650)-857-5100

Thanks for your patience. Hope this is helpful.

Noah

------------------------------------------------------------------
Noah Mendelsohn Voice: 1-617-693-4036
IBM Corporation Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Received on Tuesday, 23 September 2003 17:38:04 UTC