- From: Mark Jones <jones@research.att.com>
- Date: Fri, 31 Jan 2003 12:33:57 -0500 (EST)
- To: xml-dist-app@w3.org
AFTFers,

This version of the requirements folds in Marc's compression requirement, and inlines Jeff's proposed requirements (DR18, DR19, and DR20) and BEA comments/requirements from David Orchard in preparation for the Friday, 2003/01/31 AFTF meeting.

--mark

Mark A. Jones
AT&T Labs -- Strategic Standards Division
Shannon Laboratory
Room 2A02
180 Park Ave.
Florham Park, NJ 07932-0971
email: jones@research.att.com
phone: (973) 360-8326
fax: (973) 236-6453
________________________________________________________________

Concrete Attachment Feature Requirements
----------------------------------------

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0044.html">
* in the intro, define 'attachments' as 'a technology that allows for the encapsulation of and reference to arbitrary data, including that which is not legally serialized into XML 1.0 (e.g., binary)'
* define 'parts' as 'units of arbitrary data'
</davidO>

Considerations
--------------

* If existing packaging schemes (e.g., Multipart-MIME, DIME, ZIP, tar, jar, etc.) meet the requirements, or represent sensible tradeoffs, then the specification SHOULD use such existing schemes.
* The specification should, where reasonably practical, be designed to facilitate debugging, tracing, and other diagnostic activities.

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0044.html">
* The specification should aid message construction and parsing with simple tools.
</davidO>

General Requirements
--------------------

R8. The specification must describe its relationship to the properties defined in Table 1 (att:SOAPMessage and att:SecondaryPartBag) in the SOAP 1.2 Attachment Feature specification.

R9. The specification must describe its points of extensibility.

R15. The specification should not unnecessarily preclude convenient description by languages such as WSDL. [WSDL should have enough extensibility to handle reasonable new attachment specifications, including ours.
Our spec should be reasonably describable by languages such as WSDL.]

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0044.html">
R15. The specification should be conveniently describable by languages such as WSDL.
</davidO>

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0044.html">
DR24. The specification should include sample changes to WSDL 1.2 and/or extensions to WSDL.
</davidO>

R17. The specification must work with the SOAP 1.2 HTTP binding and shouldn't unnecessarily preclude working with other bindings.

Representation
--------------

R1. The specification must define a means to carry multiple data parts.

R2. The specification must define a means for parts to carry arbitrary data, including non-XML data (e.g., binary data and XML fragments).

R3. The specification should support efficient implementation of:
  a) parsing the physical representation to separate and identify its constituent parts.
  b) programming systems which efficiently resolve a URI to retrieve the data (and metadata) comprising the corresponding part.

R4. The specification should use a reasonably space-efficient representation.

DR5. The representation must efficiently support the addition and deletion of parts.

<chris href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0025.html">
Hmmm... While it is clear that an implementation of the specification would likely carry this requirement, it is less than clear that the requirement is applicable to the specification itself. Further, one would imagine that this statement is intended to cover the insertion or in-line deletion of parts, or had you only appending and truncation in mind? Again, it isn't clear that this requirement, as written, is either testable against a specification or relevant for a specification that is not intended to be implementation-specific.
</chris>

<markJ href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0028.html">
The point here was to make the spec relatively friendly to intermediaries that might need to modify the attachment bundle in straightforward ways (roughly resonant with the fact that insertions and deletions of headers in a SOAP envelope are pretty straightforward syntactically, for example).
</markJ>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
If that's the goal, then I think we need to specifically say:

(alternate) DR5. The representation SHOULD efficiently support the addition and deletion of parts by intermediaries.

Otherwise, I agree completely with Chris' concern. Indeed, I am somewhat nervous that even at the intermediary the issues will be hard to pin down, and may relate to higher level constructs that we can't control. After all, if you write an application that has to inspect the whole message before deciding what to insert or delete, then you almost surely have to buffer the whole thing at the intermediary. Once you've done that, then Chris is right on even at the intermediary. How can you tell what is or isn't efficient for me at such a buffering intermediary? I've very probably stored the parts in ways you wouldn't easily guess (e.g., some relational DB fields).
</noah>

DR13. The specification must provide support for large parts.

<chris href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0025.html">
And small ones as well, one would imagine. How large? Arbitrarily large? Just "pretty big", "really, really large" or "incomprehensibly large"? :) What about parts whose size is not known at the time that the serialization is begun?
</chris>

<markJ href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0028.html">
These points have been discussed briefly. This one needs more work.
</markJ>

<barton href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0030.html">
The reason for this kind of requirement is the dominant impact of I/O and memory allocation on performance. For small messages, all attachment schemes will be equal since CPUs are infinitely fast. "Large" of course changes over time as hardware resources improve. Design for messages between 1MB and 1GB. 5 years from now, when this standard is in use, allocators can bite off 1MB but 1GB will likely still call for disk. You can shift these numbers around, but they will factor into the design: might as well discuss them explicitly.

In my opinion, parts whose size is not known should not be "attached" to SOAP messages. Rather, one should use messages to set up an out of band stream mechanism.
</barton>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
I think the question with small is, do you care about relative overhead? Is it OK to add 200 bytes of overhead to a 5 byte attachment? In some situations the answer is: yes, the whole message is still only a few hundred bytes and as John says, it's hard on modern processors to get in trouble processing a single small message. On the other hand, if you have thousands of parts per message, or thousands of messages per second, the overhead can indeed really add up. So, I don't think it's obviously a non-issue.
</noah>

DR21. The specification should provide convenient means for extending the metadata carried with a message. Such mechanisms should specifically allow for extensions to the set of metadata associated with individual parts.

DR22. The specification should provide a means by which any or all parts MAY be labeled with associated MIME types. (I.e., applications sending a message are not obligated to label parts with MIME types, but the specification must provide for carrying the MIME type if provided.)

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0044.html">
DR25.
The specification must provide specification of media types for parts.
</davidO>

DR23. The specification must be sufficiently flexible/extensible to allow for and describe transformations (encoding/compression/encryption/...) of parts.

<marcH>
I was thinking along the lines of HTTP where you have a media type plus a transfer encoding. The same thing might be useful in the package: this part is text/plain but is compressed using ..., or this part is text/plain but is encrypted using ...
</marcH>

<jeff href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0032.html">
DR18. The specification must define a means to format messages for down-level receivers that do not understand the specification.
</jeff>

<sanjiva href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0034.html">
How can any spec say something about those who don't understand the spec? I'm confused.
</sanjiva>

<barton href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0033.html">
Maybe you can clarify this one, Jeff... the way I read it, it sounds impossible.
</barton>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
I'm confused too.
</noah>

<jeff href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0032.html">
DR19. The specification must enable efficient allocation of buffers by receivers.
</jeff>

<sanjiva href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0034.html">
I'm again confused; while a statement like "this spec must be implementable as efficiently as possible" is reasonable (and motherhood-and-apple-pie IMO), speaking specifically about buffer allocation seems rather pointed.
</sanjiva>

<barton href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0033.html">
This one motivates some of the other requirements but it implies that the sender understands the receiver's memory allocation capabilities.
On one extreme the requirement could amount to "give the content length of attachments up front", but at the other extreme it could require the interleaving of parts to achieve a serialization optimal for receiver processing. As an example of the latter, the UPNP Printing folks worried about how an extremely long XHTML doc with many inline images could be printed with one page buffer. While that may seem like an example far from the one most SOAP folks consider, once you get to pipelined processing of composed SOAP services the differences begin to fade. These are cases you want to be able to handle and they are cases that non-XML systems deal with.

Of course the serialization of XHTML is well-defined. Serialization for arbitrary receiver processing isn't. That makes this requirement difficult to spell out absent information on the receiver buffer capability. Consequently one might go for a requirement that asks the spec to allow attachments to be placed in the stream physically near their first point of XML reference rather than getting into buffers. That would pick up the critical use case without getting mired in an open-ended problem.
</barton>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
I think we can say: "Attention should be given to likely implementation optimizations." (I agree with Sanjiva, going much beyond that is too specific.)
</noah>

<barton>
Sanjiva, the key words here are "by receivers". The serialization mechanism can have serious impacts on resource constrained or heavily loaded receivers. Emitting a SOAP message in an HTTP-style MIME-like format without content-length headers leaves the receiver with no recourse but multiple buffering layers and repeated dynamic memory allocations as more content arrives. For resource constrained receivers, the result is late and annoying buffer overflow; for heavily loaded receivers, the result is poor performance.
This is, unfortunately, not apple-pie, since typically a receiver-friendly protocol requires resources to be spent on the sender, e.g., to count the bytes as the package is assembled. The specification will shift real costs. Hope this helps clarify this issue.
</barton>

<jeff href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0052.html">
John, well put. I hope the AFTF agrees. --Jeff
</jeff>

<jeff href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0032.html">
DR20. The specification must allow messages to be secured using the mechanisms defined in WS-Security.
</jeff>

<sanjiva href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0034.html">
WS-Security only applies to SOAP envelopes. This requirement would hence have the effect of precluding MIME/DIME style packaging ...
</sanjiva>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
+1
</noah>

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0044.html">
DR27. The specification should support securing of messages and message parts, such as use of encryption and signatures, in a simple manner. This is different from the proposed "support ws-security requirement", in that it covers application of encryption and signature without necessarily meaning use of ws-security.
</davidO>

<gudge href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0051.html">
DR29. A message with all its parts, however separated physically, must be representable as a single infoset and describable as a single XML element in an XML schema.
</gudge>

Reference to Parts
------------------

DR6. The specification must permit parts to be identified by URIs.

<chris href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0025.html">
Hmmm... I think that the specification should require that parts be identified by URI, but that they may be identified using other means as well. Of course, they could be identified by relative URI, not just absolute URI.
</chris>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
+1 except for the references to relative URI. I think we want:

The specification must provide that each part be identified by at least one absolute URI.

I think issues of relative should be above our level. If some system (e.g., SOAP itself) wants to provide base URI and resolve relatives to absolute, that's fine, but we don't worry about that, I think. I would not want a part to be known at the deepest level as "../p".
</noah>

<markJ href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0028.html">
We can consider your wording instead.
</markJ>

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0044.html">
(alternate) DR6. The specification must permit parts to be identified by URIs or URI References. This is similar to ChrisF's comment.
</davidO>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0045.html">
I am a little surprised. I would have thought that what we want is:

* The identity of each part is a URI (i.e., an absolute URI)
* References to parts are in the form of URI references (which are resolved through the usual mechanisms to yield the absolute URI).

David: are you really saying that you want to allow "../a" as the identity of a part? Thanks.
</noah>

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0047.html">
../a has nothing to do with URI References vs URIs. ../a is allowed by URIs and by URI references. You might be thinking of absolute URIs however :-) URI References are URIs that may have fragments. Oh darn, we don't have a term for a URI that has an absolutized portion that may have fragments.
</davidO>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0048.html">
I stand corrected. You're right of course. Still, I would think that we want to follow web architecture.
As far as I know, that means that the resource which is a part should be identified by an absolute URI (not relative, NO fragment ID). References to the part as a whole should allow relative and absolute forms. References within parts that have known media type should allow URI References, including fragment ID.

Bottom line: a part is named by an absolute URI. References are in the form of URI references, but the fragid is a reference within the part. Specifically, two references that differ only in their fragid must resolve to the same part.

Also: on the phone call I suggested a requirement that the attachment implementation be capable of carrying a media type for each part. David: does this sound right?
</noah>

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0049.html">
Web architecture doesn't stipulate absolute URIs. I would like to allow frag ids, specifically so that parts could actually be fragments within an xml document. One example would be a soap with attachments package that contains 2 xml documents, and the first refers to a part that is within the 2nd xml document. I expect that in most cases, people would use absolute URIs, but I can think of scenarios where they would want a fragment.

Let's make this a bit more concrete. I want to chunk a large xml document. Say I decide to split this into 2 documents. I could use an xinclude in the first to refer to the 2nd, and I have an application that reads the first chunk, then afterwards resolves the xinclude. As XML requires a root node, the XInclude has to point to a fragment in the 2nd document, specifically all the children of the root node. Now if a new version of XML allowed xml to not have a root node, like external entities, this might be solved. :-)

I absolutely agree with carrying the media type. Violently in fact. These documents, and parts, must be correctly self-describing. Now that's web architecture!
</davidO>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0050.html">
>> I would like to allow frag ids, specifically
>> so that parts could actually be fragments within an
>> xml document. One example would be a soap with
>> attachments package that contains 2 xml documents,
>> and the first refers to a part that is within the
>> 2nd xml document.

Hmm. This is an interesting idea, and I can see the merits. On the other hand, don't we then lose the ability for the parts themselves to have a MIME type and for fragments to reference within the parts? I wonder whether that isn't the more important use case. I'm nervous about trying to allow both at the same time. Does the web even allow:

xxxx#a#b

to reference a piece of a part that is itself within an XML document? I think the design point for parts is only secondarily XML within XML, I think it's primarily non-XML data, and I think MIME types are the obvious web-compatible way to handle that. I think it's important that attachments are just web resources (or at least representations of web resources) that happen to travel with the messages. I'm not sure your proposal is compatible with that view.
</noah>

DR7. The URI identification scheme must be robust under the addition and deletion of parts -- i.e., it must not require that URIs to other parts be altered, it must be relatively easy to avoid URI conflicts, etc.

DR11. (a) The specification should permit an initial human readable part.
      (b) The specification should not specify a particular ordering of parts.
      [still noodling on which version to prefer]

<chris href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0025.html">
Not sure I follow this...
</chris>

<markJ href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0028.html">
There was some sentiment for flexibility in part ordering -- for example, having a text part preceding even the SOAP message.
</markJ>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
Right. I also think the notion of "initial" is fuzzy. Is it within the first 100 bytes? Is it no binary data between the start of message and this initial part (so you can use text tools to get that far)? Does it preclude interleaving? I think this is too specific and we should drop it.
</noah>

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0044.html">
preferred wording is (b)
</davidO>

DR12. The SOAP message part should be readily locatable/identifiable.

<chris href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0025.html">
Should it not be the case that ALL parts be identified, identifiable? What would make the SOAP part unique in this regard?
</chris>

<markJ href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0028.html">
We wanted to make sure if there were multiple SOAP message parts that we could identify which one was the primary part and which were attachments. This may be an issue if order were arbitrary, for example.
</markJ>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
+1 but suggests

(alternate) DR12. The primary (SOAP) message part should be readily locatable/identifiable.

I think this correctly layers the packaging abstraction (part) from its use by SOAP.
</noah>

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0044.html">
(alternate) DR12. Any message parts should be readily locatable/identifiable.
</davidO>

DR16. The part identifier scheme is to be determined by the sending application.

<chris href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0025.html">
"scheme" seems to imply "URI", but my guess is that it does not. Again, I would strongly recommend that parts be identified by URI (relative or absolute).
</chris>

<markJ href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0028.html">
URI is what I have in mind.
</markJ>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
No. I think that URI schemes should be used according to their definition. This should not be a round-about way of enabling the caching scenario (if that's what's intended). Caching can be enabled with a SOAP feature (mapping an HTTP: URI to a CID:, for example). The part in the message is unlikely to be correctly identified directly with an HTTP URI (unless we're doing lazy pull through an http network).
</noah>

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0044.html">
DR26. The specification should support streaming of parts, i.e., chunked encoding. A sample scenario of this should also be provided.
</davidO>

<marcH href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0053.html">
Isn't chunking a solution to streaming rather than a requirement?
</marcH>

<davidO href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0044.html">
DR28. The specification may provide manifest functionality.
</davidO>
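
As a rough illustration of DR26, and of Marc's point that chunking is one possible solution to streaming rather than the requirement itself, here is a minimal Python sketch. It is not a proposal for the wire format. The 4-byte big-endian length prefix, the zero-length terminator, and the names write_chunked, read_chunked and produce are all assumptions made up for this example; none of them come from the requirements or comments above.

```python
import io
import struct


def write_chunked(out, chunks):
    """Write each chunk as a 4-byte big-endian length followed by its bytes.

    A zero-length chunk marks the end of the part (an assumed convention).
    """
    for chunk in chunks:
        if chunk:  # skip empty chunks so they are not mistaken for the terminator
            out.write(struct.pack(">I", len(chunk)))
            out.write(chunk)
    out.write(struct.pack(">I", 0))


def read_chunked(inp):
    """Yield chunks one at a time until the zero-length terminator is seen."""
    while True:
        header = inp.read(4)
        if len(header) != 4:
            raise ValueError("truncated chunk header")
        (size,) = struct.unpack(">I", header)
        if size == 0:
            return
        data = inp.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk body")
        yield data


def produce():
    # Stand-in for a data source whose total size is not known in advance.
    yield b"<first piece>"
    yield b"\x00\x01\x02"
    yield b"last"


buf = io.BytesIO()
write_chunked(buf, produce())  # sender never needs the total length
buf.seek(0)
part = b"".join(read_chunked(buf))  # receiver sees each chunk size up front
```

In this sketch the sender can start serializing before it knows the size of the part (the open question under DR13), while the receiver learns the size of each chunk before reading it and so never has to guess at a buffer size (the concern behind DR19). The price is 4 bytes of framing per chunk plus the terminator, which is the kind of relative overhead raised in the discussion of small parts.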
Received on Friday, 31 January 2003 12:34:30 UTC