AFTF requirements list with comments, post-2003/01/28 telcon from Mark Jones on 2003-01-30 (xml-dist-app@w3.org from January 2003)

From: Mark Jones <jones@research.att.com>
Date: Thu, 30 Jan 2003 13:58:06 -0500 (EST)
To: xml-dist-app@w3.org
Message-Id: <200301301858.NAA16466@bual.research.att.com>

AFTFers,

This version of the requirements reflects discussion from the
2003-01-29 telcon of the AFTF. Comments already considered
have been removed and new comments have been added. We are
(linearly) down to DR5 in our current discussion. I've also
folded in DR21 and DR22 regarding metadata and MIME types that
Noah wrote up and have a placeholder for Marc's compression
requirement.

Our current scorecard is:
10 requirements/considerations agreed (including R8, R9, R15, R17, R1, R2, R3, R4)
3 new requirements already discussed and roughly agreed in principle (DR21, DR22, DR23)
7 original requirements not yet discussed (DR5, DR13, DR6, DR7, DR11, DR12, DR16)
3 proposed requirements not yet discussed (DR18, DR19, DR20)

--mark

Mark A. Jones
AT&T Labs -- Strategic Standards Division
Shannon Laboratory
Room 2A02
180 Park Ave.
Florham Park, NJ 07932-0971

email: jones@research.att.com
phone: (973) 360-8326
fax: (973) 236-6453

________________________________________________________________

Concrete Attachment Feature Requirements
----------------------------------------

Considerations
--------------

* If existing packaging schemes (e.g., Multipart-MIME, DIME, ZIP, tar,
jar, etc.) meet the requirements, or represent sensible tradeoffs,
then the specification SHOULD use such existing schemes.

* The specification should, where reasonably practical, be designed to
facilitate debugging, tracing, and other diagnostic activities.

General Requirements
--------------------

R8. The specification must describe its relationship to the
properties defined in Table 1 (att:SOAPMessage and
att:SecondaryPartBag) in the SOAP 1.2 Attachment Feature
specification.

R9. The specification must describe its points of extensibility.

R15. The specification should not unnecessarily preclude convenient
description by languages such as WSDL.
[WSDL should have enough extensibility to handle reasonable
new attachment specifications, including ours. Our spec should
be reasonably describable by languages such as WSDL.]

R17. The specification must work with the SOAP 1.2 HTTP binding and
shouldn't unnecessarily preclude working with other bindings.

Representation
--------------

R1. The specification must define a means to carry multiple data
parts.

R2. The specification must define a means for parts to carry
arbitrary data, including non-XML data (e.g., binary data and XML
fragments).

R3. The specification should support efficient implementation of:
a) parsing the physical representation to separate and identify its
constituent parts.
b) programming systems which efficiently resolve a URI to retrieve the
data (and metadata) comprising the corresponding part.

R4. The specification should use a reasonably space-efficient
representation.

DR5. The representation must efficiently support the addition and
deletion of parts.

<chris href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0025.html">
Hmmm... While it is clear that an implementation of the specification
would likely carry this requirement, it is less than clear that the
requirement is applicable to the specification itself. Further, one
would imagine that by this statement, it would be intended to cover the
insertion or in-line deletion of parts, or had you only appending and
truncation in mind?

Again, it isn't clear that this requirement, as written, is either
testable of a specification or relevant for a specification that is not
intended to be implementation-specific.
</chris>

<markJ href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0028.html">
The point here was to make the spec relatively friendly to
intermediaries that might need to modify the attachment bundle in
straightforward ways. (roughly resonant with the fact that insertions
and deletions of headers in a SOAP envelope are pretty straightforward
syntactically, for example).
</markJ>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
If that's the goal, then I think we need to specifically say:

DR5. The representation SHOULD efficiently support the addition and
deletion of parts by intermediaries.

Otherwise, I agree completely with Chris' concern. Indeed, I am somewhat
nervous that even at the intermediary the issues will be hard to pin down,
and may relate to higher level constructs that we can't control. After
all, if you write an application that has to inspect the whole message
before deciding what to insert or delete, then you almost surely have to
buffer the whole thing at the intermediary. Once you've done that, then
Chris is right on even at the intermediary. How can you tell what is or
isn't efficient for me at such a buffering intermediary? I've very
probably stored the parts in ways you wouldn't easily guess (e.g. some
relational DB fields.)
</noah>

DR13. The specification must provide support for large parts.

<chris href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0025.html">
And small ones as well one would imagine. How large? Arbitrarily
large? Just "pretty big", "really, really large" or "incomprehensibly
large"? :)

What about parts whose size is not known at the time that
the serialization is begun?
</chris>

<markJ href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0028.html">
These points have been discussed briefly. This one needs more work.
</markJ>

<barton href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0030.html">
The reason for this kind of requirement is the dominant impact of I/O
and memory allocation on performance. For small messages, all
attachment schemes will be equal since CPUs are infinitely fast.
"Large" of course changes over time as hardware resources improve.
Design for messages between 1MB and 1GB. 5 years from now, when
this standard is in use, allocators can bite off 1MB but 1GB will likely
still call for disk. You can shift these numbers around, but they will
factor into the design: might as well discuss them explicitly.

In my opinion, parts whose size is not known should not be "attached"
to SOAP messages. Rather one should use messages to set up an
out of band stream mechanism.
</barton>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
I think the question with small is, do you care about relative overhead?
Is it OK to add 200 bytes of overhead to a 5 byte attachment. In some
situations the answer is: yes, the whole message is still only a few
hundred bytes and as John says, it's hard on modern processors to get in
trouble processing a single small message. On the other hand, if you have
thousands of parts per message, or thousands of messages per second, the
overhead can indeed really add up. So, I don't think it's obviously a
non-issue.
</noah>
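
As a rough illustration of the relative-overhead point, the sketch
below computes what fraction of the bytes on the wire is packaging
overhead. The 200-byte overhead and the 5-byte attachment come from
Noah's comment; the 1MB part and the count of 5000 parts are assumed
here only for contrast, and the function name is hypothetical.

```python
def overhead_fraction(payload_bytes: int, overhead_bytes: int) -> float:
    """Fraction of the transmitted bytes that is packaging overhead."""
    return overhead_bytes / (payload_bytes + overhead_bytes)


# Noah's example: 200 bytes of overhead on a 5-byte attachment.
small = overhead_fraction(5, 200)            # 200 / 205, nearly all overhead

# Assumed contrast: the same overhead on a 1MB part is negligible.
large = overhead_fraction(1_000_000, 200)

# Assumed volume: 5000 small parts in one message.
parts = 5000
total_overhead = parts * 200                 # 1,000,000 bytes of overhead
total_payload = parts * 5                    # 25,000 bytes of payload
```

Per part the cost is trivial, but at this volume the overhead is
forty times the payload, which is the case Noah is pointing at.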

DR21. The specification should provide convenient means for extending the
metadata carried with a message. Such mechanisms should specifically
allow for extensions to the set of metadata associated with individual
parts.

DR22. The specification should provide a means by which any or all parts
MAY be labeled with associated MIME types. (I.e. applications sending a
message are not obligated to label parts with MIME types, but the
specification must provide for carrying the MIME type if provided.)
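
One way to picture DR21 and DR22 together is a part record whose MIME
type is optional and whose metadata set is open-ended. This is only a
sketch; the class, its field names, and the example metadata key are
hypothetical and are not proposed by the AFTF.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Part:
    """Hypothetical in-memory view of one part of a message package."""
    uri: str                                  # identifier for the part
    data: bytes                               # arbitrary, possibly binary
    mime_type: Optional[str] = None           # DR22: label MAY be absent
    metadata: Dict[str, str] = field(default_factory=dict)  # DR21: extensible


# A sender is not obligated to label a part with a MIME type...
unlabeled = Part(uri="cid:part1@example.org", data=b"\x00\x01\x02")

# ...but the type is carried when provided, along with extra metadata.
labeled = Part(
    uri="cid:part2@example.org",
    data=b"<fragment/>",
    mime_type="application/xml",
    metadata={"x-origin": "intermediary-a"},  # illustrative key only
)
```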

DR23. <placeholder for compression requirement -- Marc H.>

Reference to Parts
------------------

DR6. The specification must permit parts to be identified by URIs.

<chris href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0025.html">
Hmmm... I think that the specification should require that parts be
identified by URI, but that they may be identified using other means
as well. Of course, they could be identified by relative URI, not just
absolute URI.
</chris>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
+1 except for the references to relative URI. I think we want: The
specification must provide that each part be identified by an (at least
one) absolute URI.

I think issues of relative should be above our level. If some system
(e.g. SOAP itself) wants to provide base URI and resolve relatives to
absolute, that's fine, but we don't worry about that I think. I would not
want a part to be known at the deepest level as "../p".
</noah>

<markJ href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0028.html">
We can consider your wording instead.
</markJ>
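
To make the relative-versus-absolute distinction concrete, the sketch
below shows a layer above the packaging resolving a relative reference
such as "../p" against a base URI, so that the part itself is only ever
known by an absolute URI. The base URI and the helper names are
assumptions for illustration; only urljoin and urlparse are real
(Python standard library) calls.

```python
from urllib.parse import urljoin, urlparse


def is_absolute(uri: str) -> bool:
    """Treat a URI as absolute when it carries a scheme."""
    return bool(urlparse(uri).scheme)


def to_absolute(base_uri: str, ref: str) -> str:
    """Resolve a possibly relative reference against a base URI."""
    return ref if is_absolute(ref) else urljoin(base_uri, ref)


base = "http://example.org/msg/parts/a"       # assumed base URI
resolved = to_absolute(base, "../p")
unchanged = to_absolute(base, "cid:part1@example.org")
```

Here resolved is "http://example.org/msg/p", while the cid: reference
is already absolute and passes through untouched.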

DR7. The URI identification scheme must be robust under the addition
and deletion of parts -- i.e., it must not require that URIs to
other parts be altered, it must be relatively easy to avoid URI
conflicts, etc.
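
A minimal sketch of what DR7 asks for, assuming parts are keyed by
opaque, independently generated URIs rather than by position. The
PartBag class is hypothetical, and the choice of urn:uuid: identifiers
is an assumption made here to keep conflicts unlikely; the requirement
itself does not name a URI scheme.

```python
import uuid
from typing import Dict


class PartBag:
    """Hypothetical container mapping part URIs to part data."""

    def __init__(self) -> None:
        self._parts: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        # A fresh random URI never depends on the other parts present.
        uri = uuid.uuid4().urn
        self._parts[uri] = data
        return uri

    def remove(self, uri: str) -> None:
        del self._parts[uri]

    def resolve(self, uri: str) -> bytes:
        return self._parts[uri]


bag = PartBag()
a = bag.add(b"first")
b = bag.add(b"second")
c = bag.add(b"third")

# Deleting the middle part leaves the URIs of the others valid.
bag.remove(b)
still_a = bag.resolve(a)
still_c = bag.resolve(c)
```

By contrast, identifiers derived from a part's position in the package
would shift whenever an earlier part is inserted or deleted.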

DR11. (a) The specification should permit an initial human readable
part.
(b) The specification should not specify a particular ordering
of parts.
[still noodling on which version to prefer]

<chris href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0025.html">
Not sure I follow this...
</chris>

<markJ href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0028.html">
There was some sentiment for flexibility in part ordering -- for
example, having a text part preceding even the SOAP message.
</markJ>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
Right. I also think the notion of "initial" is fuzzy. Is it within the
first 100 bytes? Is it no binary data between the start of message and
this initial part (so you can use text tools to get that far)? Does it
preclude interleaving? I think this is too specific and we should drop
it.
</noah>

DR12. The SOAP message part should be readily locatable/identifiable.

<chris href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0025.html">
Should it not be the case that ALL parts be identified, identifiable?
What would make the SOAP part unique in this regard?
</chris>

<markJ href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0028.html">
We wanted to make sure if there were multiple SOAP message
parts that we could identify which one was the primary part and which
were attachments. This may be an issue if order were arbitrary, for
example.
</markJ>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
+1 but reword as:

DR12. The primary (SOAP) message part should be readily
locatable/identifiable.

I think this correctly layers the packaging abstraction (part) from its
use by SOAP.
</noah>

DR16. The part identifier scheme is to be determined by the sending
application.

<chris href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0025.html">
"scheme" seems to imply "URI", but my guess is that it does not.
Again, I would strongly recommend that parts be identified by URI
(relative or absolute).
</chris>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
No. I think that URI schemes should be used according to their
definition. This should not be a round-about way of enabling the caching
scenario (if that's what's intended.) Caching can be enabled with a SOAP
feature (mapping an HTTP: URI to a CID:, for example). The part in the
message is unlikely to be correctly id'd directly with an HTTP URI (unless
we're doing lazy pull through an http network.)
</noah>

________________________________________________________________

New proposed requirements:
--------------------------

DR18. The specification must define a means to format messages for
down-level receivers that do not understand the specification.

<sanjiva href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0034.html">
How can any spec say something about those who don't understand the
spec? I'm confused.
</sanjiva>

<barton href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0033.html">
Maybe you can clarify this one Jeff...the way I read it, it sounds
impossible.
</barton>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
I'm confused too.
</noah>

DR19. The specification must enable efficient allocation of buffers by
receivers.

<sanjiva href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0034.html">
I'm again confused; while a statement like "this spec must be
implementable as efficiently as possible" is reasonable (and
motherhood-and-apple-pie IMO), speaking specifically about
buffer allocation seems rather pointed.
</sanjiva>

<barton href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0033.html">
This one motivates some of the other requirements but it implies that
the sender understand the receiver's memory allocation capabilities.
On one extreme the requirement could amount to "give the content
length of attachments up front", but at the other extreme it
could require the interleaving of parts to achieve a serialization
optimal for receiver processing.

As an example of the latter, the UPNP Printing folks worried about how
an extremely long XHTML doc with many inline images could be printed
with one page buffer. While that may seem like an example far from
the one most SOAP folks consider, once you get to pipelined processing
of composed SOAP services the differences begin to fade. These are
cases you want to be able to handle and they are cases that non-XML
systems deal with.

Of course the serialization of XHTML is well-defined. Serialization
for arbitrary receiver processing isn't. That makes this requirement
difficult to spell out absent information on the receiver buffer
capability. Consequently one might go for a requirement that asks the
spec. to allow attachments to be placed in the stream physically near
their first point of XML reference rather than getting into buffers.
That would pick up the critical use case without getting mired in an
open-ended problem.
</barton>

<noah href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0037.html">
I think we can say: "Attention should be given to likely implementation
optimizations." (I agree with Sanjiva, going much beyond that is too
specific.)
</noah>

<barton>
Sanjiva, the key words here are "by receivers". The serialization
mechanism can have serious impacts on resource constrained or
heavily loaded receivers. Emitting a SOAP message in an
HTTP-style MIME-like format without content-length headers leaves
the receiver with no recourse but multiple buffering layers and repeated
dynamic memory allocations as more content arrives. For resource
constrained receivers, the result is late and annoying buffer overflow;
for heavily loaded receivers, the result is poor performance.

This is, unfortunately, not apple-pie since typically a receiver-friendly
protocol requires resources to be spent on the sender, e.g., to count
the bytes as the package is assembled. The specification will
shift real costs.

Hope this helps clarify this issue.
</barton>
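
The difference described above can be sketched from the receiver's
side. With a declared length the receiver can make one allocation and
fill it; without one it has to grow a buffer as content arrives. Both
function names and the 4096-byte chunk size are assumptions for
illustration, and an in-memory stream stands in for the network.

```python
import io


def read_with_length(stream, length: int) -> bytes:
    """Read a part whose size was declared up front."""
    buf = bytearray(length)           # single allocation, sized exactly
    view = memoryview(buf)
    filled = 0
    while filled < length:
        n = stream.readinto(view[filled:])
        if not n:
            raise EOFError("stream ended before the declared length")
        filled += n
    return bytes(buf)


def read_until_eof(stream, chunk_size: int = 4096) -> bytes:
    """Read a part of unknown size, growing the buffer as data arrives."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf.extend(chunk)             # may reallocate and copy as it grows
    return bytes(buf)


payload = b"x" * 10_000
known = read_with_length(io.BytesIO(payload), len(payload))
unknown = read_until_eof(io.BytesIO(payload))
```

Both paths return the same bytes; the cost that shifts is on the
sender, which must know the length before it starts writing the part.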

DR20. The specification must allow messages to be secured using the
mechanisms defined in WS-Security.

<sanjiva href="http://lists.w3.org/Archives/Public/xml-dist-app/2003Jan/0034.html">
WS-Security only applies to SOAP envelopes. This requirement would
hence have the effect of precluding MIME/DIME style packaging ..
</sanjiva>

Received on Thursday, 30 January 2003 13:58:38 UTC