Below are my comments on the 0.61 draft of the Infoset Addendum
by Adam Bosworth et al. These comments are written in an abrupt
style suitable for direct communications with the authors rather
than this public forum. Please do not take this style to have any
significance regarding my position (or HP's) on this work. Take it
the way it is: I didn't feel like going through it again to make it
sound nicer. (I also apologize for not making most of these comments
last week.) John.
====================================================================
Page 3 1. Introduction. First para.
--- as written:
Users often want to leverage the structured, extensible markup
conventions of XML without abandoning existing data formats that do
not readily adhere to XML 1.0 syntax. Often, users want to leave their
existing non-XML formats as is, to be treated as opaque sequences of
octets by XML tools and infrastructure.
---
These two sentences attempt to characterize the motivations for
combining binary with XML. In my opinion, the characterization fails.
Rather, it is widely held that XML is totally and completely unsuitable
for the representation of large, regular data sets. Developers of
systems that use large, regular data sets are not considering, and
will not be considering, "abandoning existing data formats". This is
not a case of "often, users want to leave their existing non-XML
formats as is". Rather, it is the case that developers with these kinds
of data will simply not use SOAP, period. I want to stress this and
state it clearly because the current document deprecates the use of
binary data in its text while it says it will be facilitating it.
Here is an alternative that I believe more accurately reflects the
motivations of this community:
---
Users often want to leverage the structured, extensible markup
conventions of XML in the processing of data, like images, audio, and
video, whose regularity and limited precision admit a compact
representation in binary and whose aggregate size would be
dramatically enlarged if encoded in XML. Furthermore, some users may
wish to activate extensive subsystems designed for existing non-XML
formats with XML-based process-control languages.
---
======================================================================
Page 3 1. Introduction para 2.
The last sentence starts "The former has gained some...". Each time I
read this I stop at the end of the paragraph and look about for "The
latter has...". Please make some parallel remark about WS-Attachments
so your reader need not ponder why it is missing.
======================================================================
Page 4 3.1 and 3.2
---- As written:
3.1 xmime:MediaType attribute
The MediaType attribute specifies the media type [RFC 2045] of the
base64-encoded content of its [owner] element. Its normalized value is
a media type as defined by Section 5.1 of RFC 2045 and RFC 2046 [RFC
2046]. When the MediaType attribute is not present the media type
"application/octet-stream" is assumed.
3.2 xmime:Binary type
The Binary type is an XML Schema complexType whose base is
xs:base64Binary. The type carries optional xmime:MediaType
attribute. This type can be used by elements that need to carry
base64-encoded data along with optional media type information.
----
These two subsections bind xmime:MediaType to base64 without reason.
Since I believe that ultimately we will have a solution without base64
I want to confine its influence to aspects where it is essential.
(As I said last time, xmime:Binary is a poor choice of name since it is
defined to be base64, not what the vast majority of Internet system
developers consider to be "binary".)
Here I have moved the last sentence of 3.1 to 3.2 and omitted
base64 in 3.1:
---- alternative proposal:
3.1 xmime:MediaType attribute
The MediaType attribute specifies the media type [RFC 2045] of its
[owner] element. Its normalized value is a media type as defined by
Section 5.1 of RFC 2045 and RFC 2046 [RFC 2046].
3.2 xmime:Binary type
The Binary type is an XML Schema complexType whose base is
xs:base64Binary. The type carries optional xmime:MediaType
attribute. When the MediaType attribute is not present the media type
"application/octet-stream" is assumed. This type can be used by
elements that need to carry base64-encoded data along with optional
media type information.
----
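To make the effect of moving that sentence concrete, here is a minimal Python sketch of how a processor might resolve the media type under the alternative wording. The namespace URI, the function name and the is_xmime_binary flag are all assumptions made for illustration; none of them comes from the draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used for illustration only; the draft's
# actual xmime namespace is not quoted in this message.
XMIME_NS = "urn:example:xmime"
MEDIA_TYPE_ATTR = f"{{{XMIME_NS}}}MediaType"
DEFAULT_MEDIA_TYPE = "application/octet-stream"


def media_type_of(element, is_xmime_binary):
    """Return the media type of an element under the alternative wording.

    An explicit xmime:MediaType always wins (3.1). The octet-stream
    default applies only to elements of type xmime:Binary (3.2); for any
    other element a missing attribute means "not stated".
    """
    declared = element.get(MEDIA_TYPE_ATTR)
    if declared is not None:
        return declared
    return DEFAULT_MEDIA_TYPE if is_xmime_binary else None


typed = ET.fromstring(
    '<photo xmlns:xmime="urn:example:xmime" xmime:MediaType="image/png"/>'
)
untyped = ET.fromstring("<blob/>")

print(media_type_of(typed, is_xmime_binary=False))
print(media_type_of(untyped, is_xmime_binary=True))
print(media_type_of(untyped, is_xmime_binary=False))
```

With the text as written, the third case has no sensible answer, because 3.1 ties the attribute and its default to base64-encoded content.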
======================================================================
Page 5, Section 4, para 1.
--- As written:
For many applications, the use of base64 [base64] encoding for opaque
data does not present a significant performance overhead, especially
when weighed against the costs of a conformant XML 1.0 [XML]
parser. However, for applications that wish to avoid the overhead of
base64 encoding, this specification defines an XML element
(xbinc:Include) that can reference opaque data for inclusion as
children of the referencing element.
---
Here again the motivation for using binary is treated as aberrant
behavior. In addition to putting off the very folks with the most
interest in this document, the engineering judgement expressed here is
unsound. We will be processing gigabytes of binary data with special
processing code and machinery; this processing will be controlled by a
few tens of kilobytes of XML using standardized libraries. For us the
cost of the XML and its parsing is entirely insignificant. If we have
to base64 encode all the binary, alternatives to XML or out-of-band
messaging will be used.
--- Proposed alternative:
For some applications, the use of base64 [base64] encoding for opaque
data presents a significant performance overhead. For example,
developers with large, regular data sets much larger in size than the
meta-data and control information suffer an immediate 33% overhead
in using base64. For applications that wish to avoid the overhead of
base64 encoding, this specification defines an XML element
(xbinc:Include) that can reference opaque data for inclusion as
children of the referencing element.
---
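The 33% figure follows from base64 mapping every 3 octets to 4 characters. A minimal Python sketch of that arithmetic, checked against the standard library encoder, is given here; the function names are illustrative and are not part of the draft.

```python
import base64


def base64_encoded_size(payload_size: int) -> int:
    """Size in characters of padded base64 output for payload_size octets."""
    return 4 * ((payload_size + 2) // 3)


def base64_overhead(payload_size: int) -> float:
    """Fractional size increase from base64-encoding payload_size octets."""
    return (base64_encoded_size(payload_size) - payload_size) / payload_size


# Sanity check of the formula against the real encoder on a small sample.
sample = bytes(3000)
assert len(base64.b64encode(sample)) == base64_encoded_size(len(sample))

# A 3 GB payload grows by 1 GB once encoded.
print(f"{base64_overhead(3 * 10**9):.1%}")  # prints 33.3%
```

Against a few tens of kilobytes of control XML, that extra gigabyte is the whole cost of the message, which is why the overhead cannot be weighed against the cost of parsing.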
======================================================================
Page 5 section 4.1 para 1
--- As written:
The Include element carries a single attribute.
---
Specific is clearer:
-- As proposed:
The Include element carries a single "href" attribute (see 4.1.1
below).
---
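For readers who have not seen the mechanism, here is a rough Python sketch of one reading of Include: the element's "href" names an opaque part, and processing behaves as if the element had been replaced by that part's base64 characters. The namespace URI, the "cid:" reference and the parts dictionary are assumptions made for illustration, not details taken from the draft.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used for illustration only.
XBINC_NS = "urn:example:xbinc"
INCLUDE_TAG = f"{{{XBINC_NS}}}Include"


def inline_includes(root, parts):
    """Replace each xbinc:Include child with base64 text of the referenced part.

    `parts` maps an href value to the raw octets it identifies. This sketch
    assumes Include is the only child of its parent element.
    """
    # Snapshot the element list first, since the tree is modified below.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == INCLUDE_TAG:
                octets = parts[child.get("href")]
                encoded = base64.b64encode(octets).decode("ascii")
                parent.text = (parent.text or "") + encoded + (child.tail or "")
                parent.remove(child)
    return root


doc = (
    '<photo xmlns:xbinc="urn:example:xbinc">'
    '<xbinc:Include href="cid:photo.png"/>'
    "</photo>"
)
root = inline_includes(ET.fromstring(doc), {"cid:photo.png": b"\x89PNG"})
print(ET.tostring(root, encoding="unicode"))
```

The sketch only resolves references that are present in the parts dictionary; what a processor should do when a reference cannot be satisfied locally is exactly the kind of question the specification has to answer.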
======================================================================
Page 8. Section 5.
This section describes an interesting concept so I know you don't want
to hear this, but this entire section should be removed and placed in a
separate document. It does not describe an "Infoset Addendum" and it
raises more questions than it answers.
By far the most important contribution of this document is the
first one cited in the introduction: alignment between the XML
Infoset-based data model and SOAP processing of attachments. However,
section 5 describes processing that is out of the Infoset-based model.
To complete processing of a document containing URLs pointing outside
of the "infoset" one has to have a model for URL resolution, exactly
what I understand the Infoset-based model was trying to avoid.
Of course representations for some of those URLs may be cached in the
message, but this possibility does not absolve us from having to
specify what happens when they are not. Moreover the cache nature of
these representations only adds to the specification problem: under
what circumstances will SOAP processors accept the cache vs those
circumstances where they will return to the resource for a fresh copy?
I will tell you that the answer I hear from our developers is the
simple one: we will never read the cache since we don't know what it
means.
This entire area is a fascinating one that needs future work and could
be fruitful to explore. It just isn't one that should be covered in
this document at this time.
To put this to you in a different way: we can have a processing model
for embedded links in XML that allows those links to be satisfied in
local messages. That would be a processing model for SwA 1.0 without
the use of the Include mechanism. You introduced the Include
mechanism to avoid defining that processing model. Section 5 provides
a different and inferior (underspecified with respect to time)
solution to this problem without solving the processing model problem.
======================================================================
Page 12 Section 6
---As written:
The SOAP [SOAP11, SOAP12] processing model is defined in terms of an
Infoset [Infoset]. As defined in Section 4.2, processing MUST behave
as if the swa:DoInclude header is processed first.
---
Here we get to the heart of the matter. I think it important to
clarify the role of encoding at the interface between binary and XML.
---Proposed (insertion)
The SOAP [SOAP11, SOAP12] processing model is defined in terms of an
Infoset [Infoset]. The Infoset itself does not require any encoding
like base64 for non-XML data. Applications designed for non-XML data
likewise do not require any base64 encoding. However, at least one
important example, XML-based digital signature algorithms, needs to
cross over between XML content and binary content. More algorithms
may appear as XML processing becomes commonplace. For these algorithms
we need a consistent way to handle the binary content. As defined in
Section 4.2, processing MUST behave as if the swa:DoInclude header is
processed first.
======================================================================
Page 15 Section 7 para 1
---As written:
To satisfy this need, this specification defines the xmime:Accept
which can be used to annotate schema declarations of elements of type
xmime:Binary.
---
Since xmime:Binary forces base64, which is only one possible way to
send binary, allow Accept to modify other types:
---Proposed:
To satisfy this need, this specification defines the xmime:Accept
which can be used to annotate schema declarations of elements
that may contain MIME-typed data.
----
======================================================================
Page 16 Section 7.1 para 1
---As written:
The Accept attribute may be used on element declarations in schema to
specify a list of accepted media types of the base64-encoded content
of instances of the element.
---
As above. Note that the example that follows this first paragraph is
fine since the Accept attribute may indeed be used with your
xmime:Binary element.
---Proposed:
The Accept attribute may be used on element declarations in schema to
specify a list of accepted media types of the content
of instances of the element.
---
======================================================================
Page 18 Section 8 para 1
---As written:
Current XML signature algorithms require signing the included data as
base64-encoded characters;
---
As I read the document, XML Signature simply provides a solution for
base64 encoded data and says nothing about other encodings. But the
important edit here is to add a reference to this part of the W3C
documentation.
---Proposed:
Current XML signature algorithms allow signing of base64-encoded data
[XMLDSIG].
[XMLDSIG]
"XML-Signature Syntax and Processing"
Ed. Donald Eastlake, Joseph Reagle, David Solo;
Authors Mark Bartel, John Boyer, Barb Fox, Brian LaMacchia, Ed Simon
[http://www.w3.org/2000/09/xmldsig#base64]
---
At 08:25 AM 4/2/2003 -0800, Jeffrey Schlimmer wrote:
The correct link for the Word version is
http://www.gotdotnet.com/team/jeffsch/paswa/paswa61.doc
(Getting ahead of myself with the versioning :-)
> -----Original Message-----
> From: Jeffrey Schlimmer
> Sent: Wednesday, April 02, 2003 8:24 AM
> To: xml-dist-app@w3.org
> Cc: Martin Gudgin
>
> The document illustrating an Infoset approach to the attachment feature
> has been revised with some clarifications and additional examples.
>
> http://www.gotdotnet.com/team/jeffsch/paswa/paswa61.html
> http://www.gotdotnet.com/team/jeffsch/paswa/paswa61.pdf
> http://www.gotdotnet.com/team/jeffsch/paswa/paswa62.doc
>
> > -----Original Message-----
> > From: xml-dist-app-request@w3.org [mailto:xml-dist-app-request@w3.org]
> > On Behalf Of Martin Gudgin
> > Sent: Tuesday, March 25, 2003 9:42 AM
> > To: xml-dist-app@w3.org
> >
> >
> > We have now posted the document illustrating an infoset approach to the
> > attachment feature. You can find html[1], pdf[2] and word docs[3]. This
> > document is intended to be a concrete realisation of the ideas laid out
> > in the white paper at[4].
> >
> > Apologies for the delay.
> >
> > Gudge
> >
> > [1] http://www.gotdotnet.com/team/mgudgin/paswa/paswa.html
> > [2] http://www.gotdotnet.com/team/mgudgin/paswa/paswa.pdf
> > [3] http://www.gotdotnet.com/team/mgudgin/paswa/paswa.doc
> > [4] http://www.xml.com/pub/a/2003/02/26/binaryxml.html
______________________________________________________
John J. Barton email: John_Barton@hpl.hp.com
http://www.hpl.hp.com/personal/John_Barton/index.htm
MS 1U-17 Hewlett-Packard Labs
1501 Page Mill Road phone: (650)-236-2888
Palo Alto CA 94304-1126 FAX: (650)-857-5100