- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 26 Oct 2007 18:28:40 -0400
- To: "David Orchard" <dorchard@bea.com>
- Cc: "W3C-TAG Group WG" <www-tag@w3.org>, shh@us.ibm.com, haggar@us.ibm.com, klawrenc@us.ibm.com
Dave:
Speaking as a TAG member:
Thank you for doing this. I think there's a lot of good stuff here. My
overall impression, FWIW, is that this would have more impact if we went
into just a bit less detail, and focussed a slightly shorter note on some
key points. Don't take this as a significant criticism. I think the core
of this is very good, just that a bit of tightening would make it more
effective. If that means a few details fall out of it, I'm not sure
that's all bad.
Speaking as an IBMer:
> - IBM's withdrawal [4] from the Working Group shows their detailed and
> specific disagreement with the benefits of EXI and remains unaddressed.
Thank you for referencing this note; we're very glad that our concerns are
viewed as significant for this discussion. I do note that, like several
other references from your note, the link is to member-confidential email.
I doubt that anyone in IBM would object to having a copy posted in a
public archive, and if you'd like I could in principle check with my IBM
colleagues. I say in principle because our note also refers to some
correspondence among a smaller group of individuals in the W3C. It was
originally written as input to the chairs and W3C staff, and I think we
would need the permission of these individuals as well. Shall we try and
ask all these folks whether they're OK with at least IBM's note being
posted in a public place? If not, I think it will be frustrating for your
readers to find the link and not be able to follow it.
> Just recently a working group member said "The same goes for
> IBM, if they really thought the result tells something
> important, they would have spent some more time establishing a
> case out the result against EXI. "[5]
Again, this is a member only link. I think you've taken a small enough
bit that it's sort of out of context. The quote appears to imply that
surely IBM did not take its concerns about EXI seriously, as we would
otherwise have "spent more time establishing the case...against EXI". In
fact, our work on XML performance was done over several years, and just
the comparison work we did on EXI involved several weeks of work in
particular. Those comparisons were presented in a quite detailed
presentation to the EXI workgroup, and I happened to be there as an
observer that day. I think it's only fair if you are going to have a
quote like this that you ensure that the entire email at [5] is publicly
accessible so that people can draw their own conclusions about it.
I think it's fair to say that IBM believes that EXI offers interesting
compression on XML, and some speed gains in many use cases; we also think
that the speed gains over well optimized text implementations are not
nearly as great as might be inferred from the measurements presented by
the EXI group. The issues include many that you've put into your note,
Dave (choice and weighting of test cases, use of Java, etc.)
More to the point, I think you're trying to make the case that the EXI
workgroup didn't take our concerns seriously (I offer no opinion on that
-- while our concerns have not been satisfied, I wouldn't want to accuse
anyone in the EXI group of not taking them seriously). In fact, the quote
more directly seems to imply that we in IBM did not take our own concerns
seriously, and that I must object to (though the author is surely
entitled to express that opinion).
Maybe it would be better to just point out that we were among those who
raised concerns, possibly linking our note if we get it made public, and
to indicate that we in IBM do not believe the performance analysis in the
EXI drafts addresses our concerns. If you want to additionally say that
the authors of [5] think we did not take our own concerns seriously enough
to give them force, of course you may do so, but I think we'd want a
chance to briefly and politely rebut that.
Sorry to make a fuss about this, but there's already a lot of heat and
confusion around this whole issue, and I'm afraid that the current
fragmented quotes from member-only emails will make it worse and not
better.
Noah
[4] http://lists.w3.org/Archives/Member/member-exi-wg/2007Mar/0014.html
[5] http://lists.w3.org/Archives/Member/member-exi-wg/2007Sep/0010.html
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
"David Orchard" <dorchard@bea.com>
Sent by: www-tag-request@w3.org
10/25/2007 01:12 AM
To: "W3C-TAG Group WG" <www-tag@w3.org>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: Review of EXI
Draft TAG response to EXI. I believe there could be significant negative
response so I encourage the TAG members to suggest modifications as they
see fit.
The TAG does not consider its concerns raised in May 2005[1] to be
addressed. We are unable to ascertain whether our concerns have been
addressed or not from the measurements document[2] produced by the working
group. It appears that the Working Group believes that non-Working Group
members must prove the WG wrong in its decisions on EXI technologies,
rather than convincing the world that there is indeed a conclusive case.
The measurements document contains 127 pages of data and analysis, but we
did not see any material in a summary or abstract that addressed our
concerns. In the almost 3 months since publication of the document on
July 25th 2007, the EXI working group has had 28 messages on their mailing
list, none of which are comments on the measurements document. During the
Oct 4th TAG telcon, the TAG spent approximately 30 minutes looking through
the document but were unable to glean answers to our questions. We
believe that the measurements document is too detailed and academic to
prove the need for any flavours of binary XML or the selection of a
particular technology, and that it is likely that only a handful of people
will be able to grasp even the data embodied in the document.
The document states "Based on the measurements described here, the working
group has selected Efficient XML ([EffXML]) to be the basis for the
proposed encoding specification to be prepared as a candidate W3C
Recommendation. " There is no high level supporting evidence for such a
conclusion.
It appears that the working group has presupposed a conclusion that binary
xml is necessary and no concerns by the TAG or member companies on the
technology and methodologies themselves or the integration with the rest
of XML (including DSig) need to be addressed directly. Some examples:
- A main author of Efficient XML did press outreach [3] that resulted in
an article called "W3C is working on a solution to bandwidth-hogging,
clunky XML".
- IBM's withdrawal [4] from the Working Group shows their detailed and
specific disagreement with the benefits of EXI and remains unaddressed.
Just recently a working group member said "The same goes for IBM, if they
really thought the result tells something important, they would have spent
some more time establishing a case out the result against EXI. "[5]
- the evolution of "the XML stack" including DSig and Encryption is mostly
ignored [6]. For example, of all the use cases that EffXML solves, what
cases require a revision of DSig and/or Encryption before deployment? It
may very well be the case that the failure of the incompatible evolution
of XML 1.1 will be duplicated in an incompatible evolution of EXI without
supporting evolution of technologies.
- the TAG concerns were never directly addressed.
We will go through our documented longstanding concerns in order.
"The drawbacks are likely to include reduced interoperability with XML
1.0 and XML 1.1 software, and an inability to leverage the benefits of
text-based formats. "
The Working Group has rejected the use of an encoding in the XML
Declaration because of the 18 or so characters. We fail to understand why
this is an issue. XML 1.0 allows different encodings and it allows
default encodings to be defined out of band. Constrained environments
with out of band knowledge of binary encoding could assume binary encoding
on incoming messages, failing over to text xml. Less tightly coupled
environments, such as the Web, could use the XML declaration encoding.
This at least would preserve part of the XML stack and technologies.
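
The two deployment styles described above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the
encoding label "x-binary-xml" and the function names are hypothetical, and
the binary decoder is supplied by the caller rather than taken from any
real EXI implementation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical encoding label; the draft does not name one.
BINARY_LABEL = "x-binary-xml"

# Matches an ASCII XML declaration and captures its encoding pseudo-attribute.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\'][^>]*\?>')


def declared_encoding(payload):
    """Return the lower-cased encoding named in the XML declaration, or None."""
    match = _DECL.match(payload)
    return match.group(1).decode("ascii").lower() if match else None


def read_message(payload, decode_binary, assume_binary=False):
    """Return ("binary", decoded) or ("text", parsed_root) for a message.

    decode_binary is a caller-supplied callable (an assumption of this
    sketch) that raises ValueError when the bytes are not binary XML.
    """
    # Loosely coupled case: the XML declaration announces the encoding.
    if declared_encoding(payload) == BINARY_LABEL:
        body = payload[_DECL.match(payload).end():]
        return "binary", decode_binary(body)
    # Constrained case: binary is agreed out of band, so try it first.
    if assume_binary:
        try:
            return "binary", decode_binary(payload)
        except ValueError:
            pass  # fail over to text XML
    return "text", ET.fromstring(payload)
```

In both branches an ordinary text XML message is still accepted, which is
the sense in which part of the existing stack would be preserved.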
Continuing further, "In particular, we suggest that a quantitative
analysis is necessary.
For at least a few key use cases, concrete targets should be set for the
size and/or speed gains that would be needed to justify the disruption
introduced by a new format. For example, a target might be that "in
typical web services scenarios, median speed gains on the order of 3x in
combined parsing and deserialization are deemed sufficient to justify a
new format." We further suggest that representative binary technologies
be benchmarked and analyzed to a sufficient degree that such speed or
size improvements can be reasonably reliably predicted before we commit
to a Recommendation. No doubt, any given set of goals or benchmarks
will suffer from some degree of imprecision, but if the gains are
sufficiently compelling to justify a new format, then they should be
relatively easy to demonstrate. In short, actual measurements should be
a prerequisite to preparing a Recommendation."
The Working Group did not establish concrete targets for size and/or speed
gains that would justify disruption prior to running measurements. Actual
measurements have been done but the absence of clear targets for size
and/or speed gains even after the measurements means that the
selection of any technology appears unjustified.
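
A target of the kind quoted above could be checked mechanically once it
has been agreed. The following is a minimal sketch, assuming per-document
timings for a text-based and a candidate binary implementation are already
available; the function names and the default of 3x (taken from the
example in the TAG message) are illustrative and are not drawn from the
measurements document.

```python
from statistics import median


def median_speedup(text_times, binary_times):
    """Median of per-document speedups (text time divided by binary time)."""
    if not text_times or len(text_times) != len(binary_times):
        raise ValueError("need two non-empty timing lists of equal length")
    return median(t / b for t, b in zip(text_times, binary_times))


def meets_target(text_times, binary_times, target=3.0):
    """True when the median speedup reaches the pre-agreed target."""
    return median_speedup(text_times, binary_times) >= target
```

The point of fixing the target before measuring is that the outcome of
meets_target is then a finding rather than a matter of interpretation.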
Continuing in the TAG message, "In doing such measurements, we believe it
is essential that comparisons
be done to the best possible text-based XML 1.x implementations, which
are not necessarily those that are most widely deployed. Stated
differently:
if XML 1.x is inherently capable of meeting the needs of users, then our
efforts should go into tuning our XML implementations, not designing new
formats. Benchmark environments should be as representative as possible
of fully optimized implementations, not just of the XML parser, but of
the surrounding application or middleware stack. We note that different
application-level optimizations may be necessary to maximize the
performance of the Binary or text cases respectively. Care should
especially be taken to ensure that the performance of particular APIs
such as DOM or SAX does not obscure the performance possible with either
option (e.g. both SAX and DOM can easily result in high overhead string
conversions when UTF-8 is used.)"
The Working Group's call for implementations [7] specifically called for
only XML parsers, not the surrounding application or middleware stacks,
JDKs or Java Virtual Machines. The benchmarks have not been against the
best possible text-based XML 1.x implementations.
The measurements document acknowledges the issue of stack integration in
"Stack integration considers the full XML processing system, not just the
parser. By selectively combining the components of the processing stack
through abstract APIs, the system can directly produce application data
from the bytes that were read. Two prominent examples of this technique
are [Screamer] and [EngelenGSOAP]. Both of these can also be called
schema-derived as they compile a schema into code. However, neither simply
generates a generic parser, but rather a full stack for converting between
application data and serialized XML. This gives a significant improvement
compared to just applying the pre-compilation to the parsing layer. " But
neither of these prominent examples appears in the test data.
Further, there were no "real-world end to end" use cases tested, such as a
Web service application, a mobile application, etc. Thus we do not know
the overall effect of any particular technology on the overall application
performance.
The measurements document states "To begin with, the XBC Characterization
Measurement Methodologies Note defines thresholds for whether a candidate
format achieves sufficient compactness " [8]. The XBC Characterization
Measurement Methodologies Note[8] states "Because XML documents exist with
a wide variety of sizes, structures, schemas, and regularity, it is not
possible to define a single size threshold or percentage compactness that
an XML format must achieve to be considered sufficiently compact for a
general purpose W3C standard. "
We attempted to determine the differences between Efficient XML and Gzip
but found the methodology confusing. The measurements document specifies
that "In the Document and Both classes, candidates are compared against
gzipped XML, while in the Neither and Schema cases, the comparison was to
plain XML". Examining Document and Both compactness graphs, Gzip appears
to offer improvements over XML that track the other implementations, with
the noteworthy point that Efficient XML's improvements over Gzip are
significant in a large part of the Both chart but similar in the
Document chart. Examining Processing Efficiency graphs, it appears as though
XML is clearly superior in Java Encoding in Document and Both. GZip
appears further inferior but yet all solutions vary wildly around XML in
Decoding Document and Both. A worrying statement is "An interesting point
to note in the decoding results is the shapes of the graphs for each
individual candidate. Namely, these appear similar to each other,
containing similar peaks and troughs. Even more interestingly, this is
also the case with Xals, indicating that there is some feature of the JAXP
parser that is implemented suboptimally and triggered by a subset of the
test documents." The measurements document states "For instance,
preliminary measurements in the EXI framework indicate that the default
parser shipped with Java improved noticeably from version 5.0 to version
6, showing 2-3-fold improvement for some cases. ", and the measurements
used JDK 1.5.0_05-b05 for Java based parsing and JDK 1.6.0_02-ea-b01 for
native. Perhaps an improved JDK, Java Virtual Machine, or virtualized
JVM would further improve results.
This leads us to wonder whether a combination of GZip with improved
technologies such as Parsers, JDKs, VMs, or even Stack Integration
technology (that is Schema aware and hence covered under Both and Schema)
would suffice for the community.
Examining the data sets used, there are a number of military applications
(ASMTF, AVCL, JTLM) and yet comparatively few generic "Web service"
applications. The Google test suite lists Web services for small devices
and Web services routing; the Invoice test suite lists Intra/InterBusiness
Communication which immediately limits its scope to "A large business
communicates via XML with a number of remote businesses, some of which can
be small business partners. These remote or small businesses often have
access only to slow transmission lines and have limited hardware and
technical expertise. "; and there is a WSDL test suite. This seems to
avoid the "common" Web service case of the many Web APIs provided by
hosted solutions like Google, Yahoo, Amazon, eBay, Salesforce, Facebook,
MSN, etc. Examining the test data shows that the Google test cases used
5 different test cases (0, 7, 15, 24, 30) which include 1 SOAP fault (case
#24). There are 2 AVCL, 5 Invoice, 8 Location Sightings, 6 JTLM, 5 ASMTF,
2 WSDL test cases as well. There appears to be broad based coverage of
each, though the rationale for the various weightings isn't documented.
For example, why 4 Google "success cases" and 2 WSDL cases? Surely there
are more than 2 times as many SOAP messages as WSDL messages being sent
around the internet.
In conclusion, W3C TAG's concerns have not been addressed and the W3C TAG
does not support an incompatible change to the XML Stack.
Cheers,
Dave
[1] http://lists.w3.org/Archives/Public/www-tag/2005May/0044
[2] http://www.w3.org/TR/2007/WD-exi-measurements-20070725/
[3] http://www.sdtimes.com/article/latestnews-20071015-10.html
[4] http://lists.w3.org/Archives/Member/member-exi-wg/2007Mar/0014.html
[5] http://lists.w3.org/Archives/Member/member-exi-wg/2007Sep/0010.html
[6] http://lists.w3.org/Archives/Member/member-exi-wg/2007Sep/att-0005/00-part
[7] http://lists.w3.org/Archives/Public/public-exi/2006Mar/0004.html
[8] http://www.w3.org/TR/xbc-measurement/
Received on Saturday, 27 October 2007 14:44:21 UTC