- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 26 Oct 2007 18:28:40 -0400
- To: "David Orchard" <dorchard@bea.com>
- Cc: "W3C-TAG Group WG" <www-tag@w3.org>, shh@us.ibm.com, haggar@us.ibm.com, klawrenc@us.ibm.com
Dave:

Speaking as a TAG member: Thank you for doing this. I think there's a lot of good stuff here. My overall impression, FWIW, is that this would have more impact if we went into just a bit less detail, and focussed a slightly shorter note on some key points. Don't take this as a significant criticism. I think the core of this is very good, just that a bit of tightening would make it more effective. If that means a few details fall out of it, I'm not sure that's all bad.

Speaking as an IBMer:

> - IBM's withdrawal [4] from the Working Group shows their detailed and specific disagreement with the benefits of EXI and remains unaddressed.

Thank you for referencing this note; we're very glad that our concerns are viewed as significant for this discussion. I do note that, like several other references from your note, the link is to member-confidential email. I doubt that anyone in IBM would object to having a copy posted in a public archive, and if you'd like I could in principle check with my IBM colleagues. I say in principle because our note also refers to some correspondence among a smaller group of individuals in the W3C. It was originally written as input to the chairs and W3C staff, and I think we would need the permission of these individuals as well. Shall we try and ask all these folks whether they're OK with at least IBM's note being posted in a public place? If not, I think it will be frustrating for your readers to find the link and not be able to follow it.

> Just recently a working group member said "The same goes for
> IBM, if they really thought the result tells something
> important, they would have spent some more time establishing a
> case out the result against EXI."[5]

Again, this is a member only link. I think you've taken a small enough bit that it's sort of out of context. The quote appears to imply that surely IBM did not take its concerns about EXI seriously, as we would otherwise have "spent more time establishing the case...against EXI".
In fact, our work on XML performance was done over several years, and just the comparison work we did on EXI involved several weeks of work in particular. Those comparisons were presented in a quite detailed presentation to the EXI workgroup, and I happened to be there as an observer that day. I think it's only fair if you are going to have a quote like this that you ensure that the entire email at [5] is publicly accessible so that people can draw their own conclusions about it.

I think it's fair to say that IBM believes that EXI offers interesting compression on XML, and some speed gains in many use cases; we also think that the speed gains over well optimized text implementations are not nearly as great as might be inferred from the measurements presented by the EXI group. The issues include many that you've put into your note, Dave (choice and weighting of test cases, use of Java, etc.)

More to the point, I think you're trying to make the case that the EXI workgroup didn't take our concerns seriously (I offer no opinion on that -- while our concerns have not been satisfied, I wouldn't want to accuse anyone in the EXI group of not taking them seriously). In fact, the quote more directly seems to imply that we in IBM did not take our own concerns seriously, and that I must object to (though the author is surely entitled to express that opinion).

Maybe it would be better to just point out that we were among those who raised concerns, possibly linking our note if we get it made public, and to indicate that we in IBM do not believe the performance analysis in the EXI drafts addresses our concerns. If you want to additionally say that the authors of [5] think we did not take our own concerns seriously enough to give them force, of course you may do so, but I think we'd want a chance to briefly and politely rebut that.
Sorry to make a fuss about this, but there's already a lot of heat and confusion around this whole issue, and I'm afraid that the current fragmented quotes from member-only emails will make it worse and not better.

Noah

[4] http://lists.w3.org/Archives/Member/member-exi-wg/2007Mar/0014.html
[5] http://lists.w3.org/Archives/Member/member-exi-wg/2007Sep/0010.html

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"David Orchard" <dorchard@bea.com>
Sent by: www-tag-request@w3.org
10/25/2007 01:12 AM

To: "W3C-TAG Group WG" <www-tag@w3.org>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: Review of EXI

Draft TAG response to EXI. I believe there could be significant negative response so I encourage the TAG members to suggest modifications as they see fit.

The TAG does not consider its concerns raised in May 2005[1] to be addressed. We are unable to ascertain whether our concerns have been addressed or not from the measurements document[2] produced by the working group. It appears that the Working Group believes that non-Working Group members must prove the WG wrong in its decisions on EXI technologies, rather than convincing the world that there is indeed a conclusive case. The measurements document is full of 127 pages of data and analysis but we did not see any material in a summary or abstract that addressed our concerns. In the almost 3 months since publication of the document on July 25th 2007, the EXI working group has had 28 messages on their mailing list, none of which are comments on the measurements document. During the Oct 4th TAG telcon, the TAG spent approximately 30 minutes looking through the document but were unable to glean answers to our questions.
We believe that the measurements document is too detailed and academic to prove the need for any flavours of binary XML or the selection of a particular technology, and that it is likely that only a handful of people will be able to grasp even the data embodied in the document. The document states "Based on the measurements described here, the working group has selected Efficient XML ([EffXML]) to be the basis for the proposed encoding specification to be prepared as a candidate W3C Recommendation." There is no high level supporting evidence for such a conclusion.

It appears that the working group has presupposed a conclusion that binary xml is necessary and no concerns by the TAG or member companies on the technology and methodologies themselves or the integration with the rest of XML (including DSig) need to be addressed directly. Some examples:

- A main author of Efficient XML did press outreach [3] that resulted in an article called "W3C is working on a solution to bandwidth-hogging, clunky XML".
- IBM's withdrawal [4] from the Working Group shows their detailed and specific disagreement with the benefits of EXI and remains unaddressed. Just recently a working group member said "The same goes for IBM, if they really thought the result tells something important, they would have spent some more time establishing a case out the result against EXI."[5]
- the evolution of "the XML stack" including DSig and Encryption is mostly ignored [6]. For example, of all the use cases that EffXML solves, what cases require a revision of DSig and/or Encryption before deployment? It may very well be the case that a failure similar to that of the incompatible evolution of XML 1.1 will be duplicated in an incompatible evolution of EXI without supporting evolution of technologies.
- the TAG concerns were never directly addressed.

We will go through our documented longstanding concerns in order.

"The drawbacks are likely to include reduced interoperability with XML 1.0 and XML 1.1 software, and an inability to leverage the benefits of text-based formats."

The Working Group has rejected the use of an encoding in the XML Declaration because of the 18 or so characters. We fail to understand why this is an issue. XML 1.0 allows different encodings and it allows default encodings to be defined out of band. Constrained environments with out of band knowledge of binary encoding could assume binary encoding on incoming messages, failing over to text xml. Less tightly coupled environments, such as the Web, could use the XML declaration encoding. This at least would preserve part of the XML stack and technologies.

Continuing further,

"In particular, we suggest that a quantitative analysis is necessary. For at least a few key use cases, concrete targets should be set for the size and/or speed gains that would be needed to justify the disruption introduced by a new format. For example, a target might be that "in typical web services scenarios, median speed gains on the order of 3x in combined parsing and deserialization are deemed sufficient to justify a new format." We further suggest that representative binary technologies be benchmarked and analyzed to a sufficient degree that such speed or size improvements can be reasonably reliably predicted before we commit to a Recommendation. No doubt, any given set of goals or benchmarks will suffer from some degree of imprecision, but if the gains are sufficiently compelling to justify a new format, then they should be relatively easy to demonstrate. In short, actual measurements should be a prerequisite to preparing a Recommendation."

The Working Group did not establish concrete targets for size and/or speed gains that would justify disruption prior to running measurements.
Actual measurements have been done but the absence of clear targets for size and/or speed gains even after the measurements means that the selection of any technology appears unjustified.

Continuing in the TAG message,

"In doing such measurements, we believe it is essential that comparisons be done to the best possible text-based XML 1.x implementations, which are not necessarily those that are most widely deployed. Stated differently: if XML 1.x is inherently capable of meeting the needs of users, then our efforts should go into tuning our XML implementations, not designing new formats. Benchmark environments should be as representative as possible of fully optimized implementations, not just of the XML parser, but of the surrounding application or middleware stack. We note that different application-level optimizations may be necessary to maximize the performance of the Binary or text cases respectively. Care should especially be taken to ensure that the performance of particular APIs such as DOM or SAX does not obscure the performance possible with either option (e.g. both SAX and DOM can easily result in high overhead string conversions when UTF-8 is used.)"

The Working Group's call for implementations [7] specifically called for only XML parsers, not for the surrounding application or middleware stacks, JDKs or Java Virtual Machines. The benchmarks have not been run against the best possible text-based XML 1.x implementations.

The measurements document acknowledges the issue of stack integration in "Stack integration considers the full XML processing system, not just the parser. By selectively combining the components of the processing stack through abstract APIs, the system can directly produce application data from the bytes that were read. Two prominent examples of this technique are [Screamer] and [EngelenGSOAP]. Both of these can also be called schema-derived as they compile a schema into code.
However, neither simply generates a generic parser, but rather a full stack for converting between application data and serialized XML. This gives a significant improvement compared to just applying the pre-compilation to the parsing layer." But neither of these prominent examples appears in the test data. Further, there were no "real-world end to end" use cases tested, such as a Web service application, a mobile application, etc. Thus we do not know the overall effect of any particular technology on the overall application performance.

The measurements document states "To begin with, the XBC Characterization Measurement Methodologies Note defines thresholds for whether a candidate format achieves sufficient compactness" [8]. The XBC Characterization Measurement Methodologies Note[8] states "Because XML documents exist with a wide variety of sizes, structures, schemas, and regularity, it is not possible to define a single size threshold or percentage compactness that an XML format must achieve to be considered sufficiently compact for a general purpose W3C standard."

We attempted to determine the differences between Efficient XML and Gzip but found the methodology confusing. The measurements document specifies that "In the Document and Both classes, candidates are compared against gzipped XML, while in the Neither and Schema cases, the comparison was to plain XML".

Examining Document and Both compactness graphs, Gzip appears to offer improvements over XML that track the other implementations, with the noteworthy point that Efficient XML's improvements over Gzip are significant in a significant part of the Both chart but similar in the Document.

Examining Processing Efficiency graphs, it appears as though XML is clearly superior in Java Encoding in Document and Both. GZip appears further inferior but yet all solutions vary wildly around XML in Decoding Document and Both.
A worrying statement is "An interesting point to note in the decoding results is the shapes of the graphs for each individual candidate. Namely, these appear similar to each other, containing similar peaks and troughs. Even more interestingly, this is also the case with Xals, indicating that there is some feature of the JAXP parser that is implemented suboptimally and triggered by a subset of the test documents."

The measurements document states "For instance, preliminary measurements in the EXI framework indicate that the default parser shipped with Java improved noticeably from version 5.0 to version 6, showing 2-3-fold improvement for some cases.", and the measurements used JDK 1.5.0_05-b05 for Java based parsing and JDK 1.6.0_02-ea-b01 for native. Perhaps an improved JDK, Java Virtual Machine, or virtualized JVM would further improve results.

This leads us to wonder whether a combination of GZip with improved technologies such as Parsers, JDKs, VMs, or even Stack Integration technology (that is Schema aware and hence covered under Both and Schema) would suffice for the community.

Examining the data sets used, there are a number of military applications (ASMTF, AVCL, JTLM) and yet comparatively few generic "Web service" applications. The Google test suite lists Web services for small devices and Web services routing; the Invoice test suite lists Intra/InterBusiness Communication which immediately limits its scope to "A large business communicates via XML with a number of remote businesses, some of which can be small business partners. These remote or small businesses often have access only to slow transmission lines and have limited hardware and technical expertise."; and there is a WSDL test suite. This seems to avoid the "common" Web service case of the many Web APIs provided by hosted solutions like Google, Yahoo, Amazon, eBay, Salesforce, Facebook, MSN, etc.
Examining the test data shows that the Google test cases used 5 different test cases (0, 7, 15, 24, 30) which include 1 soap fault (case #24). There are 2 AVCL, 5 Invoice, 8 Location Sightings, 6 JTLM, 5 ASMTF, 2 WSDL test cases as well. There appears to be broad based coverage of each, though the rationale for the various weightings isn't documented. For example, why 4 Google "success cases" and 2 WSDL cases? Surely there are more than 2 times as many SOAP messages as WSDL messages being sent around the internet.

In conclusion, W3C TAG's concerns have not been addressed and the W3C TAG does not support an incompatible change to the XML Stack.

Cheers,
Dave

[1] http://lists.w3.org/Archives/Public/www-tag/2005May/0044
[2] http://www.w3.org/TR/2007/WD-exi-measurements-20070725/
[3] http://www.sdtimes.com/article/latestnews-20071015-10.html
[4] http://lists.w3.org/Archives/Member/member-exi-wg/2007Mar/0014.html
[5] http://lists.w3.org/Archives/Member/member-exi-wg/2007Sep/0010.html
[6] http://lists.w3.org/Archives/Member/member-exi-wg/2007Sep/att-0005/00-part
[7] http://lists.w3.org/Archives/Public/public-exi/2006Mar/0004.html
[8] http://www.w3.org/TR/xbc-measurement/
Received on Saturday, 27 October 2007 14:44:21 UTC