- From: <noah_mendelsohn@us.ibm.com>
- Date: Thu, 17 Mar 2005 14:46:45 -0500
- To: Chris Lilley <chris@w3.org>
- Cc: Robin Berjon <robin.berjon@expway.fr>, www-tag@w3.org
Robin Berjon writes:

> Some of the things people were spending time on
> were XML-related. For example UTF-8 to UTF-16
> conversion

I don't think I buy this as a rationale for a binary XML standard. The line of reasoning I see in the above is: XML is text, often UTF-8. As an industry we went and cooked up APIs that pass around all the strings as UTF-16, which to be fair is common on many platforms. Not surprisingly, there are conversion overheads, and I agree they are very significant.

Why does this problem justify a binary XML standard? Instead of making the platform or the API more efficient at dealing with UTF-8, which seems like a good investment on that platform, we're going to force the whole industry to accept interchange of a new form of XML? Maybe that binary form's representation of strings will go into your API with lower conversion overhead, maybe not; but I do note that Java in particular uses UTF-16 under the covers, and you can if you wish use UTF-16 for XML today (the first sketch at the end of this message illustrates this).

We've done some work in this area at IBM. I am not at all convinced that the answer to platforms and APIs that are bad at manipulating UTF-8 is to define a binary XML. There's a lot you can do to avoid character conversions if you're careful and your API is suitably designed. Indeed, it seems to me that things are just dandy in XML for use with platforms that do UTF-8 efficiently. Will the binary form be faster or slower for them?

> or assigning data types with schema to make a
> PSVI. If a binary format already has the PSVI
> information

I think you need to be very careful heading down this path, depending on your use case. The term PSVI in particular relates to schema validation. In many cases the reason you are doing schema validation is that you don't entirely trust the source of the data. Once you're doing the other aspects of validation to check the data, I would claim (having built such systems) that type assignment is nearly free in many cases.

The same is true for many deserialization use cases, even where you don't use XML Schema for validation: if you know you're deserializing a "quantity" field, then the deserializer very often has static knowledge that it's an int (second sketch below). I don't see why there's overhead for that in the common use cases.

Maybe what you're hinting is that for an integer you're going to send the binary "int" and not the character string. If so, then that's not XML in a deeper sense, and the fact that you know the "PSVI" type is incidental to the fact that you've moved from characters to abstract numbers. With the binary "int", you can't distinguish "123" from "00123", and that's a huge difference. For example, an XML DSIG over the two would be different (third sketch below).

In any case, now you're into sending something closer to a subset of the XPath 2.0 / XQuery data model than optimized XML. An interesting thing to consider, but it has all sorts of deep implications. SOAP, in particular, uses infosets. In a SOAP message, "123" is different from "00123", even if the schema or an xsi:type claims you've got an integer. DSIGs on the two will be different.
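Sketch 1: a minimal Java/JAXP illustration of the UTF-16 point (the code and names here are illustrative assumptions, not part of the original exchange). The document on the wire carries a UTF-16 encoding declaration, and because Java strings and DOM text nodes are UTF-16 internally, parsing it involves no UTF-8 to UTF-16 transcoding at all.

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    public class Utf16Xml {
        public static void main(String[] args) throws Exception {
            // XML need not be UTF-8 on the wire: a UTF-16 document with a
            // matching encoding declaration is equally well-formed XML.
            String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                       + "<quantity>123</quantity>";
            byte[] wireBytes = xml.getBytes(StandardCharsets.UTF_16);

            // Java strings (and DOM text nodes) are UTF-16 internally, so
            // parsing these bytes needs no UTF-8 <-> UTF-16 transcoding.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(wireBytes));
            System.out.println(doc.getDocumentElement().getTextContent()); // 123
        }
    }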
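Sketch 2: the "static type knowledge" point, as a hand-written StAX deserializer for a hypothetical <quantity> element (the element name and document shape are assumptions for illustration). The "type assignment" is just the Integer.parseInt the binding had to do anyway; no schema-driven pass adds overhead.

    import java.io.StringReader;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class QuantityReader {
        // The deserializer knows statically that <quantity> is an int, so
        // assigning the type costs nothing beyond the parse itself.
        static int readQuantity(String xml) throws Exception {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "quantity".equals(r.getLocalName())) {
                    return Integer.parseInt(r.getElementText().trim());
                }
            }
            throw new IllegalArgumentException("no <quantity> element");
        }

        public static void main(String[] args) throws Exception {
            // Both lexical forms map to the same int value.
            System.out.println(readQuantity(
                "<order><quantity>00123</quantity></order>")); // 123
        }
    }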
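Sketch 3: the "123" versus "00123" point. The two lexical forms are identical in the integer value space but differ as character data, so any digest over the serialized octets differs; an XML DSIG, which signs canonicalized octets, would likewise differ. This sketch hashes raw UTF-8 bytes rather than running real XML canonicalization, which is a simplification but enough to show the distinction.

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class LexicalVsValue {
        public static void main(String[] args) throws Exception {
            String a = "<quantity>123</quantity>";
            String b = "<quantity>00123</quantity>";

            // Same point in the value space...
            System.out.println(Integer.parseInt("123")
                    == Integer.parseInt("00123")); // true

            // ...but different character data, so the digests differ.
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            System.out.println(new BigInteger(1,
                    sha.digest(a.getBytes(StandardCharsets.UTF_8))).toString(16));
            System.out.println(new BigInteger(1,
                    sha.digest(b.getBytes(StandardCharsets.UTF_8))).toString(16));
        }
    }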
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------


Chris Lilley <chris@w3.org>
03/17/05 12:31 PM
Please respond to Chris Lilley

        To:      Robin Berjon <robin.berjon@expway.fr>
        cc:      noah_mendelsohn@us.ibm.com, www-tag@w3.org
        Subject: Re: Draft minutes of 15 March 2005 Telcon

On Wednesday, March 16, 2005, 7:54:21 PM, Robin wrote:

RB> noah_mendelsohn@us.ibm.com wrote:

>> DO: I thought that one of the interesting presentations at the workshop
>> from Sun analyzed not just message size (and thus network overhead) but
>> also what was happening in the processor.
>> ... A lot of time was spent in the binding frameworks.
>> ... Even if you came along and doubled the network performance by
>> halving the size, you might get only 1/3 of improvement

RB> Yes, if you're doing a lot of other things that aren't XML, then
RB> speeding up XML won't help. But when you're rendering an SVG document
RB> and the vast majority of your time is spent waiting for the network and
RB> parsing the XML, then you know there's going to be speedup.

Some of the things people were spending time on were XML-related. For example, UTF-8 to UTF-16 conversion (to create a DOM) or assigning data types with schema to make a PSVI. If a binary format already has the PSVI information and speeds up the production of a DOM (or, put a better way, obviates the need to construct a separate data structure to implement the DOM APIs efficiently), that would result in a significant speedup. It might not be measured in x times smaller or x times faster to parse, though. But it would show up in transactions-per-second measurements.

--
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 W3C Graphics Activity Lead