- From: Chris Lilley <chris@w3.org>
- Date: Fri, 18 Mar 2005 15:12:41 +0100
- To: noah_mendelsohn@us.ibm.com
- Cc: Robin Berjon <robin.berjon@expway.fr>, www-tag@w3.org
On Thursday, March 17, 2005, 8:46:45 PM, noah wrote:

nuic> Robin Berjon writes:

>> Some of the things people were spending time on
>> were XML-related. For example UTF-8 to UTF-16
>> conversion

nuic> I don't think I buy this as a rationale for a binary XML standard.

It wasn't provided as one. It was provided as a caution: when
benchmarking the improvement from a binary XML standard, be sure you
are measuring the effect on total system throughput rather than the
effect on some small part of it, because it might affect other parts.

nuic> XML is text, often UTF-8. As an industry we went and cooked up
nuic> APIs that pass around all the strings as UTF-16, which to be fair
nuic> is common on many platforms. Not surprisingly, there are
nuic> conversion overheads, and I agree they are very significant.

nuic> Why does this problem justify a binary XML standard?

It doesn't. However, a binary XML standard might alleviate that aspect
as well, so that aspect should be measured as well.

nuic> We've done some work in this area in IBM. I am not at all
nuic> convinced that the answer to platforms and API's that are bad at
nuic> manipulating UTF-8 is to define a binary XML.

(That misrepresents what I said. It's a strawman. I think your argument
can stand on its merits without having to drag in 'force the industry'
or 'bad at' characterisations).

>> or assigning data types with schema to make a
>> PSVI. If a binary format already has the PSVI
>> information

nuic> I think you need to be very careful heading down this path,
nuic> depending on your use case. The term PSVI in particular relates to
nuic> schema validation. In many cases the reason you are doing schema
nuic> validation is because you don't entirely trust the source of the
nuic> data. Once you're doing other aspects of validation to check the
nuic> data, I would claim (having built such systems) that type
nuic> assignment is nearly free in many cases.

I would claim that whether this is 'nearly free' needs to be measured
under controlled conditions, not merely asserted.

nuic> Maybe what you're hinting is that for an integer you're going to send the
nuic> binary "int" and not the character string. If so, then that's not XML in
nuic> a deeper sense, and the fact that you know the "PSVI" type is incidental
nuic> to the fact that you've moved from characters to abstract numbers. With
nuic> the binary "int", you can't distinguish "123" from "00123", and that's a
nuic> huge difference.

That depends on whether it's a desired result or an undesired flaw,
which depends on the application.

One of the problems with DOM Level 2, for example, is that it makes you
keep around a whole bunch of stuff that many times you really don't
care about:

  foo="12"
  foo=" 12"
  foo=" 12 "
  foo="12 "
  foo="00000012"

One of the nice things about DOM Level 3 is that it allows
normalization. The normalized result can be returned. Often this is
what is wanted; and when it is, sending the normalized value has
benefits. If you know that foo holds an integer then returning the
integer 12 rather than one of several lexical representations is a win.

nuic> For example, an XML DSIG over the two would be
nuic> different. In any case, now you're into sending something closer to a
nuic> subset of the XPath 2.0 XQuery data model than optimized XML.

Another way of stating that is that 'binary XML' might not be 'just a
different way of sending XML'. Yes, it might be a binary XQ data model.
In fact, that specific example was mentioned several times at the
workshop.

It might optimize more steps than just the first parsing step. That
would be a feature :)

Thus, again, total throughput would be the thing to measure when
evaluating the effect of BinaryXML.

nuic> An interesting thing to consider, but it has all sorts of deep
nuic> implications. SOAP, in particular, uses infosets. In a SOAP
nuic> message, "123" is different from "00123", even if the schema or
nuic> xsi:type claims you've got an integer. DSIGs on the two will be
nuic> different.

Which is fine; applications that need to care about the lexical form
could continue to shuttle around strings. I assume BinaryXML will still
be able to handle strings.

-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
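
Illustrative note on the lexical-form discussion above: the sketch below takes the five foo="..." spellings from the message, shows that they collapse to a single integer once normalized, and shows that a digest computed over the raw characters still tells them apart. It is only a rough illustration. The SHA-256 digest is an assumed stand-in for a signature and is not XML DSIG (which also involves canonicalization), and the helper names are hypothetical.

```python
import hashlib

# The five lexical forms listed in the message.
LEXICAL_FORMS = ["12", " 12", " 12 ", "12 ", "00000012"]


def normalized_value(lexical: str) -> int:
    """Collapse surrounding whitespace and parse the text as an integer."""
    return int(lexical.strip())


def lexical_digest(lexical: str) -> str:
    """Assumed stand-in for a signature: SHA-256 over the raw UTF-8 bytes."""
    return hashlib.sha256(lexical.encode("utf-8")).hexdigest()


# One abstract value, but five distinct digests over the lexical forms.
values = {normalized_value(s) for s in LEXICAL_FORMS}
digests = {lexical_digest(s) for s in LEXICAL_FORMS}

print("distinct normalized values:", len(values))
print("distinct lexical digests:", len(digests))
```

An application that only needs the value can work with the single integer; one that signs or compares the characters has to keep the original string, which is the trade-off both sides of the thread describe.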
Received on Friday, 18 March 2005 14:12:42 UTC