
Re: Binary XML (was: Re: Draft minutes of 15 March 2005 Telcon)

From: Chris Lilley <chris@w3.org>
Date: Fri, 18 Mar 2005 15:12:41 +0100
Message-ID: <64310570.20050318151241@w3.org>
To: noah_mendelsohn@us.ibm.com
Cc: Robin Berjon <robin.berjon@expway.fr>, www-tag@w3.org

On Thursday, March 17, 2005, 8:46:45 PM, noah wrote:

nuic> Robin Berjon writes:

>> Some of the things people were spending time on
>> were XML-related. For example UTF-8 to UTF-16
>> conversion

nuic> I don't think I buy this as a rationale for a binary XML standard.

It wasn't provided as one. It was provided as a caution; when
benchmarking the improvement from a binary XML standard, be sure you are
measuring the effect on total system throughput rather than the effect
on some small part of it, because it might affect other parts.

nuic> XML is text, often UTF-8. As an industry we went and cooked up
nuic> APIs that pass around all the strings as UTF-16, which to be fair
nuic> is common on many platforms. Not surprisingly, there are
nuic> conversion overheads, and I agree they are very significant.

nuic> Why does this problem justify a binary XML standard?

It doesn't. However, a binary XML standard might alleviate that cost as
well, so that cost should be included in what is measured.
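
A minimal sketch of the conversion step under discussion, assuming Python and a made-up, ASCII-only payload (the `order` element is purely illustrative). It shows only that the transcoding is a distinct step that also changes the size of the data; it measures nothing by itself.

```python
# Hypothetical ASCII-only XML payload, as it might arrive on the wire.
payload_utf8 = '<order id="123">widget</order>'.encode("utf-8")

# A UTF-16 based API forces a decode followed by a re-encode.
text = payload_utf8.decode("utf-8")
payload_utf16 = text.encode("utf-16-le")  # "-le" variant adds no byte order mark

# For pure ASCII content every character grows from one byte to two.
growth = len(payload_utf16) / len(payload_utf8)
print(len(payload_utf8), len(payload_utf16), growth)
```

Content with many non-ASCII characters would show a different ratio, so the payload used in a benchmark matters.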

nuic> We've done some work in this area in IBM. I am not at all
nuic> convinced that the answer to platforms and API's that are bad at
nuic> manipulating UTF-8 is to define a binary XML.

(That misrepresents what I said. It's a strawman. I think your argument
can stand on its merits without having to drag in 'force the industry'
or 'bad at' characterisations.)

>> or assigning data types with schema to make a
>> PSVI. If a binary format already has the PSVI
>> information

nuic> I think you need to be very careful heading down this path,
nuic> depending on your use case. The term PSVI in particular relates to
nuic> schema validation. In many cases the reason you are doing schema
nuic> validation is because you don't entirely trust the source of the
nuic> data. Once you're doing other aspects of validation to check the
nuic> data, I would claim (having built such systems) that type
nuic> assignment is nearly free in many cases.

I would claim that whether this is 'nearly free' needs to be measured
under controlled conditions, not merely asserted.

nuic> Maybe what you're hinting is that for an integer you're going to send the
nuic> binary "int" and not the character string.  If so,  then that's not XML in
nuic> a deeper sense, and the fact that you know the "PSVI" type is incidental
nuic> to the fact that you've moved from characters to abstract numbers.   With
nuic> the binary "int", you can't distinguish "123" from "00123", and that's a
nuic> huge difference.

That depends on whether it's a desired result or an undesired flaw, which
depends on the application. One of the problems with DOM Level 2, for
example, is that it makes you keep around a whole bunch of stuff that
many times you really don't care about:

foo="   12"
foo="  12 "
foo="&#x31;&#x32;   "

One of the nice things about DOM Level 3 is that it allows
normalization. The normalized result can be returned. Often this is what
is wanted; and when it is, sending the normalized value has benefits.

If you know that foo holds an integer then returning the integer 12
rather than one of several lexical representations is a win.
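
A small sketch of that point, assuming Python's standard `xml.etree.ElementTree` parser and a hypothetical element named `e`; the third document borrows the leading-zero form from the "00123" example. The helper `typed_value` is an illustration, not part of any DOM API.

```python
import xml.etree.ElementTree as ET

# Several lexical forms of the same attribute value.
docs = [
    '<e foo="   12"/>',
    '<e foo="&#x31;&#x32;   "/>',
    '<e foo="00012"/>',
]

def raw_value(xml_text):
    """Return the attribute string as the parser reports it."""
    return ET.fromstring(xml_text).get("foo")

def typed_value(xml_text):
    """Return the integer, assuming we already know foo holds an integer."""
    return int(raw_value(xml_text).strip())

raw_values = [raw_value(d) for d in docs]
typed_values = [typed_value(d) for d in docs]
print(raw_values)    # three different strings
print(typed_values)  # one integer, three times
```

An application that only wants the number has to do the second step anyway; a format that carries the typed value would let it skip that step.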

nuic>  For example, an XML DSIG over the two would be 
nuic> different.  In any case, now you're into sending something closer to a
nuic> subset of the XPath 2.0 XQuery data model than optimized XML.

Another way of stating that is that 'binary XML' might not be 'just a
different way of sending XML'. Yes, it might be a binary XQ data model.
In fact, that specific example was mentioned several times at the
workshop. It might optimize more steps than just the first parsing step.
That would be a feature :)

Thus, again, total throughput would be the thing to measure when
evaluating the effect of BinaryXML.
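
One way to picture such a measurement is the rough sketch below, in Python. The three stages (transcoding, parsing, type assignment) and their names are assumptions chosen to mirror the steps discussed in this thread, and the payload is synthetic. It is not a benchmark methodology, only an illustration of timing the whole pipeline alongside its parts.

```python
import time
import xml.etree.ElementTree as ET

def run_pipeline(payload: bytes):
    """Time each illustrative stage and the whole run."""
    timings = {}
    start = time.perf_counter()

    t0 = time.perf_counter()
    text = payload.decode("utf-8")        # transcoding step
    t1 = time.perf_counter()
    root = ET.fromstring(text)            # parsing step
    t2 = time.perf_counter()
    values = [int(e.text) for e in root]  # type assignment step
    t3 = time.perf_counter()

    timings["decode"] = t1 - t0
    timings["parse"] = t2 - t1
    timings["types"] = t3 - t2
    timings["total"] = time.perf_counter() - start
    return values, timings

# Synthetic payload: 1000 integer-valued child elements.
body = "".join(f"<n>{i}</n>" for i in range(1000))
payload = f"<r>{body}</r>".encode("utf-8")

values, timings = run_pipeline(payload)
for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.6f}s")
```

A format that removed only the parse stage would be judged by the change in `total`, not by the change in `parse` alone.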

nuic> An interesting thing to consider, but it has all sorts of deep
nuic> implications. SOAP, in particular, uses infosets. In a SOAP
nuic> message, "123" is different from "00123", even if the schema or
nuic> xsi:type claims you've got an integer. DSIGs on the two will be
nuic> different.

Which is fine; applications that need to care about the lexical form
could continue to shuttle around strings. I assume BinaryXML will still
be able to handle strings.
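
To make the lexical-form point concrete, here is a hedged sketch in Python. It uses a plain SHA-256 digest over the character data as a stand-in; real XML DSIG involves canonicalization and more, so this only illustrates why the two strings cannot be treated as interchangeable once anything is signed.

```python
import hashlib

a = "123"
b = "00123"

# As integers the two values are equal.
same_number = int(a) == int(b)

# As character data they are not, so any digest over the text differs.
digest_a = hashlib.sha256(a.encode("utf-8")).hexdigest()
digest_b = hashlib.sha256(b.encode("utf-8")).hexdigest()
same_digest = digest_a == digest_b

print(same_number, same_digest)
```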

 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
Received on Friday, 18 March 2005 14:12:42 GMT
