- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 5 May 2003 16:54:33 -0400
- To: "John J. Barton" <John_Barton@hpl.hp.com>
- Cc: xml-dist-app@w3.org
John Barton asks:

> However, there is one important issue in your note
> below that I'd like to understand better. As I read
> the infoset docs, the bit representation of components
> of the infoset are *not* defined. That is, infoset as
> I understand it is silent on binary vs base64 vs plain
> text or whatever. So Infoset is about the data
> structure not the data representation. Is the correct?

Well, let's be careful not to confuse what is modeled with how it is represented. The Infoset IS very definitely about characters. There is absolutely no question that in the Infoset the following are different:

    <e>123</e>
    <e>00123</e>

The content of the first is three character information item children; the second has five. In fact, even the following are different in the Infoset (though not necessarily in the new XQuery/XPath data model):

    <f xsi:type="xsd:integer">123</f>
    <f xsi:type="xsd:integer">00123</f>

What's confusing you is that in another sense you are absolutely correct: the Infoset does not tell you how to represent those characters. Indeed, the very "trick" at the heart of PASWA is that one way to optimize the first of each of these pairs is to make a note that the characters are what the schema recommendation calls the canonical lexical representation (no leading zeros; you can use a single bit, isCanonical/isNotCanonical, to signal that the optimization has triggered), and then to store the actual integer 123 for the value.

Note that the Infoset is still unambiguously characters. If you are asked for the content of that first element e, you must come up with the three characters "1", "2", "3". The trick in PASWA is that, in most cases, the application won't actually ask for the Infoset content, but for something derived from it (i.e., the actual binary value). Note that I've used integers for ease of illustration; PASWA focuses primarily on the base64 type.
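To make the isCanonical idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names OptimizedInteger, from_chars, infoset_chars and typed_value are hypothetical, and nothing here is PASWA's actual encoding. It stores the binary integer when the characters are in canonical form, keeps the raw characters otherwise, and can always hand back the exact Infoset characters.

```python
from dataclasses import dataclass
from typing import Optional


def is_canonical_integer(text: str) -> bool:
    """Return True if text is the canonical lexical form of an integer
    (no leading zeros, no '+' sign, no surrounding whitespace)."""
    try:
        value = int(text)
    except ValueError:
        return False
    # Round-tripping through int() only reproduces canonical input.
    return str(value) == text


@dataclass
class OptimizedInteger:
    """Hypothetical storage for the content of an xsd:integer element."""
    is_canonical: bool            # the single "optimization triggered" bit
    value: Optional[int] = None   # stored only when is_canonical is True
    chars: Optional[str] = None   # raw characters, kept otherwise

    @classmethod
    def from_chars(cls, text: str) -> "OptimizedInteger":
        if is_canonical_integer(text):
            return cls(is_canonical=True, value=int(text))
        return cls(is_canonical=False, chars=text)

    def infoset_chars(self) -> str:
        """The character content as the Infoset sees it, always exact."""
        return str(self.value) if self.is_canonical else self.chars

    def typed_value(self) -> int:
        """What most applications actually want: the integer itself."""
        return self.value if self.is_canonical else int(self.chars)


first = OptimizedInteger.from_chars("123")     # optimized, stores int 123
second = OptimizedInteger.from_chars("00123")  # not canonical, keeps chars
```

Both objects yield the same typed value, but infoset_chars() still distinguishes them, which is the point: the optimization changes the representation, not the Infoset.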
------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Monday, 5 May 2003 17:03:36 UTC