- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 24 Oct 2003 16:32:56 -0400
- To: "Champion, Mike" <Mike.Champion@SoftwareAG-USA.com>
- Cc: www-tag@w3.org
Reading this thread, I'd be curious to hear from some of the original authors of XML: to what degree did you believe you were establishing only a syntax, and to what degree a model plus a syntax? (I don't necessarily mean the Infoset in particular; I mean a model implicit in the XML Rec.)

Here's what I mean. If you're only establishing syntax, then you are (I think) merely indicating that the following forms are legal:

  <e></e>
  <e/>
  <e a="1"></e>
  <e a='1'></e>
  <e a="1"/>
  <e a='1'/>

while specifying that the following is not:

  <a></b>

Independent of the Infoset or XPath data models, I think the XML Recommendation supports the inference that the legal forms above fall into the following distinct groups:

Group 1 (Data model: an empty element named "e"):

  <e></e>
  <e/>

Group 2 (Data model: an empty element "e" with attribute "a" of value "1"):

  <e a="1"></e>
  <e a='1'></e>
  <e a="1"/>
  <e a='1'/>

Thus, group 1 forms one equivalence class and group 2 forms another, and the two classes are distinct.

What licenses this interpretation? Well, I claim that it's because there is a data model implied by XML. That model says: there is a document at the root, with a single root element associated with that document. Each element has zero or more attributes, with the choice of single or double quotes on attributes being uninteresting for most purposes, and so on. So, I claim that at least some sort of model is implied in the XML Recommendation itself.

A question is: are we better off on balance having set down that model clearly, and layered it separately from the syntax? I think so. As others have observed, one of the reasons the model is there is that you want to talk about what is significant for processing: for many purposes, it's the underlying elements and attributes that are significant in the example above, not the choice of single or double quotes. To make this easy to talk about, we can set down just the information that's significant in a model such as the Infoset.
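
The grouping above can be checked mechanically. What follows is only a rough sketch, assuming Python's standard xml.etree.ElementTree parser as a stand-in for "a model implied by XML"; it is not the Infoset or the XPath data model, and the helper names model and is_well_formed are invented for this illustration. Each form is reduced to the information the parser reports (element name, attributes, text, children), and the reduced values are compared.

```python
import xml.etree.ElementTree as ET


def model(xml_text):
    """Reduce a document to (name, attributes, text, children).

    This is a toy data model for illustration only; it ignores
    comments, processing instructions, and namespaces.
    """
    def reduce(elem):
        return (
            elem.tag,
            tuple(sorted(elem.attrib.items())),
            elem.text or "",  # treat "no text" and "empty text" alike
            tuple(reduce(child) for child in elem),
        )
    return reduce(ET.fromstring(xml_text))


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


group1 = ["<e></e>", "<e/>"]
group2 = ['<e a="1"></e>', "<e a='1'></e>", '<e a="1"/>', "<e a='1'/>"]

# Each group collapses to a single value once syntax-only choices
# (quote style, empty-element tag) are discarded.
models1 = {model(doc) for doc in group1}
models2 = {model(doc) for doc in group2}
```

Under these assumptions, models1 and models2 each contain exactly one value, the two values differ, and is_well_formed("<a></b>") is False, which matches the two equivalence classes and the one illegal form described above.
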
I think this is unambiguously a good step to have taken, modulo some discomfort regarding the lack of integration between the XML, Namespaces, and Infoset Rec documents. Indeed, I would (oh heresy) have preferred to see the data model terminology introduced first, which would then allow you to say in the XML Rec:

  "an attribute information item is serialized in the form a="1" or a='1'"

I also think it's a real mess that we are now up to three overlapping data models, plus the one implied by XML itself. We've got the XPath 1.0 data model, used by XSL 1.0 and the standard c14n's; the Infoset, used by Schema, SOAP, etc.; and the new XQuery model, which captures the value-space to lexical-space associations needed to support operations such as bounds checks on types like integer that were introduced with Schema. In principle, it would be nice to have one scalable model, rather than three that overlap to such a significant degree, IMO.

Still, I think the need for model(s) is compelling. As others have argued, part of the compelling value of XML is that there is a single serialization syntax that covers most use cases. Indeed, XML 1.0 syntax should be used wherever practical. Just because the data models are well specified does not mean we need to standardize or encourage use of multiple serial forms. Such alternate representations should be used only when the use cases are compelling, or internally to particular optimized implementations. As I've said in the debates on binary XML, I think the bar should be set very, very high in justifying the standardization of any such alternate serial forms.

On the other hand, I think it's clear that in memory, for the on-disk structures of an XML database, or perhaps even on very slow links or in small memories, there are good reasons to optimize the representation of XML. In defining such representations, it's useful to know that even the XML Rec suggests that the differences between a='1' and a="1" may not be significant.
The Data Model Recommendation(s) capture that.

So, I think that some form of explicit model is important, indeed necessary. We must then avoid the temptation to use the existence of a data model as an excuse for a proliferation of non-standard or even standardized serial syntaxes. I think we can avoid those temptations while still benefiting from clear documentation of the models that I believe have been implicit in XML from day 1 anyway.

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Friday, 24 October 2003 16:35:23 UTC