- From: Paul Prescod <paul@prescod.net>
- Date: Fri, 20 Aug 1999 11:56:49 -0400
- To: w3c-xml-schema-ig@w3.org, www-xml-schema-comments@w3.org, xml-dev <xml-dev@ic.ac.uk>
This proposal goes one step beyond the one in "&-compromise". It goes a step further towards making XML behave as a property/value language like most "OO" languages.

One of the hardest questions to answer about XML DTD design is: "element or attribute?" Attributes have certain virtues relating to lexical typing and convenience, and elements have the primary virtue of being able to contain sub-structure. XML schemas are set to reduce some of the benefits of attributes but not all. Attributes will continue to be more convenient because they can come in any order and because they are easier to type. The problem is: if you choose to represent something as an attribute, for user convenience and "intuitiveness," you can never change your mind and allow it to have substructure later.

I propose a new concept called a "structured attribute" (attr+). An attr+ is a property of an element. In most respects it is just like an attribute. In fact, not much in the schema spec would change at all. An attr+ is different in that it can be syntactically fulfilled in a document by EITHER an XML 1.0 attribute OR a sub-element of the same name. All such elements must precede the "real content" of an element. All attr+es ("characteristics") must have names that are different from any element type name allowable in that element's content. A processor knows it has shifted from processing characteristics to processing "content elements" merely by looking at the generic identifiers ("tag names", for DOM-heads).

Let me be clear that there is no new syntax in the document instance. A characteristic "foo" of type "IDREF" on element "bar" can be expressed as:

    <bar foo="abc">...</bar>

or

    <bar><foo>abc</foo>...</bar>

Attr+es can have full content models. If a particular instance of a characteristic on a particular element happens to be all text (no sub-elements) then it can be expressed as an attribute instead of as an element.
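To make the processing rule concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It is only an illustration, not part of the proposal: the function names split_attr_plus and string_value are hypothetical, the set of declared attr+ names stands in for what a schema would supply, and treating "same attr+ given in both syntaxes" as an error is an assumption the proposal does not address.

```python
import xml.etree.ElementTree as ET

def split_attr_plus(element, attr_plus_names):
    """Separate the attr+ properties of an element from its real content.

    attr_plus_names is the set of names declared as attr+ for this element
    type (assumed to come from a schema). Returns (properties, content):
    properties maps a name to ("attribute", str) or ("element", Element),
    content is the list of child elements that follow the attr+ elements.
    """
    # Minimized syntax: plain XML 1.0 attributes.
    properties = {name: ("attribute", value)
                  for name, value in element.attrib.items()
                  if name in attr_plus_names}
    content = []
    for child in element:
        if child.tag in attr_plus_names:
            # The tag name alone tells us this is an attr+, so it must
            # come before any content element.
            if content:
                raise ValueError(f"attr+ <{child.tag}> appears after content")
            # Assumption: giving the same attr+ twice is an error.
            if child.tag in properties:
                raise ValueError(f"attr+ {child.tag!r} given twice")
            properties[child.tag] = ("element", child)
        else:
            content.append(child)
    return properties, content

def string_value(prop):
    """Text of an attr+ regardless of the syntax it was written in."""
    form, value = prop
    return value if form == "attribute" else "".join(value.itertext())

short_form = ET.fromstring('<bar foo="abc"><para/></bar>')
long_form = ET.fromstring('<bar><foo>abc</foo><para/></bar>')
for doc in (short_form, long_form):
    props, content = split_attr_plus(doc, {"foo"})
    print(props["foo"][0], string_value(props["foo"]), [c.tag for c in content])
```

Both documents yield the same property value ("abc") and the same content (a single para element); only the recorded syntax differs.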
Over time, the word "attribute" would come to be synonymous with the term "characteristic in the minimized syntax." The working group can decide whether to keep the concept of "classic attribute" alive in the schema -- perhaps for backwards compatibility. A classic attribute would be an attr+ that is constrained to only the minimized syntax. You could have an attribute called "StringOnly" or something. Attr+es would give these attributes "a future" in that they could become structured later.

The information set contribution of an attr+ would be an information item just like an Attribute information item, except that its value would be an arbitrarily sub-structured node list instead of just references and characters as it is now. An extra property would specify whether the attr+ was expressed syntactically as an element or as a "classic attribute."

Practically speaking, this means that the XPath @foo could return a node with elements as its children. Thanks to the XPath concept of "value", this is 100% backwards compatible. Given <foo>abc<emph>def</emph>ghi</foo>, @foo would return "abcdefghi" in any context expecting a string and the structured content anywhere else.

It is debatable whether attr+es would show up in the information set as element content (or in the child:: axis of XPath). Having them NOT show up is probably more convenient, because why would they be in two axes? On the other hand, this means that schema-using processors would see a very different information set than XML 1.0 processors. That may or may not be a "big deal." It depends on how much the information set is otherwise modified by XMLSchema (archetypes, datatypes etc.). There are various ways of making the information set backwards compatible and yet making it easy to filter "true content" from "attribute content."

Paul Prescod
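The information item described above can be sketched as a small Python class. This is an assumption-laden illustration rather than anything the proposal defines: the class name AttrPlusItem and its fields are hypothetical, and string_value only imitates the XPath notion of a node's string value by concatenating descendant text.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class AttrPlusItem:
    """Hypothetical information item for one attr+."""
    name: str
    expressed_as: str   # the "extra property": "attribute" or "element"
    value: object       # str for attribute syntax, ET.Element for element syntax

    def string_value(self) -> str:
        # What a string context would see: all text, markup dropped.
        if isinstance(self.value, str):
            return self.value
        return "".join(self.value.itertext())

    def children(self) -> list:
        # What a node context would see: the sub-structure, if any.
        return [] if isinstance(self.value, str) else list(self.value)

bar = ET.fromstring('<bar><foo>abc<emph>def</emph>ghi</foo></bar>')
item = AttrPlusItem("foo", "element", bar.find("foo"))
print(item.string_value())                  # abcdefghi
print([c.tag for c in item.children()])     # ['emph']

classic = AttrPlusItem("foo", "attribute", "abcdefghi")
print(classic.string_value() == item.string_value())   # True
```

Code that only ever asks for the string value cannot tell the two forms apart, which is the sense in which the change is backwards compatible.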
Received on Friday, 20 August 1999 13:38:14 UTC