Attr+: Integrating attributes and elements from Paul Prescod on 1999-08-20 (www-xml-schema-comments@w3.org from July to September 1999)

From: Paul Prescod <paul@prescod.net>
Date: Fri, 20 Aug 1999 11:56:49 -0400
To: w3c-xml-schema-ig@w3.org, www-xml-schema-comments@w3.org, xml-dev <xml-dev@ic.ac.uk>
Message-ID: <37BD7AC1.687A392C@prescod.net>
This proposal goes one step beyond the one in "&-compromise": a step
farther towards making XML behave as a property/value language, like
most "OO" languages.

One of the hardest questions to answer about XML DTD design is:
"element or attribute?" Attributes have certain virtues relating to
lexical typing and convenience, and elements have the primary virtue of
being able to contain sub-structure.

XML schemas are set to reduce some of the advantages attributes have
over elements, but not all. Attributes will continue to be more
convenient because they can come in any order and because they are
easier to type.

The problem is: if you choose to represent something as an attribute,
for user convenience and "intuitiveness," you can never change your mind
and allow it to have substructure later.

I propose a new concept called a "structured attribute" (attr+). An
attr+ is a property of an element. In most respects it is just like an
attribute. In fact, not much in the schema spec would change at all.

An attr+ is different in that it can be syntactically fulfilled in a
document by EITHER an XML 1.0 attribute OR a sub-element of the
same name. All such elements must precede the "real content" of an
element. All attr+es (I will also call them "characteristics") must
have names that are different from any element type name allowable in
the element. A processor knows it has shifted from processing
characteristics to processing "content elements" merely by looking at
the generic identifiers ("tag names", for DOM-heads).

Let me be clear that there is no new syntax in the document instance. A
characteristic "foo" of type "IDREF" on element "bar" can be expressed
as:

<bar foo="abc">...</bar>

or

<bar><foo>abc</foo>...</bar>
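
The equivalence of the two forms can be sketched in Python. This is only
an illustration under assumptions of mine, not part of the proposal: the
set attr_plus_names stands in for whatever the schema would declare, the
function names are hypothetical, <para/> is a made-up content element
standing in for the "..." above, and the rule that a property may not be
given twice is my guess.

```python
import xml.etree.ElementTree as ET

def split_properties(elem, attr_plus_names):
    """Split elem into (properties, content children).

    attr_plus_names is a hypothetical stand-in for the schema: the set of
    names declared as attr+ on this element type.
    """
    # Minimized syntax: ordinary XML 1.0 attributes.
    props = {name: ("attribute", value)
             for name, value in elem.attrib.items()
             if name in attr_plus_names}
    children = list(elem)
    i = 0
    # Element syntax: leading children whose tag is a declared attr+ name.
    while i < len(children) and children[i].tag in attr_plus_names:
        name = children[i].tag
        if name in props:
            # Assumption: the same property may not be given twice.
            raise ValueError(f"attr+ {name!r} given more than once")
        props[name] = ("element", children[i])
        i += 1
    # The first child with any other name starts the "real content".
    return props, children[i:]

def property_text(prop):
    """Text of a property, whichever syntax carried it."""
    form, value = prop
    return value if form == "attribute" else "".join(value.itertext())

minimized = ET.fromstring('<bar foo="abc"><para/></bar>')
expanded = ET.fromstring('<bar><foo>abc</foo><para/></bar>')
for doc in (minimized, expanded):
    props, content = split_properties(doc, {"foo"})
    print(property_text(props["foo"]), [c.tag for c in content])
```

Both documents yield the same property value and the same content
children; only the recorded form ("attribute" or "element") differs.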

Attr+es can have full content models. If a particular instance of a
characteristic on a particular element happens to be all text (no
sub-elements) then it can be expressed as an attribute instead of as an
element. Over time, the word "attribute" would come to be synonymous
with the term "characteristic in the minimized syntax."

The working group can decide whether to keep the concept of "classic
attribute" alive in the schema -- perhaps for backwards compatibility. A
classic attribute would be an attr+ that is constrained to only the
minimized syntax. You could mark this in the schema with an attribute
called "StringOnly" or something. Attr+es would give these attributes
"a future" in that they could become structured later.

The information set contribution of an attr+ would be an information set
item just like an Attribute information item, except that its value would
be an arbitrarily sub-structured node list instead of just references and
characters as it is now. An extra property would specify whether the
attr+ was expressed syntactically as an element or as a "classic
attribute."
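
One way to picture such an information item is the Python sketch below.
The class name, field names and helper functions are all hypothetical;
the proposal only says that the value becomes a node list and that an
extra property records the syntactic form.

```python
from dataclasses import dataclass
from typing import List, Union
import xml.etree.ElementTree as ET

# A node in the value list: a run of characters or an element node.
Node = Union[str, ET.Element]

@dataclass
class AttrPlusItem:
    name: str
    value: List[Node]   # arbitrarily sub-structured node list
    expressed_as: str   # "attribute" or "element" (the extra property)

def item_from_attribute(name, text):
    """Item for the minimized syntax: the value is plain characters."""
    return AttrPlusItem(name, [text], "attribute")

def item_from_element(elem):
    """Item for the element syntax: text and child elements in order."""
    nodes = []
    if elem.text:
        nodes.append(elem.text)
    for child in elem:
        nodes.append(child)
        if child.tail:
            # ElementTree stores text after a child on child.tail.
            nodes.append(child.tail)
    return AttrPlusItem(elem.tag, nodes, "element")
```

An application that does not care about the syntactic form would simply
ignore expressed_as and read name and value.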

Practically speaking, this means that the XPath @foo could return a node
with elements as its children. Thanks to the XPath concept of "value",
this is 100% backwards compatible. Given
<foo>abc<emph>def</emph>ghi</foo>, @foo would return "abcdefghi" in any
context expecting a string and the structured content anywhere else.
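
The string fallback can be checked with a few lines of Python. This is a
sketch, assuming that the "value" of a structured attr+ is the
concatenation of all its descendant text, as it is for an element in
XPath; the function names and the want_string flag are mine.

```python
import xml.etree.ElementTree as ET

def string_value(node):
    """Concatenate all text inside node, in document order."""
    if isinstance(node, str):
        return node
    return "".join(node.itertext())

def attr_plus_value(node, want_string):
    """Return a string in string contexts, the structure otherwise."""
    return string_value(node) if want_string else node

foo = ET.fromstring("<foo>abc<emph>def</emph>ghi</foo>")
print(attr_plus_value(foo, want_string=True))   # abcdefghi
print([child.tag for child in attr_plus_value(foo, want_string=False)])  # ['emph']
```

Code written against string-only attributes keeps seeing a plain string,
which is the sense in which the change is backwards compatible.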

It is debatable whether attr+es would show up in the information set as
element content (or in the child:: axis of XPath). Having them NOT show
up is probably more convenient, since there is no reason for them to be
in two axes. On the other hand, this means that schema-using processors
would see a very different information set than XML 1.0 processors.
That may or may not be a "big deal." It depends on how much the
information set is otherwise modified by XMLSchema (archetypes,
datatypes etc.).

There are various ways of making the information set backwards
compatible and yet making it easy to filter "true content" from
"attribute content."

 Paul Prescod
Received on Friday, 20 August 1999 13:38:14 UTC