Options for dealing with IDs from Chris Lilley on 2003-01-07 (www-tag@w3.org from January 2003)

From: Chris Lilley <chris@w3.org>
Date: Tue, 7 Jan 2003 19:27:04 +0100
To: www-tag@w3.org
Message-ID: <9162008812.20030107192704@w3.org>
Hello www-tag,

  As requested by Dave Orchard, a listing of the options for dealing
  with IDs.

  1) Require DTD validation of all instances.
  A fully validating XML processor will, almost as a side effect,
  result in all attributes of type ID being so noted in the Infoset.

  Advantages:
  - existing mechanism (DTDs)
  
  Disadvantages:
  - existing mechanism is poor,
  - not namespace aware,
  - can't declare a content model of 'any' that really means 'any',
  - can't use with mixed namespace documents easily
  - hinders composability
  - needlessly conflates validation with decoration
  - leaves well formed documents in a backwater
  - retrogressive step

  2) Steal all attributes of name id in the per-element partition
  Declare retrospectively that all attributes whose name is 'id' are of
  type ID because this is common practice anyway.

  Advantages
  - much existing content becomes conformant without change
  - easy to explain

  Disadvantages
  - no help for content that uses a different name for its IDs
  - some existing content becomes changed retrospectively
  - may clash with declarations in DTDs or Schemas
  - user outrage: XML can only control the syntax and the
    xml namespace, not other namespaces
  - different behavior in validating and non-validating parsers
  - requires a change to CSS
  - requires a change to XPath 1.0
  - requires a change to DOM levels 1, 2 and 3
  - requires a change to XSLT
  - requires a change to (insert your spec here)
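
  As an illustration of option 2, here is a minimal sketch of a
  processor that treats every un-prefixed attribute named id as an ID.
  The function name and the sample document are made up for this
  example; Python's ElementTree is used only as a convenient
  non-validating parser.

```python
import xml.etree.ElementTree as ET


def index_by_id_name(root):
    """Map each un-prefixed 'id' attribute value to its element."""
    index = {}
    for elem in root.iter():  # document order, root included
        value = elem.get("id")  # un-prefixed attributes are in no namespace
        if value is not None:
            index.setdefault(value, elem)  # keep the first occurrence
    return index


# Hypothetical content: 'id' on <item> is a product number, and this
# vocabulary's real identifier attribute is called 'ref'.
doc = ET.fromstring(
    '<order id="o1">'
    '<item id="42" sku="a7"/>'
    '<note ref="n1">see item</note>'
    '</order>'
)
index = index_by_id_name(doc)
print(sorted(index))
```

  The product number "42" is swept up as an ID (content changed
  retrospectively), while "n1" is missed because its attribute has a
  different name.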

  3) Steal undeclared attributes of name id
  In well formed content that does not have a DTD, or that has a
  partial DTD used for decoration (declaring ID, declaring attribute
  defaults, etc) if an attribute is called id and has not been
  declared in the DTD, it is of type ID.

  Advantages
  - much existing content becomes conformant without change
  - fairly easy to explain

  Disadvantages
  - no help for content that uses a different name for its IDs
  - some existing content becomes changed retrospectively
  - may clash with declarations in Schemas
  - user annoyance: XML can only control the syntax and the
    xml namespace, not other namespaces
  - different behavior in validating and non-validating parsers
  - requires a change to CSS
  - requires a change to XPath 1.0
  - requires a change to DOM levels 1, 2 and 3
  - requires a change to XSLT
  - requires a change to (insert your spec here)

  4) Add a predeclared id attribute to the xml namespace
  In the same way that xml:base added a predeclared attribute to the
  existing xml:lang and xml:space attributes, add another one called
  xml:id. It is of type ID. It cannot be declared (or redeclared)
  and thus its type cannot be changed. It can be used wherever you
  want a reliable, interoperable identifier.

  Advantages
  - easy to explain
  - easy to use
  - easy to change content to use the new syntax
  - no clash with DTDs or Schemas
  - existing content not inadvertently affected

  Disadvantages
  - requires a (small) change to XML spec and XML parsers
  - no help for (all existing) content that uses a different
    name for its IDs
  - requires revision in any content specs that want to make use of it
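
  A minimal sketch of what option 4 could look like to a consumer,
  assuming the proposed xml:id attribute lives in the existing xml
  namespace. The function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# The xml prefix is bound to this namespace by definition.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = "{%s}id" % XML_NS  # ElementTree key for the proposed xml:id


def index_by_xml_id(root):
    """Map each xml:id value to its element; other attributes are ignored."""
    index = {}
    for elem in root.iter():
        value = elem.get(XML_ID)
        if value is not None:
            index.setdefault(value, elem)
    return index


doc = ET.fromstring(
    '<doc><sec xml:id="intro" id="7"/><sec id="intro2"/></doc>'
)
index = index_by_xml_id(doc)
print(sorted(index))
```

  Only the element carrying xml:id is indexed. The plain id attributes
  are left alone, which is the "existing content not inadvertently
  affected" advantage, and also the "no help for existing content"
  disadvantage.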

  5) Add an inline, per-instance ID declaration method
  In the same way that xml:base added a predeclared attribute to the
  existing xml:lang and xml:space attributes, add another one called
  xml:idAttr. It takes as value the local name of an attribute. All
  attributes of that name in the per-element partition become of type
  ID. It may only be used on the root element of the instance.

  Advantages
  - easy to explain (easier than the DTD syntax, probably)
  - easy to use
  - existing content not inadvertently affected
  - very easy to change content to use the new syntax

  Disadvantages
  - requires a (small) change to XML spec and XML parsers
  - may clash with declarations in DTDs or Schemas
  - different behavior in validating and non-validating parsers
  - limits composability

  6) Add an inline, per subtree ID declaration method
  In the same way that xml:base added a predeclared attribute to the
  existing xml:lang and xml:space attributes, add another one called
  xml:idAttr. It takes as value the local name of an attribute. All
  attributes of that name in the per-element partition, on that
  element and its children become of type ID. It can be used on any
  element. It can also take the value "" in which case, no attributes
  on that element or its children are declared to be of type ID (used
  when composing multiple namespaces).

  Advantages
  - fairly easy to explain (easier than the DTD syntax, probably)
  - easy to use
  - existing content not inadvertently affected
  - very easy to change content to use the new syntax
  - aids composability
  - does not affect well-formed portions of multi-namespace documents

  Disadvantages
  - requires a (small) change to XML spec and XML parsers
  - may clash with declarations in DTDs or Schemas
  - different behavior in validating and non-validating parsers
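
  The scoping rule of option 6 can be sketched as a tree walk that
  carries the current ID attribute name downwards. This is only a
  sketch under assumptions: xml:idAttr is the proposed (not existing)
  attribute, "children" is read as "all descendants", and the function
  name and sample document are invented for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree key for the proposed (hypothetical) xml:idAttr attribute.
XML_ID_ATTR = "{http://www.w3.org/XML/1998/namespace}idAttr"


def index_scoped_ids(root):
    """Map ID values to elements, honouring xml:idAttr per subtree."""
    index = {}

    def walk(elem, id_name):
        declared = elem.get(XML_ID_ATTR)
        if declared is not None:
            # A new declaration replaces the inherited one;
            # the empty string switches ID typing off.
            id_name = declared or None
        if id_name is not None:
            value = elem.get(id_name)  # un-prefixed attribute only
            if value is not None:
                index.setdefault(value, elem)
        for child in elem:
            walk(child, id_name)

    walk(root, None)
    return index


doc = ET.fromstring(
    '<page xml:idAttr="id">'
    '<h1 id="top"/>'
    '<data xml:idAttr="key"><row key="r1" id="99"/></data>'
    '<blob xml:idAttr=""><x id="ignored"/></blob>'
    '</page>'
)
index = index_scoped_ids(doc)
print(sorted(index))
```

  Restricting the declaration to the root element, and dropping the
  empty-string case, gives the behaviour of option 5.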
  
  7) Muddle along
  Do nothing. Accept weasel wording in the DOM spec about knowledge of
  'well known namespaces' and conformance loopholes in the CSS spec
  about possible breakage in namespaces other than HTML and accept
  that we can't really point into XML documents unless we can be sure
  the client uses a validating parser and besides, it works in HTML so
  far and no-one really uses XML on the client anyway.

  Advantages
  - familiar pain
  - no changes to existing specs

  Disadvantages
  - new specs need similar weasel wording
  - interoperability headaches
  - user confusion about when it is an ID and when it is not
  - interoperability depends on the transmission of secret
    knowledge among cognoscenti
  - multi-namespace document integration not made easier
  - cross-namespace XML DOM scripting still hit and miss
  - it's a wart, and a readily fixable one


  8) Require W3C XML Schema validation of all instances.
  A fully validating XML processor will, almost as a side effect,
  result in all attributes of type ID being so noted in the Infoset.

  Advantages:
  - existing mechanism starting to see acceptance
  
  Disadvantages:
  - existing mechanism is not fully deployed
  - too heavyweight for such a simple problem, will not be
    used on mobile platforms or other small devices
  - needlessly conflates validation with decoration
  - leaves well formed documents in a backwater

An optional variation on 5) and 6) is to accept either a local name or
a QName; if it's a QName then resolve it to a namespace URI, local name
pair on the element that has xml:idAttr and then all attributes
with that local name in that namespace are of type ID.
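
The QName variation could be sketched as follows. ElementTree does not
keep prefix bindings after parsing, so the in-scope prefix map is
passed in by hand here; that map, the function name and the urn:example
namespaces are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def id_attr_key(value, in_scope):
    """Turn an xml:idAttr value into an ElementTree attribute key.

    'value' is a local name or a QName; 'in_scope' maps prefixes to
    namespace URIs on the element that carries xml:idAttr.
    """
    if ":" not in value:
        return value  # plain local name: attribute in no namespace
    prefix, local = value.split(":", 1)
    # An unbound prefix raises KeyError; the proposal does not say
    # what kind of error this should be.
    return "{%s}%s" % (in_scope[prefix], local)


doc = ET.fromstring(
    '<a:root xmlns:a="urn:example:a" xmlns:b="urn:example:b">'
    '<a:item b:key="k1" key="other"/>'
    '</a:root>'
)
in_scope = {"a": "urn:example:a", "b": "urn:example:b"}  # assumed input
key = id_attr_key("b:key", in_scope)
ids = {e.get(key): e for e in doc.iter() if e.get(key) is not None}
print(key, sorted(ids))
```

Only the b:key attribute is treated as an ID; the un-prefixed key
attribute on the same element is in no namespace and is not matched.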

In passing, note that the separation of validation from decoration has
an additional benefit: ID uniqueness remains a validation constraint,
so in well formed XML there can be multiple IDs with the same value.
If that happens, the first one in document order is the correct one
(or some better scheme could be devised, but it's not an error).
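
A small sketch of that rule, assuming the ID attribute name is already
known (here simply "id"): lookup uses the first element in document
order, and later duplicates are collected separately so that a
validator, if one is run, could report them. The names are
illustrative only.

```python
import xml.etree.ElementTree as ET


def resolve_ids(root, id_name="id"):
    """Return (first, duplicates) for the given ID attribute name.

    first:      value -> first element in document order with that value
    duplicates: value -> list of later elements repeating the value
    """
    first = {}
    duplicates = {}
    for elem in root.iter():  # iter() walks in document order
        value = elem.get(id_name)
        if value is None:
            continue
        if value in first:
            duplicates.setdefault(value, []).append(elem)
        else:
            first[value] = elem
    return first, duplicates


doc = ET.fromstring(
    '<doc><p id="a">one</p><p id="b"/><q id="a">two</q></doc>'
)
first, duplicates = resolve_ids(doc)
print(first["a"].text, sorted(duplicates))
```

A well-formedness-only processor would use just the first mapping; the
duplicates mapping is what a validating layer would flag.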

If I have omitted a solution, or omitted significant advantages or
disadvantages, I would be glad to hear them.

My personal preference is for option 6) Add an inline, per subtree ID
declaration method. It would require work on what the precedence is
(or what sort of error it is) if the DTD or Schema declares the
designated attribute to be of a type other than ID.

Most (but not all) attributes called id are of type ID. Most (but not
all) attributes of type ID are called id. 100% of single-namespace
documents could be brought into conformance with this proposal by
adding a single attribute to the root element. 99% of them would be
brought into conformance by adding

xml:idAttr="id"

to the root element. Crucially, the 1% that do not are still catered
for, a big advantage over options 2, 3 and 4.

Requiring DTD validation to get IDs is too big a retrogressive step;
it essentially throws away well formedness as a concept and also XML
namespaces, and needlessly conflates validation with decoration.

Requiring W3C XML Schema validation to get IDs is too big a forwards
step; it adds a lot of machinery to get a simple but crucial step
forward and needlessly conflates validation with decoration.

However, I would prefer that W3C XML Schema be revised so that the
behavior of documents that use xml:idAttr *and* use a W3C XML Schema
is consistent with regard to the attribute declared of type ID in the
instance, whether the Schema is used or not (in other words, an
implicit declaration in the instance is the same in the PSVI as if the
attribute had been declared of type ID in the Schema, except that part
of the PSVI that traces which Schema provided the rule - that part
would report that the instance provided the rule).

  
-- 
 Chris                          mailto:chris@w3.org
Received on Tuesday, 7 January 2003 13:27:07 UTC