Re: Draft TAG finding available: xmlIDSemantics-32 from Paul Prescod on 2003-05-20 (www-tag@w3.org from May 2003)

From: Paul Prescod <paul@prescod.net>
Date: Tue, 20 May 2003 08:12:31 -0700
To: www-tag@w3.org
Message-ID: <3ECA45DF.1030804@prescod.net>
"How should the problem of identifying ID semantics in XML languages be 
addressed in the absence of a DTD?"

I find it easier to think up a solution if I rethink the problem 
statement. There are actually a few different problems here. Also, it is 
strange to me that the problem statement explicitly excludes one 
potential solution to at least one of the underlying problems.

Here are what I believe to be the underlying problems:

1. Consistency

Different specifications use different definitions of ID-ness. This is 
easily fixed with a W3C REC that might be entitled "Definition of 
element identifiers in XML." All future specs could point at that 
explicitly and most people would interpret the older ones as pointing at 
it implicitly. It could even explicitly override the obsolete definitions.

Fixing this problem does not really imply any new ideas or innovation. 
It could just say: "Element identifiers are as defined in the XML 1.1 
specification until further notice."

2. Modernity: Schemas and the Infoset

DTDs are losing popularity. The schema specification implies that 
schemas can invoke "ID-ness" but is it explicit? Are the infoset 
annotations caused by a schema declaration of "ID-ness" equivalent to 
the ones caused by a DTD declaration of "ID-ness"? If yes, then every 
specification built on the infoset would pick these up "for free". If 
no, then they wouldn't.

If other ways of implying ID-ness arise, can they merely be described in 
terms of infoset annotations, or do they actually have to go back and 
change the "Definition of element identifiers in XML" specification?

3. Standalone-ness

It is not always appropriate or efficient to fetch a DTD or schema in 
order to figure out what the IDs of elements are. Therefore, it might 
make sense for there to be an "inline" way to declare an element's ID 
(either on the element or through some kind of idattr indirection).

Unlike the other two, this is not a clarification or unification of 
existing specifications, but new engineering solving a different problem.
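
To make the "inline" idea concrete, here is a rough sketch of what a 
DTD-less processor might do. It is only an illustration under stated 
assumptions: no such attribute is defined by any specification today, 
the exact semantics of xml:idattr are guessed (here it names the 
attribute on the same element that carries the ID), and the function 
name inline_ids is made up.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = f"{{{XML_NS}}}id"          # assumed inline ID attribute
XML_IDATTR = f"{{{XML_NS}}}idattr"  # hypothetical indirection attribute


def inline_ids(root):
    """Map each inline-declared ID value to its element, with no DTD or schema."""
    ids = {}
    for elem in root.iter():
        # Direct form: the ID value sits in xml:id itself.
        value = elem.get(XML_ID)
        if value is None:
            # Indirect form (assumed semantics): xml:idattr names the
            # attribute on this same element that holds the ID value.
            attr_name = elem.get(XML_IDATTR)
            if attr_name is not None:
                value = elem.get(attr_name)
        if value is not None:
            # First declaration wins here; a real spec would have to say
            # what duplicates mean.
            ids.setdefault(value, elem)
    return ids


doc = ET.fromstring(
    "<doc>"
    '<sec xml:id="intro"/>'
    '<sec xml:idattr="name" name="body"/>'
    '<sec name="not-an-id"/>'
    "</doc>"
)
ids = inline_ids(doc)
```

Note that the third element's "name" attribute is not treated as an ID, 
because nothing on that element declares it as one.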

4. DTD-less-ness

Some XML profiles want to do away with DTD declarations even if they are 
inline, on the basis that DTD processing can be expensive.

5. Consistency Again

What happens when these different ways of imputing ID-ness disagree? 
For example, the DTD might give ID XYZ to one element, the schema might 
give the same ID to another element, and xml:id might give it to a third.
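
The disagreement case can be sketched as a merge over per-source ID 
assignments. This is just one possible policy (report every conflict 
rather than pick a winner), not something any current specification 
mandates. The function name merge_id_sources, the source labels and the 
element paths are all invented for the example.

```python
def merge_id_sources(sources):
    """Merge ID assignments from several sources.

    sources maps a source label (e.g. "dtd") to a dict of
    ID value -> element label. Returns (merged, conflicts): merged holds
    IDs on which every claiming source agrees; conflicts holds the rest,
    keeping the per-source claims so they can be reported.
    """
    claims = {}
    for source, assignments in sources.items():
        for id_value, element in assignments.items():
            claims.setdefault(id_value, {})[source] = element

    merged, conflicts = {}, {}
    for id_value, by_source in claims.items():
        if len(set(by_source.values())) == 1:
            merged[id_value] = next(iter(by_source.values()))
        else:
            conflicts[id_value] = by_source
    return merged, conflicts


# The XYZ scenario: three sources give the same ID to three elements.
sources = {
    "dtd": {"XYZ": "/doc/a[1]"},
    "schema": {"XYZ": "/doc/b[1]", "ABC": "/doc/c[1]"},
    "xml:id": {"XYZ": "/doc/c[1]"},
}
merged, conflicts = merge_id_sources(sources)
```

Here "ABC" merges cleanly because only one source claims it, while 
"XYZ" ends up in conflicts with all three claims preserved.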

=====

Rephrasing the problem this way helps me think more clearly about it. 
For instance, I can see that xml:id is not "the" solution because it 
does not address the problem that XML Schema _already_ implies it can 
change ID-ness.

I propose the following set of solutions:

  1 & 5) The infoset should have an "identifier" or "identifiers" 
information property on every element. (I'm surprised it 
doesn't...groves did). All specifications should use this property as 
their definition of element identifier. The infoset should also disallow 
conflicting property assignments.

  2) There should be some kind of addendum to XML Schema or a 
third-party specification connecting the XML Schema concept of ID to the 
infoset concept of id. Obviously this is only accessible to applications 
working with the PSVI, as XML's ID is only available to applications 
working with the output of a validating processor.

  3 & 4) There should also be a specification that defines how to 
directly set that infoset property using a new attribute like xml:id or 
xml:idattr.

  Paul Prescod
Received on Tuesday, 20 May 2003 11:12:29 UTC