Re: Options for dealing with IDs from Chris Lilley on 2003-01-08 (www-tag@w3.org from January 2003)

From: Chris Lilley <chris@w3.org>
Date: Wed, 8 Jan 2003 16:52:54 +0100
To: Robin Berjon <robin.berjon@expway.fr>
CC: www-tag@w3.org
Message-ID: <48139158953.20030108165254@w3.org>
On Wednesday, January 8, 2003, 11:06:08 AM, Robin wrote:

RB> Chris Lilley wrote:
>>   4) Add a predeclared id attribute to the xml namespace

RB> This is by far my favourite option, it's simple and efficient. I've been using 
RB> something similar (an id:id attribute) to ease processing of multi-namespace 
RB> documents and have been happy with it.

It's simple and efficient and suits those who are happy with it, but
requires those who currently use a different name or a different
namespace or the per-element partition of unqualified names to change
if they want reliable processing.

>>   Advantages
>>   - no clash with DTDs or Schemas

RB> What happens when a DTD declares attribute "name" to be of type ID and it occurs 
RB> on the same element as xml:id with a different id, in a validated document?

True. I was thinking that the xml:id could not clash with a
redefinition of itself.

RB>  I'll
RB> grant you that's stupid behaviour,

Yes; it's also stupid behaviour that people can do right now in their
own DTDs:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE toto [
<!ELEMENT toto  EMPTY>
<!ATTLIST toto foo ID #IMPLIED>
<!ATTLIST toto bar ID #IMPLIED>
]>
<toto/>


Which would be trivially caught by a validator of course. Note that
the idAttr option, being an attribute, can only occur once so does not
have that problem.

However, put one declaration in one DTD module and another in a set of
common attributes included in another DTD module and the situation
could easily occur.

Given modularisation and driver DTDs, one reason people gravitate
towards always calling their ID attributes id may be that XML deals with
multiple declarations of the same attribute - it picks the first - so
when writing a driver DTD such inadvertent clashes are silently
ignored.
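
A minimal sketch of that first-declaration-wins rule, using Python's
expat-based ElementTree parser (non-validating, but it does read the
internal subset). Since an ID attribute cannot carry a default value,
the sketch uses a hypothetical CDATA attribute "kind" with two different
defaults so that the winning declaration becomes observable:

```python
import xml.etree.ElementTree as ET

# Two declarations of the same attribute on the same element type.
# "kind" and its default values are made up for illustration only.
DOC = """<?xml version="1.0"?>
<!DOCTYPE toto [
<!ELEMENT toto EMPTY>
<!ATTLIST toto kind CDATA "first">
<!ATTLIST toto kind CDATA "second">
]>
<toto/>"""

root = ET.fromstring(DOC)

# The parser supplies the default from whichever declaration is binding.
winning_default = root.get("kind")
print(winning_default)
```

No error or warning is raised for the second declaration, which is the
silent behaviour described above.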

RB> but one that needs to have a defined behaviour (this applies to
RB> other options as well).

Agreed. In the case of well formed documents where the same ID value
occurs more than once, I have heard several people suggest that the
first in document order be chosen. Given that attributes are
unordered, it's not clear what should happen in a well formed document
if two attributes are both declared to be of type ID.
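
The "first in document order" suggestion could be sketched roughly as
follows. This assumes, purely for illustration, that the ID attribute is
the xml:id of option 4; the function name build_id_index and the sample
document are hypothetical:

```python
import xml.etree.ElementTree as ET

# Expanded name ElementTree uses for an attribute written as xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def build_id_index(root, id_attr=XML_ID):
    """Map each ID value to the first element, in document order, carrying it."""
    index = {}
    for elem in root.iter():              # iter() walks in document order
        value = elem.get(id_attr)
        if value is not None:
            # setdefault keeps the first element seen; later duplicates are ignored
            index.setdefault(value, elem)
    return index

doc = ET.fromstring(
    '<a><b xml:id="x" n="1"/><c xml:id="x" n="2"/><d xml:id="y"/></a>'
)
index = build_id_index(doc)
print(index["x"].tag)
```

This settles duplicate values across elements only. It says nothing
about two ID-typed attributes on one element, which is the open question
here.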

All the constraints on ID are validity constraints, not
well-formedness constraints.

http://www.w3.org/TR/REC-xml#id
http://www.w3.org/TR/REC-xml#one-id-per-el
http://www.w3.org/TR/REC-xml#id-default


RB> I'd be in favour of having the declared attribute take precedence
RB> there as it'll be more backwards compatible.

Take precedence on that element, or take precedence on all elements?

>>   Disadvantages
>>   - requires a (small) change to XML spec and XML parsers

RB> Not really, xml:base didn't affect either directly.

XML Base did not affect the XML spec directly. It did affect the
anonymous (xml spec plus xml namespaces) spec. And it does affect xml
parsers that choose to implement it (at the cost of creating another
different conformance level).

Given the crucial and central nature of URIs and URI references in W3C
specifications, I would argue that xml:base should be added as a
part of the XML spec at the next rev, and be mandatory not optional.


>>   5) Add an inline, per-instance ID declaration method
>>   6) Add an inline, per subtree ID declaration method

RB> I could live with these two but I think they open cans of worms
RB> here and there.

All of the options open cans of worms someplace (including the "live
with this mess" option). It's a case of choosing your can.

RB> For option 5, what happens when I XInclude a document that has
RB> xml:idAttr on its root element?

This is why I suggested option 6 and is what I meant by
'composability'. Other examples that demonstrate this: "what happens
when I include a module (say, SVG Basic structure module) in my DTD
driver such that what was previously a root element (svg) now becomes
a child of another element". (Assuming that the svg element had by
that point adopted xml:idAttr from option 5. Also, note that the
possibility that the svg element would likely not be the root element
in a mixed namespace grammar is already dealt with in the spec.
But this is another type of composing and demonstrates why option 5 has
some problems.)

RB> Do I get to rewrite the new subtree so that all occurrences of the
RB> id attribute match the xml:idAttr of the including document?

That would be one option, and would be extra work. It is work that
option 6 would not require you to do (even if the attribute declared
as ID has the same name as the existing in scope declaration). I didn't
say it explicitly, but redundant declarations would be allowed and
would not cause problems.

RB> Option 6 doesn't suffer from that problem but it does have the
RB> drawback that you need to keep a stack of id attribute names.

Yes (and yes, it is a simple linear stack)
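
A rough sketch of that stack, assuming the per-subtree declaration is
spelled xml:idAttr, that it applies to the element carrying it and to its
descendants, and that its value is a plain unqualified attribute name
(no QName resolution). The function name collect_ids and the sample
document are hypothetical:

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
# xml:idAttr is the attribute proposed in this thread, not part of any spec.
ID_ATTR_DECL = "{%s}idAttr" % XML_NS

def collect_ids(root):
    """Return {id value: element}, honouring the innermost xml:idAttr in scope."""
    ids = {}
    stack = []  # ID attribute names in scope, innermost declaration last

    def visit(elem):
        declared = elem.get(ID_ATTR_DECL)
        if declared is not None:
            stack.append(declared)        # enter a new scope
        if stack:
            value = elem.get(stack[-1])   # only the innermost name counts
            if value is not None:
                ids.setdefault(value, elem)
        for child in elem:
            visit(child)
        if declared is not None:
            stack.pop()                   # leave the scope

    visit(root)
    return ids

doc = ET.fromstring(
    '<doc xml:idAttr="id" id="top">'
    '<svg xml:idAttr="name" name="logo" id="ignored"><g name="layer"/></svg>'
    '<p id="para"/>'
    '</doc>'
)
ids = collect_ids(doc)
print(sorted(ids))
```

Inside the svg subtree only "name" is treated as an ID, and the outer
declaration is restored once that subtree ends. A redundant inner
declaration of the same name would push and pop harmlessly.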

RB> IDs are meant to be simple,

Well, currently you need to carry around a table of element names and
attribute names on that element, so this is indeed simpler.

RB> I'd rather one shouldn't need to carry
RB> an IDSupport class along with the usual NamespaceSupport.

RB> I also generally dislike the fact that they're not really namespace aware.

Huh?

RB> If I'm planning to have a complex document with many potential
RB> (and potentially evolving) namespaces, I'll want to pick a safe
RB> value for xml:idAttr to be certain that it doesn't mean anything
RB> in any of the possible vocabularies.

OK so that is an argument for the xml:id option. Or for a best
practice (should) of xml:idAttr="id" - use that unless there is a
compelling reason not to.

RB> Using a QName would be an
RB> option, but I'd rather keep away from QNames-in-content.


QNames in attribute content is toothpaste that is already out of the
tube and all over the sink. And it seems to be less of a problem in
practice than might have been thought.

I did address QNames as attribute values and suggested a resolution
mechanism that seems to make sense.

RB> A PI of course could work,

OK for completeness I should add that as another option.

RB> but I suspect I'll be the only one to think
RB> that ;) (and it would have issues unless it's constrained to
RB> appear before the root element).

I assume it would have a global scope, rather than being a
stream-based directive that applies "from that point on", so the
constraint on being above the root element would be to avoid multiple
passes in parsing or backtracking and fixup.


-- 
 Chris                            mailto:chris@w3.org
Received on Wednesday, 8 January 2003 10:52:57 UTC