RE: Options for dealing with IDs


Great observation. Thanks!

I'll enthusiastically agree that xml:id and general linking stem from
closely related roots. In fact, I almost added xml:id alongside xml:href and
xml:src in my SkunkLink document [1], but ultimately decided not to since
xml:id is only 'half a link', and another group had already started a
document [2] (members only).

Namespaces in XML held great promise: the ability to mix and match XML
vocabularies. The HTML Working Group spent (and continues to spend) a great
deal of time working on "Modularization" techniques to make this happen with
DTDs, and now XML Schema. This is starting to happen.

You can put multi-namespace documents into two buckets:

1) Pre-meditated. Someone has to work out in advance a DTD or schema for the
combined language, for example using Modularization. When there's a
DTD/schema involved, IDs can be declared, so the status quo is painful, but
probably tolerable. Opinions vary.

Designing a multi-namespace DTD/schema can be quite challenging. Often, it's
desirable to get something working sooner, which leads to...

2) Ad-hoc. Increasingly, XML vocabularies are being combined in ways that
weren't foreseen by a DTD/schema author. A few examples are
rdf:parseType="Literal" and XForms instance data. I would probably include
in this category SOAP 1.2 and other areas where mandatory DTD/schema
processing isn't acceptable for some reason.

The status quo causes major problems for these kinds of documents. One poor
solution is to 'cheat' by treating things that are technically not IDs as
IDs. I fear that under the status quo, this will get progressively worse. The
official solution (used, for example, by the XForms test suite [3]) is to
include an internal DTD subset with fragile, hard-coded prefix values. That
is really difficult to maintain, and starting over is often easier than using
cut-n-paste for composability. We've ended up declaring a huge internal
subset in every single document, whether it needs it or not, and tightly
restricting the allowed prefixes.
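A minimal sketch of why the hard-coded prefixes are fragile, using Python's standard expat binding. The helper name `find_dtd_ids` and the sample documents are made up for illustration; they are not taken from the XForms test suite. DTD attribute-list declarations match the literal qualified name, so an ID declared for `xf:model` is no longer recognised once an author binds the same namespace to a different prefix.

```python
import xml.parsers.expat

def find_dtd_ids(document: str) -> dict:
    """Return {id_value: element_name} for attributes declared as ID in the
    internal DTD subset. Names are compared exactly as written, prefix
    included, because DTDs are not namespace-aware."""
    declared = set()  # (qualified element name, attribute name) pairs typed ID
    ids = {}

    def on_attlist(elname, attname, att_type, default, required):
        if att_type == "ID":
            declared.add((elname, attname))

    def on_start(name, attrs):
        for attname, value in attrs.items():
            if (name, attname) in declared:
                ids[value] = name

    # No namespace_separator, so expat reports raw names such as "xf:model".
    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return ids

SUBSET = '<!DOCTYPE html [ <!ATTLIST xf:model id ID #IMPLIED> ]>'
# Same namespace, bound to the prefix the subset expects...
WITH_XF = SUBSET + '<html xmlns:xf="http://www.w3.org/2002/xforms"><xf:model id="m1"/></html>'
# ...and to a different prefix the subset never heard of.
WITH_F = SUBSET + '<html xmlns:f="http://www.w3.org/2002/xforms"><f:model id="m1"/></html>'

print(find_dtd_ids(WITH_XF))  # {'m1': 'xf:model'}
print(find_dtd_ids(WITH_F))   # {}
```

Nothing about the second document is wrong as namespaced XML; the ID simply disappears because the subset hard-codes `xf:`.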

If we're struggling with this, I don't feel good about how the less-geeky
web designers out there will fare.

What folks really need is a way to easily express author intent of links and
IDs, without getting tangled in validation.



[3] To be published RSN

-----Original Message-----
From: Williams, Stuart []
Sent: Monday, January 13, 2003 9:37 AM
To: 'Tim Bray'; Chris Lilley
Cc:; Norman Walsh
Subject: RE: Options for dealing with IDs

Tim, Chris, Norm,

At the risk of entangling this topic with that of linking in XML, I'm
wondering whether the answers to the questions raised in this thread are
entangled with the solution to linking in XML.

This thread asks the question "How do I recognise IDs in XML documents?"
which looks a bit like the linking question of "How do I recognise links in
XML documents?". There are common concerns about not having to require
validation; about having an approach that can be adopted by existing XML
languages; and about having an approach that 'works' in mixed-language
cases. This seems to be about the eXtensible in XML: elementist/attributist
viewpoints, and possibly the choice of a namespace that is implicitly
declared (xml:).

(Optimistically) I'd hope that *IF* we were able to reach a principled
resolution to the linking question, then by and large the same principles
would apply to the ID question and the right answer would then be obvious. I
don't think that it works the other way round, because links are more
complex structures than IDs (actuation, transition effects, traversals,
roles, arcroles...).

The xml:id design seems very coherent with XLink's xlink:href approach:
recognition based on relatively direct matching of attribute names.
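As a rough illustration of recognition by attribute name (a sketch, not proposal text; `index_xml_ids` is a made-up helper and the example.org namespaces are placeholders), a namespace-aware parser such as Python's `xml.etree.ElementTree` resolves `xml:id` with no DTD, schema or declaration, because the `xml:` prefix is permanently bound to `http://www.w3.org/XML/1998/namespace`. The same IDs come out whatever prefixes the mixed vocabularies happen to use.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound by definition, so this expanded name never changes.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def index_xml_ids(document: str) -> dict:
    """Map each xml:id value to the expanded name of the element carrying it."""
    root = ET.fromstring(document)
    return {el.get(XML_ID): el.tag for el in root.iter() if el.get(XML_ID) is not None}

doc_b = ('<doc xmlns="http://example.org/a" xmlns:b="http://example.org/b">'
         '<section xml:id="s1"><b:figure xml:id="f1"/></section></doc>')
# Same document, but the second vocabulary is bound to a different prefix.
doc_q = doc_b.replace("xmlns:b=", "xmlns:q=").replace("<b:figure", "<q:figure")

print(index_xml_ids(doc_b) == index_xml_ids(doc_q))  # True
print(index_xml_ids(doc_b))
# {'s1': '{http://example.org/a}section', 'f1': '{http://example.org/b}figure'}
```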

The element-scoped xml:idAttr option introduces a level of attribute
remapping which in some sense is evocative of HLink. As has been shown
elsewhere on this thread [1], xml:idAttr becomes a bit cumbersome in
mixed-language contexts where there is a lot of switching of ID attribute
names, and equally so in legacy-language cases where the name of the ID
attribute varies widely with the host element name. This might motivate a
means of stand-off markup to remove clutter from the instance document; for
some, that would no doubt be going too far.
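To make the remapping and its switching cost concrete, here is a sketch under an assumed reading of the element-scoped option: that `xml:idAttr` on an element names which (unprefixed) attribute serves as the ID for that element and its descendants, until a nested element overrides it. Those scoping rules and the helper `index_remapped_ids` are hypothetical; the option as discussed on the thread may differ in detail. Note that every change of vocabulary needs a fresh `xml:idAttr`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical: xml:idAttr is an option under discussion, not a standard attribute.
XML_IDATTR = "{http://www.w3.org/XML/1998/namespace}idAttr"

def index_remapped_ids(document: str) -> Dict[str, str]:
    """Collect IDs under an ASSUMED scoping rule: the nearest ancestor-or-self
    xml:idAttr names the unprefixed attribute that holds the ID."""
    ids: Dict[str, str] = {}

    def walk(el: ET.Element, id_attr: Optional[str]) -> None:
        id_attr = el.get(XML_IDATTR, id_attr)  # this element may remap the ID name
        value = el.get(id_attr) if id_attr else None
        if value is not None:
            ids[value] = el.tag
        for child in el:
            walk(child, id_attr)

    walk(ET.fromstring(document), None)
    return ids

doc = ('<doc xml:idAttr="name">'
       '<part name="p1"/>'
       '<legacy xml:idAttr="key" key="k1"><item key="k2" name="n1"/></legacy>'
       '<part name="p2"/>'
       '</doc>')
print(index_remapped_ids(doc))
# {'p1': 'part', 'k1': 'legacy', 'k2': 'item', 'p2': 'part'}
```

Inside `legacy`, `name="n1"` is not an ID because the scope has switched to `key`; a stand-off mapping would move these declarations out of the instance.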

So... standing back a little from the particulars of IDs, hrefs and
IDREFs, I'm wondering if there are some principles that we could identify
related to extensibility and mixed languages in XML. Chris has described
this as a problem of conflating decoration with validation [2]. Maybe that
is the problem to attack, and solving it would also settle IDs and XML
linking; then we might have made a little headway on mixedNamespaceMeaning-13.

Just thinking aloud...

Received on Monday, 13 January 2003 15:20:50 UTC