Re: Options for dealing with IDs from Chris Lilley on 2003-01-10 (www-tag@w3.org from January 2003)

From: Chris Lilley <chris@w3.org>
Date: Fri, 10 Jan 2003 16:47:18 +0100
To: Tim Bray <tbray@textuality.com>
CC: www-tag@w3.org
Message-ID: <88311622234.20030110164718@w3.org>
On Thursday, January 9, 2003, 6:57:35 PM, Tim wrote:

TB> Chris Lilley wrote:
>> Hello www-tag,

TB> I must say Chris has been eating his wheaties recently, providing much 
TB> of the input to TAG thinking.

>>   As requested by Dave Orchard, a listing of the options for dealing
>>   with IDs.
>> 
>>   1) Require DTD validation of all instances.

TB> Not a serious option IMHO.

Agree absolutely; I listed it for completeness, and some would argue
that this is the right approach.

>>   2) Steal all attributes of name id in the per-element partition
>>   Declare retrospectively that all attributes whose name is 'id' are of
>>   type ID because this is common practice anyway.

TB> I think if we did this there would be howls of protest which would die 
TB> away within a few weeks and the world would be a better place.

In this better world you would still need to document what to do if a
DTD then declares such an attribute to be IDREFS or CDATA or
something.
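
To make option 2 concrete, here is a minimal sketch in Python. It is an
illustration only, not part of any spec or of the original proposal; the
function name index_by_id and the sample document are assumptions. It
treats every unqualified attribute named 'id' as if it were of type ID
and rejects duplicate values.

```python
import xml.etree.ElementTree as ET

def index_by_id(root):
    """Map each 'id' attribute value to its element (hypothetical helper)."""
    index = {}
    for elem in root.iter():
        value = elem.get("id")  # unqualified attribute literally named 'id'
        if value is None:
            continue
        if value in index:
            # ID values must be unique within a document
            raise ValueError(f"duplicate id: {value!r}")
        index[value] = elem
    return index

# Sample document with no DTD at all (an assumption for illustration)
doc = ET.fromstring(
    '<doc><sec id="intro"/><sec id="body"><p id="p1"/></sec></doc>'
)
ids = index_by_id(doc)
print(ids["p1"].tag)
```

ElementTree does not expose DTD attribute types, so this sketch never
sees a declaration of id as IDREFS or CDATA. That is exactly the case
that would still need documenting.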

>>   3) Steal undeclared attributes of name id
>>   In well formed content that does not have a DTD, or that has a
>>   partial DTD used for decoration (declaring ID, declaring attribute
>>   defaults, etc) if an attribute is called id and has not been
>>   declared in the DTD, it is of type ID.

TB> Isn't this a variation of #1?  I.e. it requires that interoperable 
TB> processing requires fetching and looking in a DTD.  So probably not 
TB> realistic.

I agree that this option has a drawback that a well formed instance
could declare foo to be of type ID and a DTD might assign it some
other of its assortment of types *cough* and thus, validating
(strictly, external DTD subset fetching) processors and
non-validating (strictly, external DTD subset non-fetching) processors
would produce different results.

>>   4) Add a predeclared id attribute to the xml namespace

TB> This would cause slightly more impact on deployed software/data than #2, 
TB> but would still leave us ahead in the medium/long term, I could live 
TB> with it.

I could live with it too, though I think we might do better with a
little more work.
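
For comparison, a sketch of what option 4 might look like to a
processor, again in Python and again only an illustration. It assumes
the predeclared attribute would be written xml:id; the helper name
find_by_xml_id and the sample document are hypothetical. Only the
attribute in the xml namespace counts, so a plain 'id' attribute is left
alone.

```python
import xml.etree.ElementTree as ET

# The namespace bound to the reserved 'xml' prefix
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = f"{{{XML_NS}}}id"  # assumed attribute name for option 4

def find_by_xml_id(root, value):
    """Return the first element whose xml:id equals value, else None."""
    for elem in root.iter():
        if elem.get(XML_ID) == value:
            return elem
    return None

# Sample document (an assumption): one xml:id, two plain 'id' attributes
doc = ET.fromstring(
    '<doc><sec xml:id="intro" id="not-an-id"/><sec id="intro2"/></doc>'
)
found = find_by_xml_id(doc, "intro")
print(found.tag if found is not None else None)
```

No DTD fetch is needed here, which is the attraction; the cost is that
existing documents using a plain 'id' would have to change.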

>>   5) Add an inline, per-instance ID declaration method
>>   6) Add an inline, per subtree ID declaration method

TB> I prefer #5 to #6 on grounds of simplicity,

It's simpler until you start reading bits of XML and copying them
someplace into a template (such as in XSLT), at which point it's
harder.

TB> but at this point in time I 
TB> do not believe XML needs to adopt yet another type-declaration 
TB> mechanism.  This is a slippery slope, camel's nose, pick your 
TB> metaphor... once we've got this we need to declare IDREF and then why 
TB> don't we add something saying whether order of children is significant, 
TB> and which attributes are URIs, etc etc etc.  I don't think we have 
TB> enough experience in hand to start down this road.

Phooey. The way to get such experience is to spec it out, go to last
call, and go to CR and see what comes out the other end.

>>   7) Muddle along
>>   Do nothing. Accept weasel wording in the DOM spec about knowledge of
>>   'well known namespaces' and conformance loopholes in the CSS spec
>>   about possible breakage in namespaces other than HTML and accept
>>   that we can't really point into XML documents unless we can be sure
>>   the client uses a validating parser and besides, it works in HTML so
>>   far and no-one really uses XML on the client anyway.

TB> I'm fine with this.

Since I trip over the breakage from this all the time I am very much
not fine with this.

TB>  You can safely point into any XML document that has
TB> a proper media-type registration in place,

How so? And what use is that media-type registration once you embed
that namespace into another one?

Besides, pointing is just one use of IDs.

TB> so you really only have
TB> problems with resources served as */xml, which is something that in 
TB> general Should Not Be Done, and fortunately generally isn't done.

This seems to require a media type registration for every combination
of namespaces that could be devised. Not very workable.


TB> Really, forget about following pointers, do you really think it's good 
TB> practice to write and publish a foo#bar URI ref into something of which 
TB> you don't know the media-type?  I don't.  And if you know the media-type 
TB> there's no problem.

TB> So I counsel inaction.

That is not a very realistic option. There is an architectural hole
here, we can fix it or we can play ostrich but if we pick the latter
option then people would be well justified in taping "TAG - kick here"
notices to our upstretched behinds.

TB> BTW, if the DOM is using weasel words they should just $@#!% well clean 
TB> them up and say that the notion of ID-ness is entirely DTD-dependent, 
TB> which it is.

Cool - tell everyone that the DOM only works if you have a DTD, or a
schema, maybe. I thought you said that option 1 was not realistic?

We could also say that the #foo selector in CSS only works if DTD
validation has taken place.

Then we could write an architectural note and go round telling all the
existing users of the XML DOM and CSS with XML "Stop working! Break at
once!" Trouble is, these things already work after a fashion. The job
of Web Architecture is to make them continue to work, but better and
more interoperably. Our job is not to break them and say "it's clean
now".

TB>  It's perfectly OK for a DOM implementation to know what 
TB> the ID attributes are based on the namespace or media type without 
TB> having to read the DTD, so what's the problem?

That seems to contradict your previous sentence in a big way.

TB> Once again, it only arises when you only know it's XML and you
TB> don't have a DTD, and I think we just have to live with not
TB> knowing the ID in that scenario.

I regard your counsel of inaction and living with (or enforcing)
breakage to be severely suboptimal.

>> 8) Require W3C XML Schema validation of all instances.

TB> See #1.

>> My personal preference is for option 6) Add an inline, per subtree ID
>> declaration method. It would require work on what the precedence is
>> (or what sort of error it is) if the DTD or Schema declares the
>> designated attribute to be of a type other than ID.

TB> It would require a *lot* of work, with high chances for error and 
TB> unintended consequences & side-effects.  I'd say let's just not go 
TB> there.  -Tim

There are high chances of error and unintended side effects in the
present situation. The logical thing to do in that case is to fix
them.

-- 
 Chris                            mailto:chris@w3.org
Received on Friday, 10 January 2003 10:47:27 UTC