Re: Options for dealing with IDs from Chris Lilley on 2003-01-10 (www-tag@w3.org from January 2003)

From: Chris Lilley <chris@w3.org>
Date: Fri, 10 Jan 2003 18:04:49 +0100
To: www-tag@w3.org, Norman Walsh <Norman.Walsh@Sun.COM>
Message-ID: <93316273578.20030110180449@w3.org>
On Friday, January 10, 2003, 5:13:53 PM, Norman wrote:



NW> / noah_mendelsohn@us.ibm.com was heard to say:
NW> | I think I agree with Tim's other conclusion:  do nothing is probably the 
NW> | least risky solution.  We've got too many typing mechanisms already.

NW> I have mixed feelings, but I think I agree with Tim and Noah.

NW> "IDness" is a consequence of validation. That means you have to
NW> validate.

So, your solution is option 1 or option 8 (DTD or Schema validation in
all cases).

NW>  I understand that sometimes has painful consequences. If a
NW> language wants to have IDs so that authors can point into documents,
NW> the workaround is to establish a MIME type for that language and
NW> describe what fragment identifiers mean independent of validation.

That does not give you IDs. It gives you pointers. It does not solve
the getElementById problem, and it does not solve the CSS #foo selector
problem.
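To make the getElementById point concrete: a DOM treats an attribute as an ID only if something declared it to be one. Neither the attribute's name nor a fragment identifier rule in a MIME type registration changes that. Here is a minimal sketch using Python's xml.dom.minidom, chosen only for illustration; the lookup helper is hypothetical.

```python
from xml.dom import minidom

# The same element twice: once bare, once with partnum declared as an ID
# in the internal subset.
WELL_FORMED_ONLY = '<foo partnum="i54321" bar="toto"/>'
WITH_DECLARATION = (
    '<!DOCTYPE foo [<!ATTLIST foo partnum ID #IMPLIED>]>'
    '<foo partnum="i54321" bar="toto"/>'
)

def lookup(xml_text, id_value="i54321"):
    """Hypothetical helper: parse the text and ask the DOM for an element by ID."""
    doc = minidom.parseString(xml_text)
    return doc.getElementById(id_value)

# Without a declaration, the DOM has no idea that partnum is an ID.
print(lookup(WELL_FORMED_ONLY))           # None
# With the ATTLIST declaration, the same lookup finds foo.
print(lookup(WITH_DECLARATION).tagName)   # foo
```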


NW> Similarly, the semantics of intra-document references could be defined
NW> independent of validation if necessary.

I agree that, since we have well-formed documents, the semantics of
intra-document references should be defined independent of validation.
There are two ways to do this. One is to invent a whole new mechanism
that is independent of IDs and define how that works. The other,
suggested in this thread, is to separate the assignment of IDness from
validation.

Which XML already does. Is it true to say that in the following
instance

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ATTLIST foo partnum ID #IMPLIED>
]>
<foo  partnum="i54321" bar="toto"/>

a) the instance is well formed,
b) the instance is not valid (or even validatable), and
c) the partnum attribute on foo is of type ID?
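Points (a) and (c) can be checked with a non-validating parser. Below is a minimal sketch using Python's xml.parsers.expat. Expat reads ATTLIST declarations from the internal subset but performs no validation, so the undeclared bar attribute and the missing element declaration raise no errors. The collect_ids helper is hypothetical.

```python
import xml.parsers.expat

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ATTLIST foo partnum ID #IMPLIED>
]>
<foo  partnum="i54321" bar="toto"/>
"""

def collect_ids(xml_text):
    """Return {id_value: element_name}, using only ID declarations found
    in the DTD; no validity constraints are checked."""
    id_attrs = {}  # element name -> name of its ID-typed attribute
    ids = {}

    def on_attlist(elname, attname, att_type, default, required):
        # Expat reports the declared attribute type as a string such as "ID".
        if att_type == "ID":
            id_attrs[elname] = attname

    def on_start(name, attrs):
        id_name = id_attrs.get(name)
        if id_name is not None and id_name in attrs:
            ids[attrs[id_name]] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)  # raises ExpatError only if not well formed
    return ids

print(collect_ids(SAMPLE))                     # {'i54321': 'foo'}
print(collect_ids('<foo partnum="i54321"/>'))  # {}
```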


NW> One of the reasons I have mixed feelings is that the preceding
NW> description doesn't sit very well with me. I think it's unfortunate
NW> that we've got an extensible markup language but we're encouraging
NW> everyone that uses it to invent a new MIME type.

Well put. And the MIME type route only fixes (or covers up) part of
the IDness problem.

NW> I thought, once, that an extensible markup language would
NW> automatically give us a uniform fragment identifier syntax, but I
NW> regret that appears not to be the case.

So let's fix that.

NW> On the other hand, one of the consequences of xml:idAttr (and to a lesser
NW> extent xml:id) that bothers me is that it moves this validation
NW> semantic out into authoring space.

To be clear: it does nothing to validation at all. It decorates a
well-formed instance. It does not perform any validation, and the three
validity constraints that apply to IDs are not enforced unless there
is a subsequent validation step (for example, with W3C XML Schema).
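The split between decorating and validating already exists in the DOM. DOM Level 3's Element.setIdAttribute marks an attribute as an ID on a tree that is already parsed and well formed, and it checks nothing. The sketch below uses Python's xml.dom.minidom for this. It is an analogy, not xml:idAttr itself, and duplicate_ids is a hypothetical stand-in for the later validation step. A duplicate ID is accepted when the decoration is applied and is caught only by the separate pass.

```python
from collections import Counter
from xml.dom import minidom

# Two elements share a part number; nothing here has been validated.
doc = minidom.parseString(
    '<parts><foo partnum="i1"/><foo partnum="i1"/><foo partnum="i2"/></parts>'
)

# Decoration: mark partnum as an ID on each foo. No uniqueness check happens.
for el in doc.getElementsByTagName("foo"):
    el.setIdAttribute("partnum")

def duplicate_ids(document):
    """Hypothetical later 'validation' pass: ID values used more than once."""
    counts = Counter()
    for el in document.getElementsByTagName("*"):
        for attr in el.attributes.values():
            if attr.isId:
                counts[attr.value] += 1
    return sorted(value for value, n in counts.items() if n > 1)

print(duplicate_ids(doc))  # ['i1']
```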

Further, the validation semantic is already out in the authoring
space. Authors can plug away in the internal subset (particularly in
those DTDs that have parameter entities in their content models
precisely to allow for such extension), and can even declare the entire
DTD in the internal subset and make it up as they go along.

So I believe that your concern is unfounded because

a) people can already do that, and
b) these proposals do not do it.


NW> One of the reasons that W3C XML
NW> Schema says that schema location information is only a hint is so that
NW> I can apply my own schema independent of what the author asked for.
NW> Well, what if I want to use some other attribute as an ID sometimes?

Realistically, unless the document was authored that way, your chances
of getting unique values on attributes that were never checked for
uniqueness are going to be spotty at best. But OK, suppose you want
to...


NW> It just seems to me that moving IDness into the document is a fairly
NW> significant can of worms.

Please see the example above, which has the IDness in the instance, and
tell me how your home-grown schema, which declares the bar attribute
to be an ID, is going to deal with an input infoset that says partnum
is an ID.


NW> If pushed, I think I could come to terms with the simple xml:id
NW> proposal, but the more complex variants look like too much complexity
NW> to me.

Firstly, I'm glad you could settle for xml:id. I could too, if that was
the best I was going to get, but I think we can do better.

However, it isn't simpler. If you have some XSLT template that copies
a bunch of stuff to the output and then copies foo from the sample
above as a child element, then your choices are

a) leave it alone and lose the IDness of partnum
b) rewrite partnum to xml:id and possibly break tools that use part
numbers
The 'more complex' variant lets you

c) leave it alone and retain the IDness by adding an attribute

Of course, you have to have parsed the instance and looked in the
infoset to get the IDness in the first place. If the example had
instead been

<?xml version="1.0" encoding="UTF-8"?>
<foo  partnum="i54321" bar="toto" xml:idAttr="partnum"/>

then just copying the foo element does everything. Which is what I
meant by "aiding composability".
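A consumer that understood xml:idAttr could resolve IDness from the element alone, so a plain copy keeps it. Note that xml:idAttr is a proposal under discussion in this thread and is not part of any XML specification. The effective_id and index_ids helpers below are hypothetical. Here is a sketch with Python's xml.etree.ElementTree:

```python
import copy
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ID_ATTR = f"{{{XML_NS}}}idAttr"  # proposed xml:idAttr; not a standard attribute

def effective_id(elem):
    """Return the value of the attribute that xml:idAttr names, if any."""
    name = elem.get(ID_ATTR)
    return elem.get(name) if name else None

def index_ids(root):
    """Map ID values to elements for every decorated element under root."""
    index = {}
    for elem in root.iter():
        value = effective_id(elem)
        if value is not None:
            index[value] = elem
    return index

src = ET.fromstring('<foo  partnum="i54321" bar="toto" xml:idAttr="partnum"/>')

# Copy foo into unrelated output as a child element, with no DTD anywhere.
output = ET.Element("output")
output.append(copy.deepcopy(src))
print(sorted(index_ids(output)))  # ['i54321']
```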

-- 
 Chris                            mailto:chris@w3.org
Received on Friday, 10 January 2003 12:04:54 UTC