Re: Options for dealing with IDs from Norman Walsh on 2003-01-10 (www-tag@w3.org from January 2003)

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Fri, 10 Jan 2003 12:57:18 -0500
To: www-tag@w3.org
Message-ID: <874r8g6g75.fsf@nwalsh.com>

/ Chris Lilley <chris@w3.org> was heard to say:
| On Friday, January 10, 2003, 5:13:53 PM, Norman wrote:
| NW> / noah_mendelsohn@us.ibm.com was heard to say:
| NW> | I think I agree with Tim's other conclusion:  do nothing is probably the 
| NW> | least risky solution.  We've got too many typing mechanisms already.
|
| NW> I have mixed feelings, but I think I agree with Tim and Noah.
|
| NW> "IDness" is a consequence of validation. That means you have to
| NW> validate.
|
| So, your solution is option 1 or option 8 (DTD or Schema validation in
| all cases).

Yes. Or an internal subset as you point out further down. "The status quo."

| NW>  I understand that sometimes has painful consequences. If a
| NW> language wants to have IDs so that authors can point into documents,
| NW> the workaround is to establish a MIME type for that language and
| NW> describe what fragment identifiers mean independent of validation.
|
| That does not give you IDs. It gives you pointers. It does not solve
| the getElementById problem and it does not solve the #foo selector
| problem.

Right. getElementById() returns null if you haven't validated.

Workarounds for the #foo problem could be achieved in the CSS spec
without changing XML. (No, I don't have any specific workaround in
mind.)

| NW> Similarly, the semantics of intra-document references could be defined
| NW> independent of validation if necessary.
|
| I agree that, since we have well formed documents, the semantics of
| intra-document references should be defined independent of validation.
| There are two ways to do this; one is to invent a whole new mechanism
| that is independent of IDs and define how that works. The other way,
| suggested in this thread, is to separate the assignment of IDness from
| that of validation.

As long as DTDs and schemas contribute "IDness" to the mix, they can't
be separated. I'd be a lot happier with separation.

What's being proposed here is another, independent mechanism *in
addition to* validation. Like Noah said, "we've got too many typing
mechanisms already".

| Which XML already does. Is it true to say that in the following
| instance
|
| <?xml version="1.0" encoding="UTF-8"?>
| <!DOCTYPE foo [
| <!ATTLIST foo partnum ID #IMPLIED>
| ]>
| <foo  partnum="i54321" bar="toto"/>
|
| a) The instance is well formed
| b) the instance is not valid(atable)
| c) the partnum attribute on foo is of type ID

Yep. All true.
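To make (a) to (c) concrete, here is a minimal Python sketch, assuming the standard library's xml.dom.minidom, which sits on the non-validating expat parser but still records ATTLIST declarations from the internal subset. It illustrates that one parser's behavior, not every WF parser's: the instance is never validated, yet partnum works as an ID; drop the internal subset and the same lookup finds nothing.

```python
from xml.dom import minidom

# Chris Lilley's instance: the internal subset declares partnum as type ID.
# It is well formed but not valid (no ELEMENT declaration, bar undeclared).
WITH_SUBSET = """<!DOCTYPE foo [
<!ATTLIST foo partnum ID #IMPLIED>
]>
<foo partnum="i54321" bar="toto"/>"""

# The same element with no DOCTYPE, so nothing says partnum is an ID.
WITHOUT_SUBSET = '<foo partnum="i54321" bar="toto"/>'

# Parse both with a non-validating parser; no validation step runs.
with_subset = minidom.parseString(WITH_SUBSET)
without_subset = minidom.parseString(WITHOUT_SUBSET)

# The internal-subset declaration alone makes partnum usable as an ID.
found = with_subset.getElementById("i54321")
print(found.tagName if found is not None else None)

# Without the declaration, the lookup comes back empty.
print(without_subset.getElementById("i54321"))
```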

| NW> On the other hand, one of the consequences of xml:idAttr (and to a lesser
| NW> extent xml:id) that bothers me is that it moves this validation
| NW> semantic out into authoring space.
|
| To be clear: it does nothing to validation at all. It decorates a well
| formed instance. It does not do any validation and the three
| validation constraints that apply to IDs are not enforced unless there
| is a subsequent validation step (for example, with a W3C XML Schema).

Fair point. Let me rephrase. It provides an additional type annotation
mechanism out in the authoring space. This provides yet another
mechanism to do something and it may do so in ways that are sometimes
invalid.

If you look at a document with well-formed glasses on, then again with
validation glasses on, there are a small number of differences that
you may perceive. These proposals all add one more thing to that set.
I'd like to make that set smaller, not larger.

(Before someone points out xsi:type, let me just say I've never used
it and I hope I never do. Every time I think about it, it whispers "I'm
a design flaw, but you can't quite work out what design would be
better, can you?" Then it giggles evilly.)

| Further, the validation semantic is already out in the authoring
| space. Authors can plug away in the internal subset (particularly in
| those DTDs that have parameter entities in their content models
| precisely to allow for such extension) and can even declare the entire
| DTD in the internal subset and make it up as they go along.

I concede that not all uses of the internal subset are validation, but
I tend to think of them that way. Taking advantage of DTD parameter
entities more-or-less implies that you're doing full validation
because they almost never have any effect on a WF-only parser that
ignores the external subset. So they're mostly local modifications to
the DTD that occur before validation, and they usually indicate that
validation is expected.

| So I believe that your concern is unfounded because
|
| a) people can already do that, and

People can modify the schema that will be used on a per-instance
basis, and some of the modifications that they can perform affect a
document that isn't subjected to validation because of the minimal
"DTD processing requirements" placed on a WF parser.

That usage doesn't concern me as much.

| b) these proposals do not do it.

They do introduce yet another way to do something and the way that's
introduced will expose new kinds of validation problems.

I'm still concerned.

| NW> One of the reasons that W3C XML
| NW> Schema says that schema location information is only a hint is so that
| NW> I can apply my own schema independent of what the author asked for.
| NW> Well, what if I want to use some other attribute as an ID sometimes?
|
| Realistically, unless it was authored that way, your chances of
| getting uniqueness on attribute values that were not already checked
| for uniqueness are going to be spotty at best. But ok suppose you want
| to ....
|
| NW> It just seems to me that moving IDness into the document is a fairly
| NW> significant can of worms.
|
| Please see the example above which has the IDness in the instance and
| tell me how your home-grown Schema which declares the bar attribute to
| be an ID is going to deal with the input infoset that says partnum is
| an ID.

I didn't intend the latter comment about a can of worms as an
extension of the former comment. I concede that having different
schemas that use different attributes for IDness is a more theoretical
than practical example. But it still raises philosophical issues to
me.

I think the worms in the can are:

- New validity problems:

  <!DOCTYPE foo SYSTEM "foo.dtd">
  <foo xml:id="bar"/>

  If foo.dtd contains

  <!ATTLIST foo name ID #IMPLIED>

  Then the first document has an ID ("bar") when it's read by a WF
  parser, but it is rejected by a validating parser because xml:id is
  not declared in foo.dtd. You could argue that the same is true of

  <!DOCTYPE foo SYSTEM "foo.dtd">
  <foo id="bar"/>

  But it's not the same since a WF parser would not associate "IDness" with
  the 'id' attribute on foo. So xml:id really does introduce a new kind of
  error.

- Complexity: the xml:idAttr (or xml:idAttrs) and the concomitant
  xml:idrefsAttr(s) add new levels of hierarchical complexity.
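The first of these worms can be modeled in a few lines of Python. This is a toy sketch, not a validator: DECLARED is a hypothetical stand-in for foo.dtd, and xml.etree.ElementTree plays the WF-only view because it ignores DTDs. wf_ids shows what an xml:id-aware WF processor would see, while undeclared applies the XML 1.0 validity rule that every attribute in a valid document must have been declared.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical stand-in for foo.dtd: element -> {attribute: declared type}.
DECLARED = {"foo": {"name": "ID"}}

def wf_ids(elem):
    """IDs an xml:id-aware WF processor would see, with no DTD consulted."""
    return {v: e for e in elem.iter()
            for k, v in e.attrib.items() if k == XML_NS + "id"}

def undeclared(elem, declared):
    """Attributes a validating parser would reject as undeclared."""
    problems = []
    for e in elem.iter():
        decls = declared.get(e.tag, {})
        for k in e.attrib:
            # Map the expanded xml: name back to its prefixed DTD spelling.
            name = "xml:" + k[len(XML_NS):] if k.startswith(XML_NS) else k
            if name not in decls:
                problems.append((e.tag, name))
    return problems

# The DOCTYPE is omitted; DECLARED models what foo.dtd would say.
root = ET.fromstring('<foo xml:id="bar"/>')
print(list(wf_ids(root)))          # ['bar']
print(undeclared(root, DECLARED))  # [('foo', 'xml:id')]
```

The same instance yields an ID under the WF view and a validity error under the DTD view, which is the new kind of error described above.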

| NW> If pushed, I think I could come to terms with the simple xml:id
| NW> proposal, but the more complex variants look like too much complexity
| NW> to me.
|
| Firstly, glad you could settle for xml:id. I could too, if that was
| the best I was going to get but I think we can get better.
|
| However, it isn't simpler. If you have some XSLT template that copies
| a bunch of stuff to the output and then copies foo from the sample
| that I have above as a child element, then your choices are
|
| a) leave it alone and lose the IDness of partnum

When you build a new result tree, you lose IDness anyway.

| b) rewrite partnum to xml:id and possibly break tools that use part
| numbers

That's a choice the tool writer gets to make. And he or she can have
different transformations that do different things in different
contexts.

| The 'more complex' variant lets you
|
| c) leave it alone and retain the IDness by adding an attribute
|
| Of course you have to have parsed the instance and looked in the
| infoset to get the IDness in the first place. If the example had
| instead been
|
| <?xml version="1.0" encoding="UTF-8"?>
| <foo  partnum="i54321" bar="toto" xml:idAttr="partnum"/>
|
| then just copying the foo element does everything. Which is what I
| meant by "aiding composability".
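A rough Python sketch of the composability argument, assuming the xml:idAttr semantics as described in this thread (the attribute names another attribute on the same element as its ID). id_attributes is a hypothetical helper, not any real API; the point is that the IDness travels with a plain copy of the element, whereas the bare partnum version carries none.

```python
import copy
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

def id_attributes(elem):
    """Attributes treated as IDs under the proposed xml:id/xml:idAttr rules
    (hypothetical semantics, read only from the element itself)."""
    ids = set()
    if XML_NS + "id" in elem.attrib:
        ids.add("xml:id")
    # xml:idAttr names another attribute on the same element as the ID.
    named = elem.get(XML_NS + "idAttr")
    if named is not None and named in elem.attrib:
        ids.add(named)
    return ids

source = ET.fromstring('<foo partnum="i54321" bar="toto" xml:idAttr="partnum"/>')

# Copy foo into a new result tree with no infoset bookkeeping at all.
result = ET.Element("result")
result.append(copy.deepcopy(source))
copied = result[0]

print(id_attributes(copied))   # {'partnum'}
print(copied.get("partnum"))   # i54321

# Without xml:idAttr, nothing on the element itself marks partnum as an ID.
print(id_attributes(ET.fromstring('<foo partnum="i54321" bar="toto"/>')))  # set()
```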

Yeah, but it's a whole new bit of context that the parser has to keep
around as it's building the infoset.

Yes, it's clear how it would be implemented and taken by itself it's
clearly not *that complex*, but I feel like over the last few years
we've taken a simple idea (a subset of SGML useful to the desperate
perl hacker) and added processing expectations and complexities (large
and small) on top of each other again and again and again.

All of the decisions to add stuff, taken in isolation, looked
tractable, but the whole is starting to appear ponderous. (Some would
argue it became ponderous long ago, but this is not a troll.)

I'm not sure that doing nothing is exactly the right answer, but today
I feel pretty strongly that something as complex as xml:idAttrs is
too much.

                                        Be seeing you,
                                          norm

-- 
Norman.Walsh@Sun.COM    | It is a general error to imagine the loudest
XML Standards Architect | complainers for the public to be the most
Web Tech. and Standards | anxious for its welfare.--Edmund Burke, 1769
Sun Microsystems, Inc.  | 
Received on Friday, 10 January 2003 12:57:28 UTC