Re: Options for dealing with IDs from Norman Walsh on 2003-01-10 (www-tag@w3.org from January 2003)

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Fri, 10 Jan 2003 15:27:18 -0500
To: www-tag@w3.org
Message-ID: <87r8bk4uop.fsf@nwalsh.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

/ Chris Lilley <chris@w3.org> was heard to say:
| On Friday, January 10, 2003, 6:57:18 PM, Norman wrote:
| NW> / Chris Lilley <chris@w3.org> was heard to say:
| NW> | On Friday, January 10, 2003, 5:13:53 PM, Norman wrote:
| | NW>> / noah_mendelsohn@us.ibm.com was heard to say:
| | NW>> | I think I agree with Tim's other conclusion:  do nothing is probably the 
| | NW>> | least risky solution.  We've got too many typing mechanisms already.
| NW> |
| | NW>> I have mixed feelings, but I think I agree with Tim and Noah.
| NW> |
| | NW>> "IDness" is a consequence of validation. That means you have to
| | NW>> validate.
| NW> |
| NW> | So, your solution is option 1 or option 8 (DTD or Schema validation in
| NW> | all cases).
|
| NW> Yes. Or an internal subset as you point out further down. "The status quo."
|
| | NW>>  I understand that sometimes has painful consequences. If a
| | NW>> language wants to have IDs so that authors can point into documents,
| | NW>> the workaround is to establish a MIME type for that language and
| | NW>> describe what fragment identifiers mean independent of validation.
| NW> |
| NW> | That does not give you IDs. It gives you pointers. It does not solve
| NW> | the getElementById problem and it does not solve the #fo selector
| NW> | problem.
|
| NW> Right. getElementById() returns an empty set if you haven't validated.
|
| You mean, you propose that it *should* return the empty set if you
| haven't validated.

Isn't that what it does today (if you'll allow that an internal subset
with a few attlist decls is "validation" in this context)?
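A minimal sketch of that behaviour, using Python's xml.dom.minidom as one
example of a non-validating DOM (other implementations may differ). The
document and element names here are made up for illustration.

```python
from xml.dom.minidom import parseString

# The same instance, with and without an internal subset that declares
# the "id" attribute to be of type ID.
PLAIN = '<doc><item id="a1"/></doc>'
WITH_SUBSET = (
    '<!DOCTYPE doc [<!ATTLIST item id ID #IMPLIED>]>'
    '<doc><item id="a1"/></doc>'
)

def lookup(xml_text, value):
    """Parse without validating; return the tag name found by ID, or None."""
    node = parseString(xml_text).getElementById(value)
    return node.tagName if node is not None else None

plain_result = lookup(PLAIN, "a1")         # no declaration, so no IDness
subset_result = lookup(WITH_SUBSET, "a1")  # attlist decl supplies IDness
print(plain_result, subset_result)
```

Nothing is validated in either case; the only difference is whether the
parser saw an attlist declaration.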

| NW> Workarounds for the #fo problem could be achieved in the CSS spec
| NW> without changing XML. (No, I don't have any specific workaround in
| NW> mind.)
|
| Allow me to consider that assertion unproven, in that case, and merely
| observe that fixing the IDness problem in multiple *consumers* of IDs
| (probably in different ways) is clearly suboptimal to fixing it
| centrally.

Yep.

| NW> What's being proposed here is another, independent mechanism *in
| NW> addition to* validation.
|
| No, *before* validation.

You can't mean "no it's not in addition to". It's clearly "in addition
to" if it happens before validation and then I do validation.

In any event, it introduces a new opportunity for errors that hitherto
did not occur.

| NW> Like Noah said, "we've got too many typing mechanisms already".
|
| And like I said, not fixing this will give us plenty more as all the
| unsatisfied customers invent them one per specification. I can't
| believe that you are seriously proposing that.

Hmm. I don't think I'd seriously considered the possibility that other
specs would solve the problem by saying "in FooML, all attributes
named 'id' are of type ID by definition and must appear in the infoset
with that [attribute type]". But maybe they would.
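To make that concrete, here is a rough sketch of what such a per-language
rule might look like in an application. The "FooML" rule and the helper name
mark_ids_by_convention are hypothetical; setIdAttribute is the DOM Level 3
method as exposed by Python's xml.dom.minidom.

```python
from xml.dom.minidom import parseString

def mark_ids_by_convention(doc, attr_name="id"):
    """Apply a hypothetical FooML rule: every attribute called `attr_name`
    is treated as type ID, with no DTD or schema involved."""
    for elem in doc.getElementsByTagName("*"):
        if elem.hasAttribute(attr_name):
            elem.setIdAttribute(attr_name)

doc = parseString('<foo><bar id="x1"/><baz id="x2"/></foo>')

before = doc.getElementById("x1")  # nothing has IDness yet
mark_ids_by_convention(doc)
after = doc.getElementById("x1")   # now resolves to the <bar> element
```

Each specification that adopts a rule like this is, in effect, one more
typing mechanism.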

| NW> Fair point. Let me rephrase. It provides an additional type annotation
| NW> mechanism out in the authoring space. This provides yet another
| NW> mechanism to do something and it may do so in ways that are sometimes
| NW> invalid.
|
| Of course. Well formed authoring always has the possibility of
| creating things that are then determined to be invalid - duplicate
| ids, incorrect content models, missing required attributes and so on.
| How is this different?

Maybe it isn't. It feels different, I guess, because it will make an
error that almost never-ever happens today one that occurs fairly
frequently (namely, having two attributes of type ID on the same
element).
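A small sketch of how that error could be detected. It assumes that xml:id is
always of type ID (the proposal under discussion, not settled behaviour) and
that the schema's ID declarations are handed in as a plain mapping; the
function name and the sample document are made up. Note that XML 1.0 states
the constraint on the declarations for an element type, while this checks
instances, which is enough to show where the conflict arises.

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def elements_with_multiple_ids(xml_text, declared_ids):
    """Return (tag, [ID attribute names]) for each element carrying more
    than one ID-typed attribute.

    declared_ids maps element name -> set of attribute names that a DTD or
    schema types as ID. xml:id is assumed to be an ID on every element.
    """
    offenders = []
    for elem in ET.fromstring(xml_text).iter():
        id_attrs = set(declared_ids.get(elem.tag, ())) & set(elem.attrib)
        if XML_ID in elem.attrib:
            id_attrs.add(XML_ID)
        if len(id_attrs) > 1:
            offenders.append((elem.tag, sorted(id_attrs)))
    return offenders

sample = '<foo name="a" xml:id="bar"><foo xml:id="baz"/></foo>'
result = elements_with_multiple_ids(sample, {"foo": {"name"}})
print(result)
```

Only the outer foo is reported: it has both the declared ID attribute and
xml:id. The inner foo has xml:id alone and is fine.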

| NW> If you look at a document with well-formed glasses on, then again with
| NW> validation glasses on, there are a small number of differences that
| NW> you may perceive. These proposals all add one more thing to that set.
| NW> I'd like to make that set smaller, not larger.
|
| So would I which is why I would like there to be a way to add IDness
| to the infoset of well formed documents and for the W3C XML Schema to
| pick that up as its input Infoset and reflect these values back in the
| PSVI so that the number of differences seen with the two sets of
| glasses becomes smaller: IDness is preserved after validation.

You can't add something to the set and make it smaller. With any of
these proposals it will become possible to have IDs in the WF view and
validity errors in the other view in ways that do not occur today.

One logical extension of what you're saying would be to remove xs:ID
from XML Schema and say that IDness really is separate. Then XML
Schema would have only key/keyref not id/idref and key/keyref.

| NW> (Before someone points out xsi:type, let me just say I've never used
| NW> it and I hope I never do. Every time I think about it, it whispers "I'm
| NW> a design flaw, but you can't quite work out what design would be
| NW> better, can you?" Then it giggles evilly.)
|
| I hear Norm proposing option #10 (or is that 11) using xsi:type in the
| instance (though that would need to be a child element not an
| attribute now because we have multiple attributes ....)

Egad! I'm not proposing that. I'm not even remotely proposing
something that bears a faint resemblance to that!

| NW> | Further, the validation semantic is already out in the authoring
| NW> | space. Authors can plug away in the internal subset (particularly in
| NW> | those DTDs that have parameter entities in their content models
| NW> | precisely to allow for such extension) and can even declare the entire
| NW> | DTD in the internal subset and make it up as they go along.
|
| NW> I concede that not all uses of the internal subset are validation, but
| NW> I tend to think of them that way.
|
| I agree you think of them that way. I am trying to get you not to
| think of them all that way because it complicates the architecture.

Document-instance schema modifications definitely complicate the
architecture. There's no question about that.

| NW>  Taking advantage of DTD parameter
| NW> entities more-or-less implies that you're doing full validation
| NW> because they almost never have any effect on a WF-only parser that
| NW> ignores the external subset. So they're mostly local modifications to
| NW> the DTD that occur before validation, and they usually indicate that
| NW> validation is expected.
|
| Yes. My point was merely that users can already affect validation when
| they are editing their instances, which you asserted was a bad thing and
| only introduced by these id proposals.

I asserted it was a bad thing and I said that these proposals
introduced new ways to do it. I don't think I ever said that only
these proposals introduced it. I didn't mean to anyway.

| NW> | So I believe that your concern is unfounded because
| NW> |
| NW> | a) people can already do that, and
|
| NW> People can modify the schema that will be used on a per-instance
| NW> basis, and some of the modifications that they can perform affect a
| NW> document that isn't subjected to validation because of the minimal
| NW> "DTD processing requirements" placed on a WF parser.
|
| Yes.
|
| NW> That usage doesn't concern me as much.
|
| Okay so modifications to the instance that affect the IDness do not
| concern you, ok that is good ....

I think it'd be fairer to say that existing mechanisms for such
modifications don't concern me as much. :-)

| There seems to be no problem in terms of validation with W3C XML
| Schema or with any other schema language that picks up an Infoset on
| the way in, because this mechanism merely adds to the infoset at parse
| time and can be defined to be the same sort of annotation that Schema
| does, so a processor that works on the PSVI need not care where the
| IDness of a particular attribute came from.

But it could still result in an element having multiple attributes of
type ID. Are you proposing that that should no longer be an error?

| It might be, but your assertion about a use case of suddenly changing
| the IDness of a document and re-validating it does not establish the
| worminess. I can sense that you feel unease; this might be because it's
| a can of worms or it might be that you have got used to treating two
| concepts as the same when in fact they are architecturally different
| and you are getting used to that.

Maybe.

| NW> I think the worms in the can are:
|
| NW> - New validity problems:
|
| NW>   <!DOCTYPE foo SYSTEM "foo.dtd">
| NW>   <foo xml:id="bar"/>
|
| NW>   If foo.dtd contains
|
| NW>   <!ATTLIST foo name ID #IMPLIED>
|
| NW>   Then the former document means one thing if it's accessed with a WF
| NW>   parser and is rejected by a validating parser.
|
| Yes.
|
| Just as it would be rejected if it said
|
| <!DOCTYPE foo SYSTEM "foo.dtd">
|   <foo xml:lang="ja"/>
|
|   If foo.dtd contains
|
|   <!ATTLIST foo name CDATA #REQUIRED>
|
| Validation *always* has the chance for rejecting well formed
| documents. That is what it is for.

The point I've been trying to make is that these proposals introduce
*a new chance*.

Maybe on balance that's the right thing to do. Maybe.

| NW>   But it's not the same since a WF parser would not associate "IDness" with
| NW>   the 'id' attribute on foo. So xml:id really does introduce a new kind of
| NW>   error.
|
| Yes. One that is machine detectable, which is a big advance on the "if
| it's a namespace that you have personal knowledge of" weasel-wording.

Point taken.

| NW> and added processing expectations and complexities (large
| NW> and small) on top of each other again and again and again.
|
| You persist in portraying it as complexity - who could argue for
| complexity - and I will persist in showing that not doing any of these
| options merely leaves great complexity in other places.

Fair enough :-)

| NW> but today I feel pretty strongly that something as complex as
| NW> xml:idAttrs is too much.
|
| Unfortunately you have not really demonstrated that it is.

What, I wonder, would constitute such a demonstration? You haven't
demonstrated that anything more than simple xml:id is necessary.
You've argued that it would be somewhat more convenient for some
authors of some documents (that have legacy schemas) to be able to
have nested, scoped ID declarations but you haven't convinced me
that's in the 80% case.

                                        Be seeing you,
                                          norm

- -- 
Norman.Walsh@Sun.COM    | A man is not necessarily intelligent because
XML Standards Architect | he has plenty of ideas, any more than he is a
Web Tech. and Standards | good general because he has plenty of
Sun Microsystems, Inc.  | soldiers.--Chamfort
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.7 <http://mailcrypt.sourceforge.net/>

iD8DBQE+HyymOyltUcwYWjsRAsrCAJ4wMupNyhx4Wtm3LoYA+jhCjQt1QQCfckMx
K9CYC7/Uq3UK11zFZpW2VjU=
=RIwt
-----END PGP SIGNATURE-----
Received on Friday, 10 January 2003 15:27:35 UTC