- From: Chris Lilley <chris@w3.org>
- Date: Sat, 11 Jan 2003 19:51:42 +0100
- To: www-tag@w3.org, Norman Walsh <Norman.Walsh@Sun.COM>
On Friday, January 10, 2003, 9:27:18 PM, Norman wrote:

NW> -----BEGIN PGP SIGNED MESSAGE-----
NW> Hash: SHA1

NW> / Chris Lilley <chris@w3.org> was heard to say:
NW> | On Friday, January 10, 2003, 6:57:18 PM, Norman wrote:
| NW>> -----BEGIN PGP SIGNED MESSAGE-----
| NW>> / Chris Lilley <chris@w3.org> was heard to say:
| NW>> | On Friday, January 10, 2003, 5:13:53 PM, Norman wrote:
| | NW>>> / noah_mendelsohn@us.ibm.com was heard to say:
| | NW>>> | I think I agree with Tim's other conclusion: do nothing is probably the
| | NW>>> | least risky solution. We've got too many typing mechanisms already.
| NW>> |
| | NW>>> I have mixed feelings, but I think I agree with Tim and Noah.
| NW>> |
| | NW>>> "IDness" is a consequence of validation. That means you have to
| | NW>>> validate.
| NW>> |
| NW>> | So, your solution is option 1 or option 8 (DTD or Schema validation in
| NW>> | all cases).
NW> |
| NW>> Yes. Or an internal subset as you point out further down. "The status quo."
NW> |
| | NW>>> I understand that sometimes has painful consequences. If a
| | NW>>> language wants to have IDs so that authors can point into documents,
| | NW>>> the workaround is to establish a MIME type for that language and
| | NW>>> describe what fragment identifiers mean independent of validation.
| NW>> |
| NW>> | That does not give you IDs. It gives you pointers. It does not solve
| NW>> | the getElementByID problem and it does not solve the #fo selector
| NW>> | problem.
NW> |
| NW>> Right. getElementsByID() returns an empty set if you haven't validated.
NW> |
NW> | You mean, you propose that it *should* return the empty set if you
NW> | haven't validated.

NW> Isn't that what it does today (if you'll allow that an internal subset
NW> with a few attlist decls is "validation" in this context)?

It isn't validation, in this or any other context; and no, that is not what it does today.
Some DOM implementations (and some CSS parsers), when presented with a random bit of XML, will only provide IDs if a full DTD is available and validation succeeds; some will provide them only if some form of DTD is available and that part mentions some IDs; some will recognize the namespace and use a cached DTD or some other internal data structure (which may not correspond to the DTD that may or may not be linked from the instance); and some will just say there are no IDs (even if there is an external DTD subset that says otherwise). All of those possibilities are justifiable based on some reading or other of one of the relevant specifications. There are also other, less defensible implementations such as "only HTML has IDs", "anything called id is an ID" and "anything called [iI][dD] is an ID"; I mention these only as evidence of confusion in the marketplace.

This is the current mess. This is option 7, "Muddle along". This is "let's insert some user agent sniffing on the server so that we can get a bit more interoperability". And this is, to be harsh but fair, the utter shambles that Tim Bray proposes we get comfy living in.

| NW>> Workarounds for the #fo problem could be achieved in the CSS spec
| NW>> without changing XML. (No, I don't have any specific workaround in
| NW>> mind.)
NW> |
NW> | Allow me to consider that assertion unproven, in that case, and merely
NW> | observe that fixing the IDness problem in multiple *consumers* of IDs
NW> | (probably in different ways) is clearly suboptimal to fixing it
NW> | centrally.

NW> Yep. Thanks.

| NW>> What's being proposed here is another, independent mechanism *in
| NW>> addition to* validation.
NW> |
NW> | No, *before* validation.

NW> You can't mean "no it's not in addition to". It's clearly "in addition
NW> to" if it happens before validation and then I do validation.

OK, I read you to mean "in addition to" as in "happening in parallel with validation". I agree with your reformulation.
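The divergent behaviours listed above can be made concrete with a small Python sketch. It is purely illustrative and does not reproduce any real DOM or CSS implementation; the names `DECLARED`, `POLICIES`, `collect_ids` and `get_element_by_id` are hypothetical. Each policy stands for one of the ways of deciding IDness described earlier (plus `xml:id`, one of the options under discussion), and the same well-formed document yields a different set of IDs under each.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the XML namespace URI, in Clark notation.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<doc>
  <p id="intro">Hello</p>
  <p ID="body">World</p>
  <p xml:id="end">Bye</p>
  <part PartNum="A-42"/>
</doc>"""

# Hypothetical DTD knowledge (e.g. from an internal subset or a cached DTD):
# element type -> name of the attribute declared with type ID.
DECLARED = {"p": "id", "part": "PartNum"}

# Each hypothetical policy answers "is this attribute an ID?" differently.
POLICIES = {
    "no IDs without validation": lambda tag, name: False,
    "declared in a DTD": lambda tag, name: DECLARED.get(tag) == name,
    "anything called id": lambda tag, name: name == "id",
    "anything called [iI][dD]": lambda tag, name: name.lower() == "id",
    "xml:id only": lambda tag, name: name == XML_ID,
}


def collect_ids(root, is_id):
    """Map each ID value to its element, according to one policy."""
    ids = {}
    for el in root.iter():
        for name, value in el.attrib.items():
            if is_id(el.tag, name):
                ids[value] = el
    return ids


def get_element_by_id(root, value, is_id):
    """A getElementById analogue: the matching element, or None."""
    return collect_ids(root, is_id).get(value)


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    for label, is_id in POLICIES.items():
        print(f"{label:28} {sorted(collect_ids(root, is_id))}")
```

Run as a script, it prints one ID list per policy. `get_element_by_id(root, "end", ...)` finds the third `<p>` only under the `xml:id` policy and returns `None` under all the others, which is the interoperability problem in miniature.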
NW> In any event, it introduces a new opportunity for errors that hitherto
NW> did not occur.

| NW>> Like Noah said, "we've got too many typing mechanisms already".
NW> |
NW> | And like I said, not fixing this will give us plenty more as all the
NW> | unsatisfied customers invent them one per specification. I can't
NW> | believe that you are seriously proposing that.

NW> Hmm. I don't think I'd seriously considered the possibility that other
NW> specs would solve the problem by saying "in FooML, all attributes
NW> named 'id' are of type ID by definition and must appear in the infoset
NW> with that [attribute type]". But maybe they would.

I cite exhibit A, the SOAP specification, as an existence proof.

| NW>> Fair point. Let me rephrase. It provides an additional type annotation
| NW>> mechanism out in the authoring space. This provides yet another
| NW>> mechanism to do something and it may do so in ways that are sometimes
| NW>> invalid.
NW> |
NW> | Of course. Well formed authoring always has the possibility of
NW> | creating things that are then determined to be invalid - duplicate
NW> | ids, incorrect content models, missing required attributes and so on.
NW> | How is this different?

NW> Maybe it isn't. It feels different, I guess, because it will make an
NW> error that almost never-ever happens today one that occurs fairly
NW> frequently (namely, having two attributes of type ID on the same
NW> element).

Yes. It means, in effect, that well-formedness constraints are well-formedness constraints, validity constraints are validity constraints, and validity constraints are enforced only when validating. That seems a whole lot clearer to me.

| NW>> If you look at a document with well-formed glasses on, then again with
| NW>> validation glasses on, there are a small number of differences that
| NW>> you may perceive. These proposals all add one more thing to that set.
| NW>> I'd like to make that set smaller, not larger.
NW> |
NW> | So would I, which is why I would like there to be a way to add IDness
NW> | to the infoset of well formed documents and for the W3C XML Schema to
NW> | pick that up as its input Infoset and reflect these values back in the
NW> | PSVI so that the number of differences seen with the two sets of
NW> | glasses becomes smaller: IDness is preserved after validation.

NW> You can't add something to the set and make it smaller.

You misunderstand me. If I add something to one set that is already in the other set, then clearly the difference set gets smaller.

NW> With any of these proposals it will become possible to have IDs in
NW> the WF view and validity errors in the other view in ways that do not
NW> occur today.

Half correct. You already agreed that we have IDs in the WF view today. I am only arguing that the WF view should remain a WF view and not become a "WF plus validation of some sort in some sense" view. That makes things clearer and simpler.

NW> One logical extension of what you're saying would be to remove xs:ID
NW> from XML Schema and say that IDness really is separate. Then XML
NW> Schema would have only key/keyref, not id/idref and key/keyref.

That is one possibility, but not my preferred option; people who are using W3C XML Schema to produce IDness, and are happy to do so, should be able to continue the practice. This is why I prefer to define WF IDness in terms of contributions to an Infoset, adding the same properties to a "PreSVI" as W3C XML Schema would add to a PSVI. This is much the same as the Infoset that results when full DTD validation is done during parsing and a W3C XML Schema is then applied as a second step: the input infoset already has some xs:IDs in it.

| NW>> (Before someone points out xsi:type, let me just say I've never used
| NW>> it and I hope I never do. Every time I think about it, it whispers "I'm
| NW>> a design flaw, but you can't quite work out what design would be
| NW>> better, can you?" Then it giggles evilly.)
NW> |
NW> | I hear Norm proposing option #10 (or is that 11) using xsi:type in the
NW> | instance (though that would need to be a child element not an
NW> | attribute now because we have multiple attributes ....)

NW> Egad! I'm not proposing that. I'm not even remotely proposing
NW> something that bears a faint resemblance to that!

Heh! Well, I proposed options that I was really not happy with, for completeness; feel free to do the same. It's better to list an option and explain why it is not realistic than to leave it out as "obvious" and have someone else read the document and assume it was missed because we never thought of it (though that can happen too, and several such options have since been brought forward).

| NW>> | Further, the validation semantic is already out in the authoring
| NW>> | space. Authors can plug away in the internal subset - particularly in
| NW>> | those DTDs that have parameter entities in their content models
| NW>> | precisely to allow for such extension) and can even declare the entire
| NW>> | DTD in the internal subset and make it up as they go along.
NW> |
| NW>> I concede that not all uses of the internal subset are validation, but
| NW>> I tend to think of them that way.
NW> |
NW> | I agree you think of them that way. I am trying to get you not to
NW> | think of them all that way because it complicates the architecture.

NW> Document-instance schema modifications definitely complicate the
NW> architecture. There's no question about that.

Validation that is not validation, but in some sense is validation, definitely complicates the architecture too. I mean, I am following XML 1.0 here. It says there are three validity constraints on IDs, and I am saying that when validation has not occurred, those validity constraints do not apply. This hardly seems contentious.

NW> | Okay so modifications to the instance that affect the IDness do not
NW> | concern you, ok that is good ....
NW> I think it'd be fairer to say that existing mechanisms for such
NW> modifications don't concern me as much. :-)

Here we come back to comfort and familiarity, which are important but can grow with time.

NW> | There seems to be no problem in terms of validation with W3C XML
NW> | Schema or with any other schema language that picks up an Infoset on
NW> | the way in, because this mechanism merely adds to the infoset at parse
NW> | time and can be defined to be the same sort of annotation that Schema
NW> | does, so a processor that works on the PSVI need not care where the
NW> | IDness of a particular attribute came from.

NW> But it could still result in an element having multiple attributes of
NW> type ID. Are you proposing that that should no longer be an error?

No. I am proposing that, just as the XML 1.0 spec says, this is a validity constraint, so violating it is a validation error. If you validate, you look for these sorts of errors; if you don't, you don't.

Are you comfortable with an existing XML 1.0 instance with an incomplete (decorating, not validating) DTD being parsed by an existing non-validating parser that fetches the internal and external subsets, and the resulting infoset generating validation errors when validated by a W3C XML Schema processor? Are you comfortable with an existing XML 1.0 instance with a complete DTD being parsed by an existing validating parser, and the resulting infoset generating validation errors when validated by a W3C XML Schema processor (for example, due to a restriction of some kind on a string, which the DTD cannot express)? If so, then the existing proposals are no different. If not, then the cause of your discomfort is how DTDs and Schemas work together (and clearly they have to, because W3C XML Schema deliberately does not provide an entity declaration mechanism), and that should perhaps be worked out in a different thread.
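The distinction drawn here, where IDness is recorded while parsing but the ID validity constraints are checked only if validation actually happens, can be sketched in Python. This is a toy model, not any real parser's behaviour: `DECORATED`, `annotate_ids`, `validate_ids` and `process` are hypothetical names, and the two checks are modelled loosely on XML 1.0's "ID" (unique values) and "One ID per Element Type" constraints, the latter applied here to the ID attributes present on an element rather than to declarations.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical decorating declarations (element type -> ID attribute name),
# standing in for an incomplete internal subset that types attributes
# without making any claim about validity.
DECORATED = {"foo": "name"}


def annotate_ids(root):
    """Well-formed step: record which attributes carry IDness. Never fails."""
    annotations = []
    for el in root.iter():
        for name, value in el.attrib.items():
            if name == XML_ID or DECORATED.get(el.tag) == name:
                annotations.append((el, name, value))
    return annotations


def validate_ids(annotations):
    """Validation step: check ID constraints over the annotated attributes."""
    errors = []
    seen = set()
    per_element = {}  # id(element) -> (tag, [names of its ID attributes])
    for el, name, value in annotations:
        if value in seen:
            errors.append(f"duplicate ID value {value!r}")
        seen.add(value)
        per_element.setdefault(id(el), (el.tag, []))[1].append(name)
    for tag, names in per_element.values():
        if len(names) > 1:
            errors.append(f"<{tag}> has {len(names)} ID attributes")
    return errors


def process(text, validate=False):
    """Parse and annotate IDs; check ID constraints only if asked to validate."""
    annotations = annotate_ids(ET.fromstring(text))
    errors = validate_ids(annotations) if validate else []
    return {value for _, _, value in annotations}, errors
```

With `validate=False`, the element `<foo xml:id="a" name="b"/>` simply contributes two IDs and no errors; with `validate=True`, the same input reports that `<foo>` has two ID attributes. The error Norman expects to become more frequent surfaces only when validation is requested.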
NW> | It might be, but your assertion about a use case of suddenly changing
NW> | the IDness of a document and re-validating it does not establish the
NW> | worminess. I can sense that you feel unease; this might be because it's
NW> | a can of worms or it might be that you have got used to treating two
NW> | concepts as the same when in fact they are architecturally different
NW> | and you are getting used to that.

NW> Maybe.

Ponder it some more, while rereading the first sentence of your reply, where you talk about validation "in this context". Validation is not context sensitive: it either happens or it doesn't. XML does not have a class of "maybe validated" instance. It's either well formed or valid.

| NW>> I think the worms in the can are:
NW> |
| NW>> - New validity problems:
NW> |
| NW>> <!DOCTYPE foo SYSTEM "foo.dtd">
| NW>> <foo xml:id="bar"/>
NW> |
| NW>> If foo.dtd contains
NW> |
| NW>> <!ATTLIST foo name ID #IMPLIED>
NW> |
| NW>> Then the former document means one thing if it's accessed with a WF
| NW>> parser and is rejected by a validating parser.
NW> |
NW> | Yes.
NW> |
NW> | Just as it would be rejected if it said
NW> |
NW> | <!DOCTYPE foo SYSTEM "foo.dtd">
NW> | <foo xml:lang="ja"/>
NW> |
NW> | If foo.dtd contains
NW> |
NW> | <!ATTLIST foo name CDATA #REQUIRED>
NW> |
NW> | Validation *always* has the chance for rejecting well formed
NW> | documents. That is what it is for.

NW> The point I've been trying to make is that these proposals introduce
NW> *a new chance*.

Yes? And the various restrictions and data types in W3C XML Schemas introduced new chances, too. They introduced a chance that what a DTD considered to be perfectly valid CDATA was not in fact a valid gDate or USPostalCode. This is an advantage, not a disadvantage. I suspect that in your day-to-day work you use DTD-valid instances all the time, and thus the root of your disquiet is that I am making well formed instances more visible.
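Norman's `foo.dtd` example and the `xml:lang` counter-example can be modelled side by side. In this Python sketch the attribute declarations are hand-written dictionaries rather than a parsed DTD (`FOO_DTD_WITH_ID`, `FOO_DTD_WITH_REQUIRED`, `wf_ids` and `attribute_errors` are hypothetical names), and the check covers only two of XML 1.0's validity constraints: "Attribute Value Type" (the attribute must be declared) and "Required Attribute". In both documents the well-formed view is unaffected while the validating view reports an error.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical stand-ins for the two versions of foo.dtd in the exchange:
# attribute name -> (declared type, default keyword).
FOO_DTD_WITH_ID = {"name": ("ID", "#IMPLIED")}
FOO_DTD_WITH_REQUIRED = {"name": ("CDATA", "#REQUIRED")}


def show(name):
    """Render xml-namespace attributes with their usual xml: prefix."""
    return name.replace(XML_NS, "xml:")


def wf_ids(el):
    """Well-formed view under the xml:id proposal: no DTD is consulted."""
    return [value for name, value in el.attrib.items() if name == XML_NS + "id"]


def attribute_errors(el, attlist):
    """Validating view: flag undeclared attributes and missing #REQUIRED ones."""
    errors = [f"undeclared attribute {show(n)}" for n in el.attrib if n not in attlist]
    errors += [f"missing required attribute {n}"
               for n, (_type, default) in attlist.items()
               if default == "#REQUIRED" and n not in el.attrib]
    return errors
```

For `<foo xml:id="bar"/>`, `wf_ids` returns `['bar']`, yet `attribute_errors` against `FOO_DTD_WITH_ID` reports `xml:id` as undeclared. For `<foo xml:lang="ja"/>` against `FOO_DTD_WITH_REQUIRED`, it reports both the undeclared `xml:lang` and the missing `name`.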
Yes, well formed instances can turn out to have errors of various sorts when validated. Valid instances can turn out to have errors of various sorts when further validated against a stricter, more restrictive or simply different set of validity constraints.

NW> Maybe on balance that's the right thing to do.

Maybe. I am pretty sure it is.

| NW>> But it's not the same since a WF parser would not associate
| NW>> "IDness" with the 'id' attribute on foo. So xml:id really does
| NW>> introduce a new kind of error.
NW> | Yes. One that is machine detectable, which is a big advance on
NW> | the "if it's a namespace that you have personal knowledge of"
NW> | weasel-wording.

NW> Point taken.

OK.

| NW>> and added processing expectations and complexities (large
| NW>> and small) on top of each other again and again and again.
NW> | You persist in portraying it as complexity - who could argue for
NW> | complexity - and I will persist in showing that not doing any of these
NW> | options merely leaves great complexity in other places.

NW> Fair enough :-)

Thanks once again.

| NW>> but today I feel pretty strongly that something as complex as
| NW>> xml:idAttrs is too much.
NW> |
NW> | Unfortunately you have not really demonstrated that it is.

NW> What, I wonder, would constitute such a demonstration?

Well, something a bit more strongly articulated than statements that well formed (or indeed valid) instances can generate validation errors when further validated, or that this is new and makes you a little uneasy.

NW> You haven't
NW> demonstrated that anything more than simple xml:id is necessary.

I have presented it as an option and I have pointed out both its strengths and its drawbacks (such as the need to change existing content). I have also said that I could live with it if it's the best we can manage, and that I think we can do better. I can't really be fairer than that.
Maybe you consider the need to revise other specifications (such as renaming the ID attribute in RDF/XML to xml:id, and moving the id attribute in SOAP into the XML namespace) to be a trivial modification. It might be; but then again, it might meet resistance from the authors of those specifications, or they might decline that solution where they would have adopted a different one. In those cases, that would be an argument that an option other than xml:id is necessary.

NW> You've argued that it would be somewhat more convenient for some
NW> authors of some documents (that have legacy schemas) to be able to
NW> have nested, scoped ID declarations but you haven't convinced me
NW> that's in the 80% case.

Okay, fair enough. In my normal work I handle multi-namespace XML documents routinely, so I do consider this to fall into the 80% case. I also consider the insertion of XML snippets into XML templates to be a common operation in XML processing, and this process should remain simple while also allowing IDness and local names to be preserved. Again, that seems to fall fairly easily into the 80% of many people's paying XML work.

A solution that requires only a small change also seems easier to adopt and thus more likely to succeed. If someone picked ID or Id instead of id and does not much care, or put id into their own namespace because there was no compelling reason to do otherwise, then sure, they could switch to xml:id with little pain. I would expect the resistance to come from people who picked PartNum or Catalog-Number or something in a non-English language and want to keep it that way. It may be that these cases are not frequent, not vocal, or tend to use DTD validation all the time anyway, in which case they could plausibly be shuffled off into the 20% and told they were too expensive to cater for. Or it might be that they are not, and we need to try to cater for them (which still might fail).
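The template case can be illustrated with a Python sketch that grafts a snippet into a template. The declaration tables (`SNIPPET_DECLS`, `TEMPLATE_DECLS`) and the `ids_in` and `graft` helpers are hypothetical; the point is only that IDness carried by `xml:id` travels with the snippet, while IDness that depends on the snippet's own DTD (here a `PartNum` attribute) is lost once the snippet sits under the template's declarations.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def ids_in(root, declared):
    """IDs visible in a tree: every xml:id, plus attributes declared as ID."""
    found = set()
    for el in root.iter():
        for name, value in el.attrib.items():
            if name == XML_ID or declared.get(el.tag) == name:
                found.add(value)
    return found


# Hypothetical declarations: the snippet's own DTD makes PartNum an ID,
# while the template's DTD knows nothing about <part>.
SNIPPET_DECLS = {"part": "PartNum"}
TEMPLATE_DECLS = {"section": "id"}


def graft(snippet_text, template_text):
    """Insert a snippet into the template's <section>; report IDs before and after."""
    snippet = ET.fromstring(snippet_text)
    template = ET.fromstring(template_text)
    before = ids_in(snippet, SNIPPET_DECLS)
    template.find("section").append(snippet)
    after = ids_in(template, TEMPLATE_DECLS)
    return before, after


before, after = graft('<part PartNum="A-42"><note xml:id="n1"/></part>',
                      '<page><section id="main"/></page>')
```

Here `before` holds `A-42` and `n1`, while `after` holds `main` and `n1`: the `PartNum` value `A-42` is no longer an ID after insertion. Keeping such author-chosen names would need some form of scoped declaration, which is the trade-off against plain `xml:id` discussed above.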
We can't really tell until there is a more readable document for them, one that can get wider review than this list, public though it is.

I believe that the correct thing to do at this stage is to get wider review and to hear what the use cases are and which solutions suit which people. I therefore propose that the options document be revised to include the new options that were proposed and to add the advantages and disadvantages that became known as a result of this discussion. I volunteer to write such an update, although not next week because of travel (I am chairing an f2f meeting in Australia; email access will be sporadic next week). I would like to make one revision of such a document in this forum, to be sure that I did not miss out or misunderstand anyone's point, and then issue the document as a W3C Note to gather feedback. This Note will list all the options and invite feedback as to which are preferred, which could be lived with and which are unacceptable, and will also collect use cases for real-world ID usage. I hope that the data collected will be helpful to the XML Core WG in deciding what, if anything, needs to be done in this area.

--
 Chris                            mailto:chris@w3.org
Received on Saturday, 11 January 2003 13:51:51 UTC