- From: Chris Lilley <chris@w3.org>
- Date: Fri, 10 Jan 2003 20:02:32 +0100
- To: www-tag@w3.org, Norman Walsh <Norman.Walsh@Sun.COM>
On Friday, January 10, 2003, 6:57:18 PM, Norman wrote:

NW> -----BEGIN PGP SIGNED MESSAGE-----
NW> Hash: SHA1

NW> / Chris Lilley <chris@w3.org> was heard to say:
NW> | On Friday, January 10, 2003, 5:13:53 PM, Norman wrote:

| NW>> / noah_mendelsohn@us.ibm.com was heard to say:
| NW>> | I think I agree with Tim's other conclusion: do nothing is probably the least risky solution. We've got too many typing mechanisms already.

| NW>> I have mixed feelings, but I think I agree with Tim and Noah.

| NW>> "IDness" is a consequence of validation. That means you have to validate.

NW> | So, your solution is option 1 or option 8 (DTD or Schema validation in all cases).

NW> Yes. Or an internal subset as you point out further down. "The status quo."

| NW>> I understand that sometimes has painful consequences. If a language wants to have IDs so that authors can point into documents, the workaround is to establish a MIME type for that language and describe what fragment identifiers mean independent of validation.

NW> | That does not give you IDs. It gives you pointers. It does not solve the getElementByID problem and it does not solve the #fo selector problem.

NW> Right. getElementsByID() returns an empty set if you haven't validated.

You mean, you propose that it *should* return the empty set if you haven't validated.

NW> Workarounds for the #fo problem could be achieved in the CSS spec without changing XML. (No, I don't have any specific workaround in mind.)

Allow me to consider that assertion unproven, in that case, and merely observe that fixing the IDness problem in multiple *consumers* of IDs (probably in different ways) is clearly suboptimal to fixing it centrally.

| NW>> Similarly, the semantics of intra-document references could be defined independent of validation if necessary.
NW> | I agree that, since we have well formed documents, the semantics of intra-document references should be defined independent of validation. There are two ways to do this; one is to invent a whole new mechanism that is independent of IDs and define how that works. The other way, suggested in this thread, is to separate the assignment of IDness from that of validation.

NW> As long as DTDs and schemas contribute "IDness" to the mix, they can't be separated. I'd be a lot happier with separation.

Well, I would be a lot happier if DTDs and Schemas could separate the tasks of decoration and validation too - I would like to see a PSI and a PSVI as separate things - but the solution to the problem of IDness in well formed instances does not depend on them doing that.

NW> What's being proposed here is another, independent mechanism *in addition to* validation.

No, *before* validation.

NW> Like Noah said, "we've got too many typing mechanisms already".

And like I said, not fixing this will give us plenty more as all the unsatisfied customers invent them one per specification. I can't believe that you are seriously proposing that.

NW> | Which XML already does. Is it true to say that in the following instance
NW> |
NW> | <?xml version="1.0" encoding="UTF-8"?>
NW> | <!DOCTYPE foo [
NW> | <!ATTLIST foo partnum ID #IMPLIED>
NW> | ]>
NW> | <foo partnum="i54321" bar="toto"/>
NW> |
NW> | a) The instance is well formed
NW> | b) the instance is not valid(atable)
NW> | c) the partnum attribute on foo is of type ID

NW> Yep. All true.

Okay so the concept of IDness is *not* tied to validation.

| NW>> On the other hand, one of the consequences of xml:idAttr (and to a lesser extent xml:id) that bothers me is that it moves this validation semantic out into authoring space.

NW> | To be clear; it does nothing to validation at all. It decorates a well formed instance.
NW> | It does not do any validation and the three validation constraints that apply to IDs are not enforced unless there is a subsequent validation step (for example, with a W3C XML Schema).

NW> Fair point. Let me rephrase. It provides an additional type annotation mechanism out in the authoring space. This provides yet another mechanism to do something and it may do so in ways that are sometimes invalid.

Of course. Well formed authoring always has the possibility of creating things that are then determined to be invalid - duplicate ids, incorrect content models, missing required attributes and so on. How is this different?

NW> If you look at a document with well-formed glasses on, then again with validation glasses on, there are a small number of differences that you may perceive. These proposals all add one more thing to that set. I'd like to make that set smaller, not larger.

So would I, which is why I would like there to be a way to add IDness to the infoset of well formed documents and for the W3C XML Schema to pick that up as its input Infoset and reflect these values back in the PSVI so that the number of differences seen with the two sets of glasses becomes smaller: IDness is preserved after validation.

NW> (Before someone points out xsi:type, let me just say I've never used it and I hope I never do. Every time I think about it, it whispers "I'm a design flaw, but you can't quite work out what design would be better, can you?" Then it giggles evilly.)

I hear Norm proposing option #10 (or is that 11) using xsi:type in the instance (though that would need to be a child element not an attribute now because we have multiple attributes ....)

NW> | Further, the validation semantic is already out in the authoring space.
NW> | Authors can plug away in the internal subset (particularly in those DTDs that have parameter entities in their content models precisely to allow for such extension) and can even declare the entire DTD in the internal subset and make it up as they go along.

NW> I concede that not all uses of the internal subset are validation, but I tend to think of them that way.

I agree you think of them that way. I am trying to get you not to think of them all that way because it complicates the architecture.

NW> Taking advantage of DTD parameter entities more-or-less implies that you're doing full validation because they almost never have any effect on a WF-only parser that ignores the external subset. So they're mostly local modifications to the DTD that occur before validation, and they usually indicate that validation is expected.

Yes. My point was merely that users can already affect validation when they are editing their instances, which you asserted was a bad thing and only introduced by these id proposals.

NW> | So I believe that your concern is unfounded because
NW> |
NW> | a) people can already do that, and

NW> People can modify the schema that will be used on a per-instance basis, and some of the modifications that they can perform affect a document that isn't subjected to validation because of the minimal "DTD processing requirements" placed on a WF parser.

Yes.

NW> That usage doesn't concern me as much.

Okay so modifications to the instance that affect the IDness do not concern you, ok that is good ....

NW> | b) these proposals do not do it.

NW> They do introduce yet another way to do something and the way that's introduced will expose new kinds of validation problems.

Just like the 15 or so schema languages all introduce another way to do something. But yes, it's a new way.
In the specific case of the subset-less SOAP XML form, there is no other way (except for Schema processing after parsing, which is unlikely in a messaging environment that is security and performance conscious).

NW> I'm still concerned.

I agree that "what happens when DTD validation is performed" is still an issue that needs to be addressed. That might be as simple as saying "if you have an external or internal subset and you declare attributes to be of some other type than ID then interoperability will suffer so you should ensure, if you are wise, that the IDness is the same with and without DTD validation". Or we could try and say which wins (but I am fairly sure the DTD would win because of the ass-ba^H^H^H^H^H^H prior declaration wins design and the instance is read last). Hence, a disparity between what is declared as ID in the well formed instance and what is (re)declared in the DTD might be best solved by authoring guidelines and best practice; people who don't follow that get exactly what they used to have, i.e. what is in the DTD is correct, and there is a disparity between well formed and valid views of the document, so they are no worse off.

There seems to be no problem in terms of validation with W3C XML Schema or with any other schema language that picks up an Infoset on the way in, because this mechanism merely adds to the infoset at parse time and can be defined to be the same sort of annotation that Schema does, so a processor that works on the PSVI need not care where the IDness of a particular attribute came from.

| NW>> One of the reasons that W3C XML Schema says that schema location information is only a hint is so that I can apply my own schema independent of what the author asked for. Well, what if I want to use some other attribute as an ID sometimes?
NW> | Realistically, unless it was authored that way, your chances of getting uniqueness on attribute values that were not already checked for uniqueness are going to be spotty at best. But ok suppose you want to ....

| NW>> It just seems to me that moving IDness into the document is a fairly significant can of worms.

It might be, but your assertion about a use case of suddenly changing the IDness of a document and re-validating it does not establish the worminess. I can sense that you feel unease; this might be because it's a can of worms or it might be that you have got used to treating two concepts as the same when in fact they are architecturally different and you are getting used to that.

NW> | Please see the example above which has the IDness in the instance and tell me how your home-grown Schema which declares the toto attribute to be an ID is going to deal with the input infoset that says partnum is an ID.

NW> I didn't intend the latter comment about a can of worms as an extension of the former comment. I concede that having different schemas that use different attributes for IDness is a more theoretical than practical example. But it still raises philosophical issues to me.

Of course, and it's good to think these thought experiments through to catch use cases. But as Len said it's a case of "figuring out who pays which bills" and if it comes down to, on the one hand, having SOAP work and having RDF/XML processable by XML tools and having multi-namespace XML documents reliably and interoperably processed by a new generation of XML clients so we can ditch the 1997-brand 'HTML' clients that hamper us now - pauses for breath - and, on the other hand, allowing someone to theoretically shuffle all the datatypes in an instance and see if it revalidates, then in terms of cost/benefit and who pays the bills it's pretty obvious to me where the big win is.
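As a rough check of the internal-subset example quoted earlier in this message, a non-validating parser can be asked what the internal subset declares. The sketch below uses Python's standard `xml.parsers.expat` module; the helper name `declared_id_attributes` is made up for illustration, and it only reports ATTLIST declarations of type ID. It performs no validation, which is the point being argued.

```python
import xml.parsers.expat

# The instance from the thread: well formed, no element declarations,
# one ATTLIST declaration in the internal subset.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ATTLIST foo partnum ID #IMPLIED>
]>
<foo partnum="i54321" bar="toto"/>
"""


def declared_id_attributes(data):
    """Return the set of (element, attribute) pairs declared with type ID.

    Hypothetical helper: it relies only on the declarations expat reports
    while parsing, and does not check uniqueness or any other validity
    constraint.
    """
    found = set()

    def on_attlist(elname, attname, type_, default, required):
        # Called once per attribute declared in an ATTLIST declaration.
        if type_ == "ID":
            found.add((elname, attname))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(data, True)  # raises ExpatError if not well formed
    return found


if __name__ == "__main__":
    print(declared_id_attributes(DOC))
```

The document parses without error even though `bar` and the `foo` element itself are undeclared, while `partnum` is still reported as an ID.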
NW> I think the worms in the can are:

NW> - New validity problems:

NW> <!DOCTYPE foo SYSTEM "foo.dtd">
NW> <foo xml:id="bar"/>

NW> If foo.dtd contains

NW> <!ATTLIST foo name ID #IMPLIED>

NW> Then the former document means one thing if it's accessed with a WF parser and is rejected by a validating parser.

Yes. Just as it would be rejected if it said

<!DOCTYPE foo SYSTEM "foo.dtd">
<foo xml:lang="ja"/>

If foo.dtd contains

<!ATTLIST foo name CDATA #REQUIRED>

Validation *always* has the chance of rejecting well formed documents. That is what it is for. I propose that we deal with that by saying

a) the DTD view wins; if the DTD says different things than the instance, that was your choice
b) best current practice is to reflect into the instance what the DTD says about IDness
c) best current practice for new document types is to use a single attribute name for all attributes of type ID, where possible
d) best current practice for namespaces which are expected to be mixed with others is to call the ID attribute id.

NW> You could argue that the same is true of

NW> <!DOCTYPE foo SYSTEM "foo.dtd">
NW> <foo id="bar"/>

NW> But it's not the same since a WF parser would not associate "IDness" with the 'id' attribute on foo. So xml:id really does introduce a new kind of error.

Yes. One that is machine detectable, which is a big advance on the "if it's a namespace that you have personal knowledge of" weasel-wording.

NW> - Complexity, the xml:idAttr (or xml:idAttrs) and the concomitant xml:idrefsAttr(s) add new levels of hierarchical complexity.

Well, they add some complexity to document instances while removing some complexity and some uncertainty elsewhere, so it's unfair to characterise it as "adding complexity". Saying "just do DTD validation if you want IDs otherwise live without" also adds complexity.
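To illustrate the "machine detectable" point, here is a minimal sketch of how a well-formed-only consumer might index the proposed xml:id attribute and flag repeated values without any DTD or schema. It assumes xml:id is simply the `id` attribute in the reserved XML namespace; the function name `index_xml_ids` is hypothetical, and the sketch does not implement rule a) above (the DTD view winning), since it never reads a DTD.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the reserved XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def index_xml_ids(text):
    """Map each xml:id value to its first element and list repeated values.

    Hypothetical helper: only well-formedness is required of the input.
    """
    root = ET.fromstring(text)
    index = {}
    duplicates = []
    for elem in root.iter():
        value = elem.get(XML_ID)
        if value is None:
            continue
        if value in index:
            duplicates.append(value)  # uniqueness would fail on validation
        else:
            index[value] = elem
    return index, duplicates


if __name__ == "__main__":
    sample = '<doc><foo xml:id="bar"/><foo xml:id="baz"/><foo xml:id="bar"/></doc>'
    ids, dupes = index_xml_ids(sample)
    print(sorted(ids), dupes)
```

A consumer built this way can look elements up by ID straight from the parsed instance, and the duplicate list gives an early warning of what a later validation step would reject.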
| NW>> If pushed, I think I could come to terms with the simple xml:id proposal, but the more complex variants look like too much complexity to me.

NW> | Firstly, glad you could settle for xml:id. I could too, if that was the best I was going to get but I think we can get better.
NW> |
NW> | However, it isn't simpler. If you have some XSLT template that copies a bunch of stuff to the output and then copies foo from the sample that I have above as a child element, then your choices are
NW> |
NW> | a) leave it alone and lose the IDness of partnum

NW> When you build a new result tree, you lose IDness anyway.

Because you can't guarantee uniqueness of the values? Sure, just as well we are *not validating* then ;-) or perhaps because you clearly can't copy and paste the DTD fragments the same as you can with elements - again, that means it's handy that we are not relying on such a mechanism.

NW> | b) rewrite partnum to xml:id and possibly break tools that use part numbers

NW> That's a choice the tool writer gets to make. And he or she can have different transformations that do different things in different contexts.

Yes, lots of flexibility there, plenty of room for tradeoffs. Given a choice between two alternatives both of which broke something, then rather than having the flexibility to write two different tools that broke things differently and the flexibility to carefully remember which tool to use when, I would rather have a third option that didn't force me to choose between keeping the local name or the type, but let me retain the local name along with its type - nice and simple.

NW> | The 'more complex' variant lets you
NW> |
NW> | c) leave it alone and retain the IDness by adding an attribute
NW> |
NW> | of course you have to have parsed the instance and looked in the infoset to get the IDness in the first place.
NW> | If the example had instead been
NW> |
NW> | <?xml version="1.0" encoding="UTF-8"?>
NW> | <foo partnum="i54321" bar="toto" xml:idAttr="partnum"/>
NW> |
NW> | then just copying the foo element does everything. Which is what I meant by "aiding composability".

NW> Yeah, but it's a whole new bit of context that the parser has to keep around as it's building the infoset.

And it's right there on the element, which makes XSLT handling much simpler. How many XSLT sheets have you seen that read the DTD and carefully constructed little internal subsets in the output document??

NW> Yes, it's clear how it would be implemented and taken by itself it's clearly not *that complex*, but I feel like over the last few years we've taken a simple idea (a subset of SGML useful to the desperate perl hacker)

That simple idea is now the basis for the world's information system and its electronic commerce system. So it's grown a bit beyond the desperate perl hacker, who I don't see doing a great job on a PSVI anytime this century.

NW> and added processing expectations and complexities (large and small) on top of each other again and again and again.

You persist in portraying it as complexity - who could argue for complexity - and I will persist in showing that not doing any of these options merely leaves great complexity in other places. After all, my point is not to gratuitously add complexity just to annoy people. My point is to have a simple, easily understood, rapidly retrofittable method to get a real interoperability and authoring benefit within a year or so.

NW> All of the decisions to add stuff, taken in isolation, looked tractable, but the whole is starting to appear ponderous. (Some would argue it became ponderous long ago, but this is not a troll.)

NW> I'm not sure that doing nothing is exactly the right answer,

I am very sure that doing nothing is not the right answer.
NW> but today I feel pretty strongly that something as complex as xml:idAttrs is too much.

Unfortunately you have not really demonstrated that it is. You have demonstrated that you feel uneasy about it, and that it is a change. You have argued that it increases complexity and I have argued that introducing one of these methods would decrease complexity of authoring multimedia documents for the Web and writing multi-namespace-aware XML Web clients.

--
 Chris mailto:chris@w3.org
Received on Friday, 10 January 2003 14:02:37 UTC