- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Mon, 16 Oct 2000 08:51:24 -0600
- To: "Martin J. Duerst" <duerst@w3.org>
- Cc: "Martin Gudgin" <marting@develop.com>, "Schema Comments" <www-xml-schema-comments@w3.org>, "Dan Rupe" <Dan_Rupe@go.com>
At 2000-10-15 03:34, Martin J. Duerst wrote:

>>A bit vector is one way (I believe a fairly common one) of implementing
>>the and-connector; it is, however, not the only way.
>
>What are the others? Straightforward finite state machines don't
>do the job, as I explained in the message to Henry.

Straightforward finite state machines have the disadvantage that in
large and-groups they grow very rapidly in size. This does not mean
they cannot be used, or have never been used, in production systems.
And they certainly do "do the job" in any sense I think salient here:
they calculate the correct answer in finite time.

>>Could
>>you give a concrete use case for allowing an arbitrary sequence of
>>a, b, c, and d elements where (a) the sequence of the elements is
>>significant,
>
>Did you want to write 'insignificant'? That's what both the current
>all groups and my proposal are about.

I think not. If the sequence of child elements has no significance,
and they are not all optional, then the order of children might as
well be (and usually should be) fixed.

In a content model like (a,b,c,d,e) there are no inferences to be
drawn from the fact that instance documents have elements in a
particular order. In a content model like (a & b & c & d & e), the
order of elements in the instance is subject to the control of the
user and may be used to convey information. If there is no
information to be conveyed, then (a,b,c,d,e) would do as well, and in
most editors somewhat better.

Use cases where the information conveyed by the sequence in the
instance would be meaningful to the application would be far more
persuasive evidence of the need for an & connector with the qualities
being described, than use cases where the information is not
meaningful.

>>(b) each element must occur some distinct number of times
>>(a one to four times, b exactly once, c ten to thirty times, and d
>>exactly three times)? I have no trouble imagining users who say that
>>is what they want; I am having trouble imagining a case where they
>>are right.
>
>The very general case is probably extremely rare. But the
>'unbounded' case for some of the elements is not that rare.
>This is extremely similar to the other places where occurrence
>indicators are used: 0, 1, and unbounded are the most frequent
>cases, any other actual numbers are quite rare.

If a, b, and c must each happen one or more times, and no
significance is to be attached to their order, then a content model
like (a+, b+, c+) captures all the constraints. If they must each
occur zero or more times, (a*, b*, c*) or (a|b|c)* captures all the
constraints (the second requires a note saying that the sequence is
not significant). I have not seen anything to suggest that
(a+ & b+ & c+) fills an actual need.

>In another comment (in the context of character encodings and Unicode),
>you have said that conversion back to legacy systems isn't that
>important because we want things to move on. Do you see a difference
>between non-Unicode systems and non-XML systems in that respect?

No. The point of the parallelism with the SGML &-connector is not
conversion (although conversion between XML and SGML systems does
occupy a lot of attention in production systems, according to people
I talk to), but preservation of the current relationship between XML
and SGML as far as possible.

>You are arguing here that the increasing difficulty of writing
>the regular expressions corresponds to the increasing rarity and
>undesirability of the patterns.

Well, no. I am arguing that the pattern you describe as clumsy and
error-prone is neither clumsy nor error-prone.

>... This is just the core of my all group
>proposal: allow people to write things down the way they
>think about it, and let the machines do the rest of the work;
>they are much better at it.

I have a higher opinion of the ingenuity of people than you seem to:
no matter what formalism is used, there will be languages people can
describe easily and briefly with words which the formalism either
cannot describe or can describe only with some difficulty. Seeking to
"allow people to write things down the way they think about it" is
seeking for artificial intelligence and the ability to define formal
languages using only natural languages instead of formalisms.

Left to one's own devices, one might well wish to leave the all-group
and numeric exponents out of the language, because they map so poorly
to standard grammatical formalisms and parser-generation techniques.
The WG agreed to allow both, to support certain fairly simple cases
(numeric exponents for EDI, all-groups for dumping relations), and
voted against those who felt that these changes were the thin end of
a wedge that could eventually destroy the basic conceptual model of
document grammars. You are doing a good job of persuading me that the
alarmists were right, and that the WG might have done better to take
a firmer grammar-based line.

>But I also very much understand that regular expressions are not
>everybody's speciality, and I think that many people who will want
>to use XML Schema won't be experts in regular expressions, and
>shouldn't have to try to become experts.

Becoming expert in a tool is only important for those who wish to use
the tool well. One doesn't have to become an expert in regular
expressions to use XML Schema or DTDs -- only to use them expertly.

Michael Sperberg-McQueen

Received on Monday, 16 October 2000 10:52:53 UTC