- From: Jonathan Robie <jonathan.robie@softwareag.com>
- Date: Fri, 04 Jan 2002 09:23:44 -0500
- To: "Dare Obasanjo" <kpako@yahoo.com>, "Champion, Mike" <Mike.Champion@SoftwareAG-USA.com>, <www-xml-query-comments@w3.org>, <xml-dev@lists.xml.org>
- Cc: <suciu@cs.washington.edu>
At 08:04 AM 1/4/2002 -0800, Dare Obasanjo wrote: >----- Original Message ----- >From: "Jonathan Robie" <jonathan.robie@softwareag.com> >To: "Dare Obasanjo" <kpako@yahoo.com>; "Champion, Mike" ><Mike.Champion@SoftwareAG-USA.com>; <www-xml-query-comments@w3.org>; ><xml-dev@lists.xml.org> >Cc: <suciu@cs.washington.edu> >Sent: Thursday, January 03, 2002 4:26 PM >Subject: Re: [xml-dev] The use of XML syntax in XML Query > > > 2. Typechecking without type inference can work fine as long as you do not > > have joins. If joins are involved, typechecking becomes undecidable. > >So what happens with XQuery joins ( http://www.w3.org/TR/xquery/#id-joins ) >then? Will there be a caveat in the recommendation that indicates that static >typechecking can't be done w.r.t joins? What this means is that we need to use type inference for static typechecking in XQuery, because without it, typechecking becomes undecideable. Fortunately, type inference is precisely the approach we take. > > 3. Type inference is the most promising approach, but it does lead to some > > false negatives. He gives an example using a content model involving equal > > cardinality among three different elements in sequence. The schema for this > > can not be expressed in DTDs or in XML Schema, so it is not clear to me > > that this is a real limitation, but I just read this, and I need to do some > > thinking before I would want to draw a strong conclusion based on his > > examples. > >But his example is fairly simple and utilizes none of the really complex >abilities of XML schemas. His example involves <result> { for $x in document("doc.xml")//elm return <a>{ $x/text() }</a> for $x in document("doc.xml")//elm return <b>{ $x/text() }</b> for $x in document("doc.xml")//elm return <c>{ $x/text() }</c> } </result> In most current systems, the inferred content model of the result element is: a*, b*, c* Now suppose the DTD we require for our output is this: (((a,a)*, (b,b)*, (c,c)*) | ((a,a)*,a, (b,b)*,b, (c,c)*,c)) In this content model, either a,b, and c all occur an even number of times, or they all occur an odd number of times. Dan points out that the inferred content model is not a subset of the required content model, so static type checking would not be able to guarantee the required output. Incidentally, you don't need as complicated a required type as Dan uses to get this result, the following would suffice: ((a,a), (b,b), (c,c)) Here's my take on this: First off, we know that static typechecking is conservative. If a query fails static typechecking, that does not mean it will not generate a correct result, it means only that it can not be guaranteed to generate a correct result. That's one of the reasons that static typechecking is optional in XQuery. In this case, a human being can read the query and see that a more precise type could have been inferred, and this more precise type could have been guaranteed. One of the things we spend time on in the XQuery Working Group is examining the kinds of type information that might be computed for an expression, and trying to make it as precise as possible. There will almost certainly be cases like the one Dan points out where our static typechecking is less precise than you would prefer. The example he gives is rather artificial. He says that further work is needed to see whether false negatives like this will turn out to be a practical problem - in other words, will they be so frequent that people will ignore the useful results of static typechecking? Looking at the way most schemas and DTDs are written, I don't think that Dan's example is going to be so frequent that it would cause real problems, and the type inference associated with most expressions makes sense to me. But the best way to answer Dan's question, IMHO, is to get more implementation experience with the static type system, and there are a few more wrinkles that I would like to see fixed in the very near future. People need to play with real queries with real DTDs and schemas and see how useful the static type checking is for them in practice. The theorists have spent a lot of time creating this system, but we have not had much of a chance to kick the tires. I have spent a little time kicking the tires by looking at the queries I write and the DTDs or schemas associated with them, and my initial impression is that static typechecking will be very useful. >For instance, how can static typechecking work for >schemas that use identity constraints like xs:unique? I fail to see how one >can guarantee that the following expression > >f = <results> >FOR $x in /employee/age >RETURN > <employee-id>{ > $x * 13 > }</employee-id> ></results> > >will only return unique values for <employee-id> via type inference. Static typechecking involves structure, not values. Uniqueness constraints are defined in terms of values. Static typechecking also ignores the facets of simple types in XML Schema, which can also result in errors not detected by the static type system. But practical static type systems that also take values into account are well beyond the state of the art. > > 4. Dan concludes that type inference is still the most promising approach > > to static typechecking, but that further work needs to be done on its > > applicability and limitations. > > > > My own take on this is that static type checking using type inference is > > very promising, and seems to work well in theory for the kinds of queries I > > have looked at. An implementation using the current type system did catch > > interesting errors for me. I think our type system needs further work, and > > we need more practical experience using implementations that do static type > > checking. This is one of the highest priorities for me personally. > >I'm interested in the type of queries you've looked at. I'm not convinced that >this is as straightforward a problem as you've implied but readily admit that >my theoretical CS skills are nowhere near excellent so you may be right and >all I need is a little convincing. 1. Mismatches in content models Suppose the input DTD uses the following definition of author: <!ELEMENT author ( #PCDATA)> The output DTD uses the following definition of author: <!ELEMENT author (first, last)> The following query does not validate according to the output DTD, and the static type system will catch this: //author Any number of common errors can result in mismatches in content models - mistyping the name of an element or forgetting which name is used for it in the DTD or schema, putting elements in the wrong order - basically, the same kinds of errors you typically get when validating with a DTD or schema, except that the static type system can not detect things related to values such as identity constraints or facets of XML Schema simple types. 2. Depending on optional content Suppose we have the following employee element in our input DTD: <!ELEMENT employee (name, salary, percent-bonus?)> Our output DTD shows the computed bonus for each employee, which may be zero, but must be present: <!ELEMENT employee (name, bonus)> Now consider the following query: for $e in //employee return <employee> { $e/name, $e/salary * $e/percent-bonus } </employee> Oops, that has the wrong structure, because salary*bonus evaluates to a value, not to an element. Fortunately, as we showed in (1), the static type system catches this error. Let's try again: for $e in //employee return <employee> { $e/name, <bonus>{ $e/salary * $e/percent-bonus }</bonus> } </employee> Suppose we have a schema, rather than a DTD, and bonus is defined as a decimal value which may not be null. In XQuery, if percent-bonus evaluates to an empty sequence, then the product of salary and the empty sequence is an empty sequence. For any employee that did not have a percent-bonus in the input, there is no bonus in the output, so this does not pass the static type check. Let's fix the query using if-empty() to supply a default value of zero for percent-bonus: for $e in //employee return <employee> { $e/name, <bonus>{ $e/salary * if-empty($e/percent-bonus, 0) }</bonus> } </employee> That now passes static type checking. Incidentally, note that it is quite possible that we would have missed the second error in our testing, because our test files might have supplied percent-bonus for all employees. A customer site, using data corresponding to the same DTD, could have encountered the error we missed. Static type checking caught the error without the need for an adequate test to expose this dependency. 3. Forgetting the curly braces Suppose I forgot the curly braces in the above example: <bonus> $e/salary * if-empty($e/percent-bonus, 0) </bonus> This element constructor has string content, which does not match the type of a bonus element in my schema. 4. Confusing a URL with the resource to which it points This is an error I made several times while developing the RDF library for the Syntactic Web paper. Some of my functions returned URLs, others returned RDF descriptions, and sometimes I changed the return type of a function without remembering to change all function calls using that function. Since a URL and a resource have different structure, static type checking catches this error. I hope this gives the basic flavor. I think it this is a very useful degree of error checking, and these are errors that our current static type system already reports. Jonathan
Received on Friday, 4 January 2002 10:32:43 UTC