- From: Babich, Alan <ABabich@filenet.com>
- Date: Wed, 29 Apr 1998 17:23:29 -0700
- To: "'Saveen Reddy (Exchange)'" <saveenr@Exchange.Microsoft.com>, "'www-webdav-dasl@w3.org'" <www-webdav-dasl@w3.org>
- Cc: "Babich, Alan" <ABabich@felix.filenet.com>
Saveen: Regarding your comment "ITEM 4 - I don't see the benefit of having n "greater than" operators ...", I would like to look a little more deeply into the problem, if I may. A query engine will execute the query on a server. Let us consider some of the detailed ramifications of that. (1) Let us consider, for the moment, only comparison between like types, i.e., integer > integer, string > string, datetime > datetime, etc. Even for this case of homogeneous types, the comparison routines are completely different. The comparison routines for two integers have nothing in common with the comparison routines for two strings. The string comparison routines are much more complicated, because for strings there are character set translation issues, collating sequence issues that arise from localization, and case sensitivity or insensitivity issues. So, the query engine is forced to discriminate which case pertains, so that it can call the correct comparison routine. For polymorphic "greater than", the query engine must decide which comparison routine to call based on the type of the operands. This is more complicated than necessary: Obviously, the query engine has to have the equivalent of a C "switch" statement on the operator regardless of whether the operators are polymorphic or not. The point is that it could get the determination of which comparison routine to call "for free" from the switch statement on the operator if the operators were NOT polymorphic. If the "greater than" operator were polymorphic, the query engine would case on the operator and end up in the case for "greater than", and then it would have to have a second switch statement on the operand types once it got there. The implementation cost and the run time cost of the extra switch statement is undesirable. There is no development or run time cost to making the "greater than" operator non polymorphic in the spec. There are only a few more lines of element declarations. (2) Now let us consider the case of mixed types. Heterogeneous types could simply be disallowed out of hand as a syntax error, but let us take the polymorhpic approach further toward its logical conclusion, since were are looking more deeply into the polymorphic operator issue. It turns out that comparisons of mixed types can sometimes make sense, even good sense. Obviously { 3 > 4.01 } can be considered a valid comparison. (The two different types involved here are integer and floating point.) Let us consider another example: Consider { integer > string }. It turns out that comparisons with these types (and comparisons of type { string > integer}) can make sense, depending upon the value of the string. Suppose the string value is "cow", and the integer value is 3. Then, almost everyone would agree that { 3 > "cow" } should be considered an illegal comparison. However, suppose the value of the integer is 3, and the value of the string is " 3,000,000.01 E -04 ". One could conceivably take the position that the comparison should be allowed, because by following some standard floating point notational conventions, the string could be converted into a floating point number, and the integer could be promoted to a floating point number, and comparing the two values makes sense. In Europe, however, the role of comma and period are interchanged, so the string is not in a valid numerical format, and the comparison would be illegal. Note that the integer could be a property, or it could be a literal. Also, the string could be a property or it could be a literal. Any of the four possible combinations of property and literal could possibly be legal or illegal, depending on the VALUE of the string property or literal, not on its being of type string. I know of existing systems that disallow the comparison of numbers to strings, and I know of systems that allow the comparison of numbers to strings if the string happens to be convertible into a number. One approach has a speed and simplicity advantage, and the other could be argued to have a functionality advantage. (3) Now let us consider error handling. Most people believe that catching errors as soon as possible is desirable. However, since we are looking more deeply into the problem, let us consider the error handling behavior in more detail. Let us consider the case of a system where a client PC produces a DASL query, sends it to a web server, which sends it to a layer of software that converts the DASL query to calls on the DMA (Document Management Alliance) API, and the DMA document management system stores its document catalog information in a relational database. Suppose further that we had polymorphic "greater than". What is the DASL-to-DMA API layer supposed to do with the { integer > string } case? The DMA technical committee discussed the polymorphic query operator issue at great length, and decided against polymorphic operators, so the DMA API is supposed to reject such a query. Therefore, the implementers of the DASL-to-DMA layer would have to decide whether it could convert the comparison to a { integer > integer } comparison, or a { floatingpoint > floatingpoint } comparison before passing the query on to the DMA API. This would only be legally possible if the string involved were a literal, not a property. So, sometimes that kind of comparison works, and sometimes it doesn't. When it doesn't, an error must be returned. This error occurred on the server. The DASL-to-DMA layer could have an error and pass a malformed query to the DMA API, and the DMA API could have an error and accept it. Then, when the query was converted to SQL and passed to the RDBMS, a SQL syntax error would occur. It would be difficult to pass a clear simple error back to the client that made the problem clear. Now consider the case where there is no DMA API layer, so no conversion to it. The DASL query would be passed on to some query engine. Suppose the query engine was based on a relational database engine, and it converted the DASL query to SQL and called the RDBMS. Well, depending upon the RDBMS, I can tell you that the query may succeed or fail, depending upon which RDBMS is used. If the query fails, it would be difficult to pass an error back to the client that would make it clear what was going on. Thus, when errors occur on the server, there is always greatly increased difficulty of making it clear to (a) the human end user, (b) the support people he calls when things don't work, and (c) the programmers who get involved in the bug fixing, whether or not the bug is in their code or some other company's code as to what exactly went wrong. Furthermore, the end user has to wait longer to even get an error. In contrast, if the DASL query is rejected on the client PC as being malformed, it is clear to everyone that the UI application that the user is running has a bug. Thus, it is clearly better to have the non polymorphic operators I proposed, since all of the expensive and time consuming problems that arise when polymorphic operators are used are precluded, and the query engine implementations are simpler and faster. (4) The non polymorphic operators are much easier to describe in XML. For the polymorphic operators, how are you going to express what combinations of types are legal, and which are illegal? You don't just want to have words in the spec. You want the element definitions to embody the correctness or incorrectness of the construction of the query. This is an industry standard. We need to be precise as to what is allow, and not leave holes. (5) DMA has drilled down on this particular rathole extensively, and, I believe, come up with the correct conclusion -- don't have polymorphic operators. There are only advantages, and no real disadvantages. (I don't consider a few more lines in the DASL spec. for element declarations to be a disadvantage.) I propose we leverage the DMA work instead of going down this rathole again with a different group of people, make DASL and DMA more interoperable, simplify the implementation of the query engines, make the query engines faster, by having the non polymorphic operators I proposed. (6) I would like to not how nicely the proposed typing extensions to XML would dovetail with my proposal for the query operators. Let us use the <elementType id="author"> <string/> </elementType> example. My proposal for the element definitions recurses down to properties and literals, and when it gets to this bottom level of the recursion, it in effect says "a string valued property is required here" or "a string literal is required here". With the XML extensions for typing, you could verify syntactically whether or not all the types match on the client side. Thus, my proposal dovetails perfectly. Alan Babich > -----Original Message----- > From: Saveen Reddy (Exchange) [SMTP:saveenr@Exchange.Microsoft.com] > Sent: April 28, 1998 10:30 AM > To: 'www-webdav-dasl@w3.org' > Subject: RE: Type checking in DASL > > > First, I just wanted to clarify that XML-DATA does type elements as > Alan's > example shows ... > > <elementType id="author"> > <string/> > </elementType> > > But this is only one part of XML-DATA .. the part that provides an > XML-based > "DTD", and that associates a type with all tags named "author". > XML-DATA can > also bind a type to an element on-the-fly ... > > <author dt:dt="string">Clark Kent</author> > <author dt:dt="int">235</author> > > ITEMs 1,2,3 -- I agree with these > > ITEM 4 - I don't see the benefite of making n "greater-than" operators > to > correspond with the n types as opposed to one operator that can deal > with n > types. I think the latter is the natural way to do it. Just to be > clear, I > don't claim the either is more expressive, but that the latter is > easier to > explain and use. > > Thanks, > Saveen > > > -----Original Message----- > From: Babich, Alan [mailto:ABabich@filenet.com] > Sent: Friday, April 17, 1998 5:40 PM > To: Alex Hopmann (Exchange); 'Reddy, Saveen'; 'Davis, Jim'; 'Reddy, > Surendra'; 'Henderson, Rick'; 'Jensen, Del' > Cc: Babich, Alan; 'www-webdav-dasl@w3.org' > Subject: Type checking in DASL > > > This memo describes one approach to introduce > syntactic type checking into DASL without relying > on XML extensions. Saveen has already pointed > out that there is a proposed extension > to XML for typing. It would extend the XML > language by having new constructs such as > > <elementType id="author"> > <string/> > </elementType> > > to define an "author" conceptual type, the value > of which is expressed as a string (i.e., the > datatype of "author" is "string"). > > In this memo, I just focus on the query condition. > > This proposal attempts to just use the existing > <!ELEMENT> declaration syntax, so that we don't > have any dependency on the XML extension proposal. > This proposal turns query conditions with type > mismatches into malformed XML documents. This proposal > will continue to work regardless of the outcome of the > proposal for XML datatype extensions. > > ITEM 1: > I postulate small, fixed set of interesting base > datatypes. In this memo, I assume Boolean, integer, > floating point, strings, and datetimes are the > interesting base datatypes for sake of discussion. > > NOTE: Even though I think we should represent > dates as strings in ISO 8061 format using the Zulu > time zone, I made datetimes their own datatype, > so that we can have different operators on them, > e.g., "today's date", etc., and so that you know > you want to run the date string through a format > routine before displaying it, etc. The fact > that SQL also makes datetimes their own type > is worthy of note, and bolsters this argument. END NOTE > > ITEM 2: > There are at most three things that can return a > value of a given datatype: (1) a property, > (2) a literal, and (3) an operator. Therefore, there is > an element type definition for expressions of each > base datatype, and the alternatives of this element > definition are (1) properties with that datatype, > (2) literals of that datatype, and (3) operators > returning that datatype (unless no operators > are currently defined that return that datatype). > > ITEM 3: > The "where" clause consists of an operator that > returns a Boolean value. > > ITEM 4: > To add a new operator in the future that returns results > of datatype x, (1) make an element definition, x_op > for all operators that return values of datatype x > if there isn't one already, > (2) make an element definition for the new operator > to show what its operands need to be, and > add it to list of the alternatives of x_op, > and (3) if there isn't already an alternative for x_op > in the element definition of x_expr, add x_op > to its list of alternatives. The x_op and x_expr > forms are just naming conventions, and have no other > significance. In particular, they are not > parsed to discover "x". > > Just for the heck of it, I've included a string > length operator, "strlen" that takes a string > argument and returns an integer, and a floating > point addition operator, just to illustrate > how the model extends -- not to propose them > as part of the minimal required operator set. > > > Here are the definitions for the query "where" condition: > > <!ELEMENT where ( boolean_op ) > > > <!-- These are all the operators that return a Boolean value --> > <!ELEMENT boolean_op ( and | or | not | > gt_integer | ge_integer | > eq_integer | ne_integer | > ls_integer | le_integer | > gt_float | ge_float | > eq_float | ne_float | > ls_float | le_float | > gt_string | ge_string | > eq_string | nq_string | > ls_string | le_string | > gt_datetime | ge_datetime | > eq_datetime | ne_datetime | > ls_datetime | le_datetime | > contains ) > > > <!ELEMENT and ( boolean_expr , boolean_expr+ ) > > <!ELEMENT or ( boolean_expr , boolean_expr+ ) > > <!ELEMENT not ( boolean_expr ) > > <!ELEMENT gt_integer ( int_expr , int_expr ) > > <!ELEMENT ge_integer ( int_expr , int_expr ) > > <!ELEMENT eq_integer ( int_expr , int_expr ) > > <!ELEMENT ls_integer ( int_expr , int_expr ) > > <!ELEMENT le_integer ( int_expr , int_expr ) > > <!ELEMENT gt_float ( float_expr , float_expr ) > > <!ELEMENT ge_float ( float_expr , float_expr ) > > <!ELEMENT eq_float ( float_expr , float_expr ) > > <!ELEMENT ne_float ( float_expr , float_expr ) > > <!ELEMENT ls_float ( float_expr , float_expr ) > > <!ELEMENT le_float ( float_expr , float_expr ) > > <!ELEMENT gt_string ( string_expr , string_expr ) > > <!ELEMENT ge_string ( string_expr , string_expr ) > > <!ELEMENT eq_string ( string_expr , string_expr ) > > <!ELEMENT ne_string ( string_expr , string_expr ) > > <!ELEMENT ls_string ( string_expr , string_expr ) > > <!ELEMENT le_string ( string_expr , string_expr ) > > <!ELEMENT gt_datetime ( datetime_expr , datetime_expr ) > > <!ELEMENT ge_datetime ( datetime_expr , datetime_expr ) > > <!ELEMENT eq_datetime ( datetime_expr , datetime_expr ) > > <!ELEMENT ne_datetime ( datetime_expr , datetime_expr ) > > <!ELEMENT ls_datetime ( datetime_expr , datetime_expr ) > > <!ELEMENT le_datetime ( datetime_expr , datetime_expr ) > > <!ELEMENT contains ( prop? , #PCDATA ) > > > <!-- These are all the operators that return an integer value --> > <!ELEMENT int_op ( strlen ) > > > <!ELEMENT strlen ( #PCDATA ) > > > <!-- These are all the operators that return a floating point value > --> > <!ELEMENT float_op ( add_float ) > > > <!ELEMENT add_float ( float_expr , float_expr+ ) > > > <!-- No operators are currently defined that return string > or datetime values. > --> > > <!-- The following elements define all Boolean valued expressions --> > <!ELEMENT boolean_expr ( boolean_op | boolean_prop | boolean_const ) > > <!ELEMENT boolean_prop ( prop ) > > <!ELEMENT boolean_const ( TRUE , FALSE, UNKNOWN ) > > > <!-- The following elements define all integer valued expressions --> > <!ELEMENT int_expr ( int_op | int_prop | int_const ) > > <!ELEMENT int_prop ( prop ) > <!ELEMENT int_const ( #PCDATA ) > > <!-- The following elements define all floating point valued > expressions > --> > <!ELEMENT float_expr ( float_op | float_prop | float_const ) > > <!ELEMENT float_prop ( prop ) > > <!ELEMENT float_const ( #PCDATA ) > > > <!-- The following elements define all string valued expressions --> > <!ELEMENT string_expr ( string_prop | string_const ) > > <!ELEMENT string_prop ( prop ) > > <!ELEMENT string_const ( #PCDATA ) > > > <!-- The following elements define all datetime valued expressions --> > <!ELEMENT datetime_expr ( datetime_prop | datetime_const ) > > <!ELEMENT datetime_prop ( prop ) > > <!ELEMENT datetime_const ( #PCDATA ) > > > > As an example, the query condition > > ( loan_processor = "Sam" AND loan_amount > 100000 ) OR > loan_risk_level > 3 > > We would have a parse tree that looks like this: > > OR > AND > = > loan_processor > "Sam" > > > loan_amount > 100000 > > > loan_risk_level > 3 > > and XML that looks like this: > > <where> > <boolean_op> > <or> > <boolean_op> > <and> > <boolean_op> > <eq_string> > <string_prop> > loan_processor > </sring_prop> > <string_const>Sam</string_const> > </eq_string> > </boolean_op> > <boolean_op> > <gt_integer> > <int_prop> > loan_amount > </int_prop> > <int_const>1000000</int_const> > </gt_integer> > </boolean_op> > </and> > </boolean_op> > <boolean_op> > <gt_integer> > <int_prop> > loan_risk_level > </int_prop> > <int_const>3</int_const> > </gt_integer> > </boolean_op> > </or> > </boolean_op> > </where> > > I'm not familiar enough with XML to know whether > we can improve on the definition of properties or constant > types. At the bottom level I just used "prop", > which is defined somewhere else, > and #PCDATA, which seems like it needs further > syntactic definition. > > What do you think? > > I'll be out of town at least until Thursday, 4/23/98, > and I'm already behind on my e-mail. I haven't even > read all the DASL e-mail yet. So, I won't be able > to respond before 4/23 to comments. > > Alan Babich
Received on Wednesday, 29 April 1998 20:25:47 UTC