- From: Babich, Alan <ABabich@filenet.com>
- Date: Thu, 11 Jun 1998 13:04:04 -0700
- To: "'www-webdav-dasl@w3.org'" <www-webdav-dasl@w3.org>
- Cc: "Babich, Alan" <ABabich@felix.filenet.com>
I've been thinking about the query condition some more.

First, I like Jim's suggestion that boolean_op and boolean_expr etc. can be combined into just boolean_expr, so I'm going to incorporate that into my proposal.

Second, I now realize that we cannot force all implementations to have fully general comparison operators. For example, I am familiar with a system that has had gross sales of billions of dollars and is still selling strong, and that particular system limits the functionality of the relational operators (>, =, etc.) to the form that is already in the current protocol draft, i.e., <property> <relational op> <constant>. Plus, I know of at least two other commercially successful systems that have the same limitation. I do not believe we should force such systems to do work on their engines to accommodate DASL. Requiring a translation layer from XML to whatever API the engine uses is bad enough.

However, I also believe that there are systems out there that don't have this limitation, and we want to make the full power of these systems available as well. So, our challenge is to accommodate both the limited and the general flavors of the six relational operators in the same framework.

But I believe common sense dictates that we should not have bazillions of separate versions of the query syntax, or even more than one if we can avoid it. I'm morally certain we will add plenty of new, optional, well-known operators over time. Each time we add some new well-known, optional query operators, we should not have to version the query syntax at all. All the 1.0 operators and all future operators should plug in to the same syntax. The syntax is independent of the particular properties in a collection. It should also be independent of the particular operators supported by a collection.

Furthermore, XML Data is coming. We need to mesh gracefully with it. Finally, dealing with datatypes is required by our charter.
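The difference between the limited and the general flavors of a relational operator, as described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of operands and the function names `is_limited_form` and `flavor` are hypothetical and appear nowhere in the protocol draft, and the second pair of property names is just an example.

```python
# Hypothetical operand encoding, for illustration only:
#   ("prop", name)      a property reference
#   ("literal", value)  a constant


def is_limited_form(left, right):
    """True for <property> <relational op> <constant>, the limited flavor."""
    return left[0] == "prop" and right[0] == "literal"


def flavor(left, right):
    """Name the operator flavor a comparison needs: 'limited' or 'general'."""
    return "limited" if is_limited_form(left, right) else "general"


# A property compared with a constant: the form the limited engines accept.
print(flavor(("prop", "dav:getcontenttype"), ("literal", "image/gif")))
# A property compared with another property: needs the general flavor.
print(flavor(("prop", "dav:creationdate"), ("prop", "dav:getlastmodified")))
```

Any comparison whose left operand is not a property, or whose right operand is not a constant, falls into the general flavor in this sketch.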
In order to accomplish all the above, I propose we do the following:

(1) All required operators are defined in the 1.0 spec. Operators added after 1.0 must be optional. Some of the operators in the 1.0 spec. may be optional. The 1.0 spec. will make it perfectly clear which operators are required and which are optional. No well-known operator defined in the DASL spec. can be deleted in a later version of the spec. No operator required in 1.0 can become optional later. No post-1.0 version of the spec. can define required operators, only optional operators. Thus, later versions of the spec. will be strictly additive.

(2) We are chartered to provide a way to advertise the query capabilities of a collection in the 1.0 spec. This consists of providing a way for the collection to advertise (a) its properties, and (b) the query operators it supports. This implies that there is a command to ask the collection to return the exact list of operators it supports, and that the same or an additional command exists to ask the collection to return the list of properties, including their datatypes, that it supports. (If the collection doesn't know all the properties it supports, then it can just return the list of properties required by WebDAV, and an indication that the list is incomplete.)

(3) We must distinguish between the limited forms of the relational operators and the fully general forms by defining them as separate operators. (I've tried to think of another way to do it in XML DTD syntax, e.g., by using attributes, but I've failed. I've given up.)

(4) The contains operator requires a lot more definition. As it stands, we don't know what it does. We need multiple contains operators, e.g.:

    (a) literal pattern matching with wild cards a la the SQL LIKE operator.
    (b) stemming is done on the word in the string parameter
    (c) a list of words (perhaps a single string of words separated by blanks) ALL of which must occur in the document, and stemming is done on all of them.
    (d) a list of words ANY of which must occur in the document, and stemming is done.
    (e) a list of words or their synonyms must ALL occur in the document and stemming is done.
    (f) a list of words or their synonyms ANY of which must occur in the document and stemming is done.
    (g) a phrase that must occur in the document, capitalization and case enforced
    (h) a phrase that must occur in the document, capitalization and case insensitive
    (i) a list of words that must occur NEAR each other in the document (i.e., within N words).
    ETC.

(5) No variation of contains can be required. This would exclude commercially successful systems that don't have this capability, including at least one system that has sold billions of dollars worth and is still going strong. Even on some systems that offer it, CBR (content-based retrieval) is an optional feature, and it is charged for separately. Thus, in many customer sites, CBR isn't there.

(6) This is the syntax I now propose for the where condition:

    <!ELEMENT where ( boolean_expr ) >

    <!ELEMENT boolean_expr ( boolean_prop | boolean_literal |
                             and | or | not |
                             gt1 | gte1 | eq1 | ne1 | ls1 | lse1 |
                             gt2 | gte2 | eq2 | ne2 | ls2 | lse2 ) >
    <!-- variations of "contains" to be added later -->

    <!ELEMENT integer_expr  ( integer_prop | integer_literal ) >
    <!ELEMENT real_expr     ( real_prop | real_literal ) >
    <!ELEMENT string_expr   ( string_prop | string_literal ) >
    <!ELEMENT datetime_expr ( datetime_prop | datetime_literal ) >

    <!ELEMENT boolean_prop     ( #PCDATA ) >
    <!ELEMENT boolean_literal  ( "t" | "f" | "unknown" ) >
    <!ELEMENT integer_prop     ( #PCDATA ) >
    <!ELEMENT integer_literal  ( #PCDATA ) >
    <!ELEMENT real_prop        ( #PCDATA ) >
    <!ELEMENT real_literal     ( #PCDATA ) >
    <!ELEMENT string_prop      ( #PCDATA ) >
    <!ELEMENT string_literal   ( #PCDATA ) >
    <!ELEMENT datetime_prop    ( #PCDATA ) >
    <!ELEMENT datetime_literal ( #PCDATA ) >

    <!ELEMENT and ( boolean_expr , boolean_expr+ ) >
    <!ELEMENT or  ( boolean_expr , boolean_expr+ ) >
    <!ELEMENT not ( boolean_expr ) >

    <!ELEMENT gt1 ( ( integer_prop , integer_literal ) |
                    ( real_prop , real_literal ) |
                    ( string_prop , string_literal ) |
                    ( datetime_prop , datetime_literal ) ) >
    <!ELEMENT gt2 ( ( integer_expr , integer_expr ) |
                    ( real_expr , real_expr ) |
                    ( integer_expr , real_expr ) |
                    ( real_expr , integer_expr ) |
                    ( string_expr , string_expr ) |
                    ( datetime_expr , datetime_expr ) ) >

    <!ELEMENT gte1 ( ( integer_prop , integer_literal ) |
                     ( real_prop , real_literal ) |
                     ( string_prop , string_literal ) |
                     ( datetime_prop , datetime_literal ) ) >
    <!ELEMENT gte2 ( ( integer_expr , integer_expr ) |
                     ( real_expr , real_expr ) |
                     ( integer_expr , real_expr ) |
                     ( real_expr , integer_expr ) |
                     ( string_expr , string_expr ) |
                     ( datetime_expr , datetime_expr ) ) >

    <!ELEMENT eq1 ( ( integer_prop , integer_literal ) |
                    ( real_prop , real_literal ) |
                    ( string_prop , string_literal ) |
                    ( datetime_prop , datetime_literal ) ) >
    <!ELEMENT eq2 ( ( integer_expr , integer_expr ) |
                    ( real_expr , real_expr ) |
                    ( integer_expr , real_expr ) |
                    ( real_expr , integer_expr ) |
                    ( string_expr , string_expr ) |
                    ( datetime_expr , datetime_expr ) ) >

    <!ELEMENT ne1 ( ( integer_prop , integer_literal ) |
                    ( real_prop , real_literal ) |
                    ( string_prop , string_literal ) |
                    ( datetime_prop , datetime_literal ) ) >
    <!ELEMENT ne2 ( ( integer_expr , integer_expr ) |
                    ( real_expr , real_expr ) |
                    ( integer_expr , real_expr ) |
                    ( real_expr , integer_expr ) |
                    ( string_expr , string_expr ) |
                    ( datetime_expr , datetime_expr ) ) >

    <!ELEMENT ls1 ( ( integer_prop , integer_literal ) |
                    ( real_prop , real_literal ) |
                    ( string_prop , string_literal ) |
                    ( datetime_prop , datetime_literal ) ) >
    <!ELEMENT ls2 ( ( integer_expr , integer_expr ) |
                    ( real_expr , real_expr ) |
                    ( integer_expr , real_expr ) |
                    ( real_expr , integer_expr ) |
                    ( string_expr , string_expr ) |
                    ( datetime_expr , datetime_expr ) ) >

    <!ELEMENT lse1 ( ( integer_prop , integer_literal ) |
                     ( real_prop , real_literal ) |
                     ( string_prop , string_literal ) |
                     ( datetime_prop , datetime_literal ) ) >
    <!ELEMENT lse2 ( ( integer_expr , integer_expr ) |
                     ( real_expr , real_expr ) |
                     ( integer_expr , real_expr ) |
                     ( real_expr , integer_expr ) |
                     ( string_expr , string_expr ) |
                     ( datetime_expr , datetime_expr ) ) >

The required operators would be and, or, gt1, gte1, eq1, ne1, ls1, lse1. The others would be optional: not, gt2, gte2, eq2, ne2, ls2, lse2, and all variants of contains.

Using Jim's example query:

    where dav:getcontenttype = "image/gif"

Using eq1:

    <where>
      <boolean_expr>
        <eq1>
          <string_prop>dav:getcontenttype</string_prop>
          <string_literal>image/gif</string_literal>
        </eq1>
      </boolean_expr>
    </where>

Using eq2:

    <where>
      <boolean_expr>
        <eq2>
          <string_expr>
            <string_prop>dav:getcontenttype</string_prop>
          </string_expr>
          <string_expr>
            <string_literal>image/gif</string_literal>
          </string_expr>
        </eq2>
      </boolean_expr>
    </where>

It is interesting to note that gt2, gte2, eq2, ne2, ls2, lse2 are all optional, and that, given my proposed syntactic framework, they satisfy my "strictly additive" condition on future versions of the protocol. This implies that, provided my syntactic framework is adopted, they might not be strictly necessary to include in the 1.0 spec.

Alan Babich
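
The eq1 example and the required/optional split in the message can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is a minimal sketch, not part of the proposal: the function names `build_eq1` and `unsupported_operators` are hypothetical, and because the message does not define how a collection advertises its operators, the advertised list is assumed here to be a plain Python set.

```python
import xml.etree.ElementTree as ET

# Operator names taken from the message: the first set is proposed as
# required in 1.0, the second as optional ("contains" variants omitted).
REQUIRED_OPS = {"and", "or", "gt1", "gte1", "eq1", "ne1", "ls1", "lse1"}
OPTIONAL_OPS = {"not", "gt2", "gte2", "eq2", "ne2", "ls2", "lse2"}


def build_eq1(prop_name, literal):
    """Build a <where> element using the limited form: string property = string literal."""
    where = ET.Element("where")
    expr = ET.SubElement(where, "boolean_expr")
    eq1 = ET.SubElement(expr, "eq1")
    ET.SubElement(eq1, "string_prop").text = prop_name
    ET.SubElement(eq1, "string_literal").text = literal
    return where


def unsupported_operators(where, advertised):
    """Return the operators used in the query that are not in the advertised set."""
    known = REQUIRED_OPS | OPTIONAL_OPS
    used = {el.tag for el in where.iter() if el.tag in known}
    return used - set(advertised)


query = build_eq1("dav:getcontenttype", "image/gif")
print(ET.tostring(query, encoding="unicode"))
# Assumed scenario: a collection that advertises only the required operators.
print(unsupported_operators(query, REQUIRED_OPS))
```

A client could run a check like `unsupported_operators` before submitting a query, and fall back from eq2 to eq1 when the collection advertises only the required operators.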
Received on Thursday, 11 June 1998 16:07:41 UTC