New grammar for where condition

I've been thinking about the query condition
some more. 

First, I like Jim's suggestion that
boolean_op and boolean_expr etc. can be 
combined into just boolean_expr, so I'm going
to incorporate that into my proposal.

Second, I now realize that we cannot force
all implementations to have fully general
comparison operators. For example, I am
familiar with a system that has had gross sales
of billions of dollars and is still selling
strong, and that particular system limits the
functionality of the relational 
operators (>, =, etc.) to the form that is
already in the current protocol draft,
i.e., <property> <relational op> <constant>. 
Plus, I know of at least two other commercially
successful systems that have the same
limitation. I do not believe we should
force such systems to do work on their
engines to accommodate DASL. Requiring
a translation layer from XML to whatever
API the engine uses is bad enough.

However, I also believe that there are
systems out there that don't have this
limitation, and we want to make the
full power of these systems available
as well.

So, our challenge is to accommodate both
the limited and the general flavors
of the six relational operators in the
same framework.

But I believe common sense dictates
that we don't have bazillions of separate
versions of the query syntax, or even
more than one if we can avoid it. I'm
morally certain we will add plenty of
new, optional, well known operators over time. 
Each time we add some new well known, optional
query operators, we should not have
to version the query syntax at all. All the
1.0 operators and all future operators
should plug in to the same syntax.
The syntax is independent of the
particular properties in a collection.
It should also be independent of the
particular operators supported by
a collection.

Furthermore, XML Data is coming.
We need to mesh gracefully with it.

Finally, dealing with datatypes
is required by our charter.

In order to accomplish all the above, I propose
we do the following:

(1) All required operators are defined
in the 1.0 spec. Operators added after
1.0 must be optional. Some of the operators
in the 1.0 spec. may be optional.
The 1.0 spec. will make it perfectly
clear which operators are required
and which are optional. No well known 
operator defined in the DASL spec. 
can be deleted in a later version of the spec.
No operator required in 1.0 can become
optional later. No post 1.0 version of
the spec. can define required operators,
only optional operators. Thus, later
versions of the spec. will be strictly
additive.
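
To make rule (1) concrete, here is a minimal Python sketch of a
check that one version of the operator list is a strictly additive
successor of another. The function name, the dictionary
representation (operator name mapped to "required" or "optional"),
and the operator name contains_near are assumptions for
illustration only; nothing like this is defined in any draft.

```python
def is_strictly_additive(old, new):
    """Return True if `new` only adds optional operators to `old`.

    Both arguments map an operator name to "required" or "optional".
    This representation is hypothetical, chosen for the sketch.
    """
    # No operator defined in an earlier version may be deleted.
    if not set(old) <= set(new):
        return False
    # The set of required operators is fixed by 1.0: nothing required
    # may become optional, and nothing may become (or be added as) required.
    old_required = {op for op, status in old.items() if status == "required"}
    new_required = {op for op, status in new.items() if status == "required"}
    return old_required == new_required


v1_0 = {"and": "required", "eq1": "required", "not": "optional"}
v1_1 = dict(v1_0, contains_near="optional")   # hypothetical later version
```

With these inputs, is_strictly_additive(v1_0, v1_1) holds, while a
later version that drops "not" or adds a new required operator
fails the check.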

(2) We are chartered to provide a way
to advertise the query capabilities
of a collection in the 1.0 spec. This
consists of providing a way for the 
collection to advertise (a) its properties,
and (b) the query operators it supports.
This implies that there is a command
to ask the collection to return the
exact list of operators it supports,
and that the same or an additional
command exists to ask the collection
to return the list of properties,
including their datatype, that it
supports. (If the collection doesn't
know all the properties it supports,
then it can just return the list
of properties required by WebDAV,
and an indication that the list is
incomplete.)

(3) We must distinguish between the
limited forms of the relational operators
and the fully general forms by defining
them as separate operators. (I've tried
to think of another way to do it in
XML DTD syntax e.g., by using attributes,
but I've failed. I've given up.)

(4) The contains operator requires
a lot more definition. As it stands,
we don't know what it does. We need
multiple contains operators, e.g.:
(a) literal pattern matching with
wild cards a la the SQL LIKE operator.
(b) stemming is done on the word
in the string parameter
(c) a list of words (perhaps a single
string of words separated by blanks)
ALL of which must occur in the document,
and stemming is done on all of them.
(d) a list of words ANY of which must
occur in the document, and stemming
is done.
(e) a list of words or their synonyms
must ALL occur in the document and
stemming is done.
(f) a list of words or their synonyms
ANY of which must occur in the document
and stemming is done.
(g) a phrase that must occur in the
document, capitalization and case
enforced
(h) a phrase that must occur in the
document, capitalization and case
insensitive
(i) a list of words that must occur
NEAR each other in the document
(i.e., within N words).

ETC.
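
To show how differently these variants behave, here is a rough
Python sketch of four of them: (c) and (d) without stemming,
(g)/(h), and (i). Everything in it is an assumption made for
illustration: the function names, the tokenization (runs of word
characters, lowercased), and the reading of NEAR as "all terms fall
inside some window of N consecutive words". Stemming and synonym
expansion are left out entirely.

```python
import re


def words(text):
    """Split text into lowercased word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def contains_all(document, terms):
    """Variant (c) without stemming: every term must occur."""
    doc = set(words(document))
    return all(t.lower() in doc for t in terms)


def contains_any(document, terms):
    """Variant (d) without stemming: at least one term must occur."""
    doc = set(words(document))
    return any(t.lower() in doc for t in terms)


def contains_phrase(document, phrase, case_sensitive):
    """Variants (g) and (h): plain substring match on the phrase."""
    if case_sensitive:
        return phrase in document
    return phrase.lower() in document.lower()


def contains_near(document, terms, n):
    """Variant (i): all terms occur within some window of n consecutive words."""
    doc = words(document)
    wanted = {t.lower() for t in terms}
    for start in range(len(doc)):
        if wanted <= set(doc[start:start + n]):
            return True
    return False
```

Even this toy version makes the point: the same document and the
same word list can match under one variant and fail under another,
so a single undifferentiated "contains" cannot be specified
precisely.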

(5) No variation of contains can
be required. This would exclude
commercially successful systems
that don't have this capability,
including at least one system
that has sold billions of dollars
worth and is still going strong.
Even on some systems that offer it, 
CBR is an optional feature, and it
is charged for separately. Thus,
in many customer sites, CBR isn't
there.

(6) This is the syntax I now propose
for the where condition:

<!ELEMENT where ( boolean_expr ) >

<!ELEMENT boolean_expr ( boolean_prop | boolean_literal |
                         and | or | not | 
                         gt1 | gte1 | eq1 | ne1 | ls1 | lse1 |
                         gt2 | gte2 | eq2 | ne2 | ls2 | lse2 ) >
<!-- variations of "contains" to be added later -->
<!ELEMENT integer_expr ( integer_prop | integer_literal ) >
<!ELEMENT real_expr ( real_prop | real_literal ) >
<!ELEMENT string_expr ( string_prop | string_literal ) >
<!ELEMENT datetime_expr ( datetime_prop | datetime_literal ) > 

<!ELEMENT boolean_prop ( #PCDATA ) >
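<!ELEMENT boolean_literal ( #PCDATA ) >
<!-- boolean_literal content must be one of: t, f, unknown.
     (A DTD content model cannot enumerate text values.) -->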
<!ELEMENT integer_prop ( #PCDATA ) >
<!ELEMENT integer_literal ( #PCDATA ) >
<!ELEMENT real_prop ( #PCDATA ) >
<!ELEMENT real_literal ( #PCDATA ) >
<!ELEMENT string_prop ( #PCDATA ) >
<!ELEMENT string_literal ( #PCDATA ) >
<!ELEMENT datetime_prop ( #PCDATA ) >
<!ELEMENT datetime_literal ( #PCDATA ) >

<!ELEMENT and ( boolean_expr , boolean_expr+ ) >
<!ELEMENT or ( boolean_expr , boolean_expr+ ) >
<!ELEMENT not ( boolean_expr ) >
<!ELEMENT gt1 ( ( integer_prop , integer_literal ) |
                ( real_prop , real_literal ) |
                ( string_prop , string_literal ) |
                ( datetime_prop , datetime_literal ) ) >
<!ELEMENT gt2 ( ( integer_expr , integer_expr ) |
                          ( real_expr , real_expr ) |
                          ( integer_expr , real_expr ) |
                          ( real_expr , integer_expr ) |
                          ( string_expr , string_expr ) |
                          ( datetime_expr , datetime_expr ) ) >
<!ELEMENT gte1 ( ( integer_prop , integer_literal ) |
                 ( real_prop , real_literal ) |
                 ( string_prop , string_literal ) |
                 ( datetime_prop , datetime_literal ) ) >
<!ELEMENT gte2 ( ( integer_expr , integer_expr ) |
                          ( real_expr , real_expr ) |
                          ( integer_expr , real_expr ) |
                          ( real_expr , integer_expr ) |
                          ( string_expr , string_expr ) |
                          ( datetime_expr , datetime_expr ) ) >
<!ELEMENT eq1 ( ( integer_prop , integer_literal ) |
                ( real_prop , real_literal ) |
                ( string_prop , string_literal ) |
                ( datetime_prop , datetime_literal ) ) >
<!ELEMENT eq2 ( ( integer_expr , integer_expr ) |
                          ( real_expr , real_expr ) |
                          ( integer_expr , real_expr ) |
                          ( real_expr , integer_expr ) |
                          ( string_expr , string_expr ) |
                          ( datetime_expr , datetime_expr ) ) >
<!ELEMENT ne1 ( ( integer_prop , integer_literal ) |
                ( real_prop , real_literal ) |
                ( string_prop , string_literal ) |
                ( datetime_prop , datetime_literal ) ) >
<!ELEMENT ne2 ( ( integer_expr , integer_expr ) |
                          ( real_expr , real_expr ) |
                          ( integer_expr , real_expr ) |
                          ( real_expr , integer_expr ) |
                          ( string_expr , string_expr ) |
                          ( datetime_expr , datetime_expr ) ) >
<!ELEMENT ls1 ( ( integer_prop , integer_literal ) |
                ( real_prop , real_literal ) |
                ( string_prop , string_literal ) |
                ( datetime_prop , datetime_literal ) ) >
<!ELEMENT ls2 ( ( integer_expr , integer_expr ) |
                          ( real_expr , real_expr ) |
                          ( integer_expr , real_expr ) |
                          ( real_expr , integer_expr ) |
                          ( string_expr , string_expr ) |
                          ( datetime_expr , datetime_expr ) ) >
<!ELEMENT lse1 ( ( integer_prop , integer_literal ) |
                ( real_prop , real_literal ) |
                ( string_prop , string_literal ) |
                ( datetime_prop , datetime_literal ) ) >
<!ELEMENT lse2 ( ( integer_expr , integer_expr ) |
                          ( real_expr , real_expr ) |
                          ( integer_expr , real_expr ) |
                          ( real_expr , integer_expr ) |
                          ( string_expr , string_expr ) |
                          ( datetime_expr , datetime_expr ) ) >

The required operators would be and, or, gt1, gte1, eq1, ne1, 
ls1, lse1. The others would be optional: not, gt2, gte2,
eq2, ne2, ls2, lse2, and all variants of contains.
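
As a rough illustration of how a client might use this split, here
is a Python sketch that reports which optional operators a where
condition uses, so the client knows what a collection would have to
advertise before the query can be sent. The function name and the
two constant sets are assumptions for illustration; the sets simply
restate the lists above, with the contains variants left out since
they are not yet named.

```python
import xml.etree.ElementTree as ET

# Operator element names taken from the proposed grammar.
REQUIRED = {"and", "or", "gt1", "gte1", "eq1", "ne1", "ls1", "lse1"}
OPTIONAL = {"not", "gt2", "gte2", "eq2", "ne2", "ls2", "lse2"}


def optional_operators_used(where_xml):
    """Return the set of optional operator elements found in the query."""
    root = ET.fromstring(where_xml)
    # iter() walks the whole element tree, including the root itself.
    return {el.tag for el in root.iter() if el.tag in OPTIONAL}
```

A query built only from the required operators yields an empty set
and should be acceptable to every conforming collection.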

Using Jim's example query:  where dav:getcontenttype = "image/gif"

Using eq1:

<where>
  <boolean_expr>
    <eq1>
      <string_prop>dav:getcontenttype</string_prop>
      <string_literal>image/gif</string_literal>
    </eq1>
  </boolean_expr>
</where>

Using eq2:

<where>
  <boolean_expr>
    <eq2>
      <string_expr>
        <string_prop>dav:getcontenttype</string_prop>
      </string_expr>
      <string_expr>
        <string_literal>image/gif</string_literal>
      </string_expr>
    </eq2>
  </boolean_expr>
</where>
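
The two examples differ only in the operator name and in the
*_expr wrappers around the operands, which suggests that any query
in the limited form can be rewritten mechanically into the general
form. Here is a minimal Python sketch of that rewrite. The function
name and the mapping table are assumptions for illustration, and
the sketch relies on the naming pattern of the proposed grammar
(string_prop and string_literal both wrap into string_expr, and so
on).

```python
import xml.etree.ElementTree as ET

# Limited operator -> general operator, per the proposed grammar.
LIMITED_TO_GENERAL = {
    "gt1": "gt2", "gte1": "gte2", "eq1": "eq2",
    "ne1": "ne2", "ls1": "ls2", "lse1": "lse2",
}


def generalize(where_xml):
    """Rewrite every limited relational operator into its general form."""
    root = ET.fromstring(where_xml)
    # Snapshot the elements first, since the tree is modified below.
    for op in list(root.iter()):
        if op.tag not in LIMITED_TO_GENERAL:
            continue
        for operand in list(op):
            op.remove(operand)
            kind = operand.tag.split("_")[0]      # "string_prop" -> "string"
            wrapper = ET.SubElement(op, kind + "_expr")
            wrapper.append(operand)
        op.tag = LIMITED_TO_GENERAL[op.tag]
    return ET.tostring(root, encoding="unicode")
```

Applied to the eq1 example above, this produces the eq2 example
(apart from whitespace). The reverse rewrite is not always
possible, which is exactly why the two forms need separate names.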

It is interesting to note that gt2, gte2, eq2, ne2, ls2, lse2
are all optional and that, given my proposed syntactic framework,
they satisfy my "strictly additive" condition on future
versions of the protocol. This implies that, provided my
syntactic framework is adopted, they might
not be strictly necessary to include in the 1.0 spec.

Alan Babich

Received on Thursday, 11 June 1998 16:07:41 UTC