Type checking in DASL

This memo describes one approach to introducing
syntactic type checking into DASL without relying
on XML extensions. Saveen has already pointed
out that there is a proposed extension
to XML for typing. It would extend the XML
language with new constructs such as

    <elementType id="author">
        <string/>
    </elementType>

to define an "author" conceptual type, the value 
of which is expressed as a string (i.e., the
datatype of "author" is "string").

In this memo, I just focus on the query condition.

This proposal uses only the existing
<!ELEMENT> declaration syntax, so that we don't
have any dependency on the XML extension proposal.
It turns query conditions with type mismatches
into invalid XML documents, i.e., documents that
fail validation against the DTD. It will continue
to work regardless of the outcome of the
proposal for XML datatype extensions.

ITEM 1:
I postulate a small, fixed set of interesting base
datatypes. In this memo, I assume Boolean, integer,
floating point, string, and datetime are the
interesting base datatypes for the sake of discussion.

NOTE: Even though I think we should represent
dates as strings in ISO 8601 format using the Zulu
time zone, I made datetimes their own datatype
so that we can have different operators on them,
e.g., "today's date", and so that you know
to run the date string through a format
routine before displaying it. The fact
that SQL also makes datetimes their own type
is worthy of note, and bolsters this argument. END NOTE
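
As a small illustration of the convention in the
note, a datetime literal in ISO 8601 format with the
Zulu (UTC) designator could be carried in the
datetime_const element declared later in this memo.
The timestamp itself is made up for the example:

    <datetime_const>1998-04-23T17:00:00Z</datetime_const>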

ITEM 2:
There are at most three things that can return a
value of a given datatype: (1) a property,
(2) a literal, and (3) an operator. Therefore, there is
an element type definition for expressions of each
base datatype, and the alternatives of this element
definition are (1) properties with that datatype,
(2) literals of that datatype, and (3) operators
returning that datatype (unless no operators
are currently defined that return that datatype).

ITEM 3:
The "where" clause consists of an operator that 
returns a Boolean value.

ITEM 4:
To add a new operator in the future that returns results
of datatype x: (1) make an element definition, x_op,
for all operators that return values of datatype x,
if there isn't one already; (2) make an element
definition for the new operator to show what its
operands need to be, and add it to the list of
alternatives of x_op; and (3) if there isn't already
an alternative for x_op in the element definition of
x_expr, add x_op to its list of alternatives. The x_op
and x_expr forms are just naming conventions, and have
no other significance. In particular, they are not
parsed to discover "x".
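
As a sketch of these three steps, suppose we wanted
a string concatenation operator. The names "concat"
and "string_op" are hypothetical and are not part of
this proposal; string_expr, string_prop, and
string_const are the elements declared further below.

<!-- Step 1: no string valued operators exist yet, so create string_op -->
<!ELEMENT string_op ( concat ) >

<!-- Step 2: declare the new operator and its operand types -->
<!ELEMENT concat ( string_expr , string_expr+ ) >

<!-- Step 3: this replaces the earlier string_expr declaration,
     which now admits string_op as an alternative -->
<!ELEMENT string_expr ( string_op | string_prop | string_const ) >

Nothing else in the DTD changes: every operator that
takes a string_expr operand, such as eq_string,
accepts the new operator's result automatically.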

Just for the heck of it, I've included a string
length operator, "strlen", that takes a string
argument and returns an integer, and a floating
point addition operator, just to illustrate
how the model extends -- not to propose them
as part of the minimal required operator set.
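
For instance, a condition such as
"loan_principal + loan_fees > 5000.0" might be
written as shown here, using the declarations given
below. The property names loan_principal and
loan_fees are made up for illustration. The operands
are placed directly inside the operators, in the same
abbreviated style as the loan example at the end of
this memo, rather than being wrapped in float_expr
elements:

<boolean_op>
    <gt_float>
        <float_op>
            <add_float>
                <float_prop>loan_principal</float_prop>
                <float_prop>loan_fees</float_prop>
            </add_float>
        </float_op>
        <float_const>5000.0</float_const>
    </gt_float>
</boolean_op>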


Here are the definitions for the query "where" condition:

<!ELEMENT where ( boolean_op ) >

<!-- These are all the operators that return a Boolean value -->
<!ELEMENT boolean_op ( and | or | not | 
                       gt_integer | ge_integer | 
                       eq_integer | ne_integer | 
                       ls_integer | le_integer |
                       gt_float | ge_float |
                       eq_float | ne_float |
                       ls_float | le_float |
                       gt_string | ge_string |
                       eq_string | ne_string |
                       ls_string | le_string |
                       gt_datetime | ge_datetime |
                       eq_datetime | ne_datetime |
                       ls_datetime | le_datetime |
                       contains ) >

<!ELEMENT and ( boolean_expr , boolean_expr+ ) >
<!ELEMENT or ( boolean_expr , boolean_expr+ ) >
<!ELEMENT not ( boolean_expr ) >
<!ELEMENT gt_integer ( int_expr , int_expr ) >
<!ELEMENT ge_integer ( int_expr , int_expr ) >
<!ELEMENT eq_integer ( int_expr , int_expr ) >
<!ELEMENT ne_integer ( int_expr , int_expr ) >
<!ELEMENT ls_integer ( int_expr , int_expr ) >
<!ELEMENT le_integer ( int_expr , int_expr ) >
<!ELEMENT gt_float ( float_expr , float_expr ) >
<!ELEMENT ge_float ( float_expr , float_expr ) >
<!ELEMENT eq_float ( float_expr , float_expr ) >
<!ELEMENT ne_float ( float_expr , float_expr ) >
<!ELEMENT ls_float ( float_expr , float_expr ) >
<!ELEMENT le_float ( float_expr , float_expr ) >
<!ELEMENT gt_string ( string_expr , string_expr ) >
<!ELEMENT ge_string ( string_expr , string_expr ) >
<!ELEMENT eq_string ( string_expr , string_expr ) >
<!ELEMENT ne_string ( string_expr , string_expr ) >
<!ELEMENT ls_string ( string_expr , string_expr ) >
<!ELEMENT le_string ( string_expr , string_expr ) >
<!ELEMENT gt_datetime ( datetime_expr , datetime_expr ) >
<!ELEMENT ge_datetime ( datetime_expr , datetime_expr ) >
<!ELEMENT eq_datetime ( datetime_expr , datetime_expr ) >
<!ELEMENT ne_datetime ( datetime_expr , datetime_expr ) >
<!ELEMENT ls_datetime ( datetime_expr , datetime_expr ) >
<!ELEMENT le_datetime ( datetime_expr , datetime_expr ) >
<!ELEMENT contains ( #PCDATA | prop )* >

<!-- These are all the operators that return an integer value -->
<!ELEMENT int_op ( strlen ) >

<!ELEMENT strlen ( string_expr ) >

<!-- These are all the operators that return a floating point value -->
<!ELEMENT float_op ( add_float ) >

<!ELEMENT add_float ( float_expr , float_expr+ ) >

<!-- No operators are currently defined that return string
     or datetime values.
-->

<!-- The following elements define all Boolean valued expressions -->
<!ELEMENT boolean_expr ( boolean_op | boolean_prop | boolean_const ) >
<!ELEMENT boolean_prop ( prop ) >
<!ELEMENT boolean_const ( TRUE | FALSE | UNKNOWN ) >

<!-- The following elements define all integer valued expressions -->
<!ELEMENT int_expr ( int_op | int_prop | int_const ) >
<!ELEMENT int_prop ( prop ) >
<!ELEMENT int_const ( #PCDATA ) >

<!-- The following elements define all floating point valued expressions -->
<!ELEMENT float_expr ( float_op | float_prop | float_const ) >
<!ELEMENT float_prop ( prop ) >
<!ELEMENT float_const ( #PCDATA ) >

<!-- The following elements define all string valued expressions -->
<!ELEMENT string_expr ( string_prop | string_const ) >
<!ELEMENT string_prop ( prop ) >
<!ELEMENT string_const ( #PCDATA ) >

<!-- The following elements define all datetime valued expressions -->
<!ELEMENT datetime_expr ( datetime_prop | datetime_const ) >
<!ELEMENT datetime_prop ( prop ) >
<!ELEMENT datetime_const ( #PCDATA ) >


As an example, for the query condition

      ( loan_processor = "Sam" AND loan_amount > 100000 ) OR
      loan_risk_level > 3

we would have a parse tree that looks like this:

    OR
        AND
            =
                loan_processor
                "Sam"
            >
                loan_amount
                100000
        >
            loan_risk_level
            3

and XML that looks like this:

<where>
    <boolean_op>
        <or>
            <boolean_op>
                <and>
                    <boolean_op>
                        <eq_string>
                            <string_prop>
                                loan_processor
                            </string_prop>
                            <string_const>Sam</string_const>
                        </eq_string>
                    </boolean_op>
                    <boolean_op>
                        <gt_integer>
                            <int_prop>
                                loan_amount
                            </int_prop>
                            <int_const>100000</int_const>
                        </gt_integer>
                    </boolean_op>
                </and>
            </boolean_op>
            <boolean_op>
                <gt_integer>
                    <int_prop>
                        loan_risk_level
                    </int_prop>
                    <int_const>3</int_const>
                </gt_integer>
            </boolean_op>
        </or>
    </boolean_op>
</where>      
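
To see the type checking at work, here is a sketch of
a mismatched condition, comparing the integer property
loan_amount against a string literal, written in the
same abbreviated style as the example above:

<where>
    <boolean_op>
        <gt_integer>
            <int_prop>
                loan_amount
            </int_prop>
            <string_const>Sam</string_const>
        </gt_integer>
    </boolean_op>
</where>

The declaration of gt_integer admits only int_expr
operands, and string_const is not among the
alternatives of int_expr, so a validating parser
would reject this document against the DTD.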

I'm not familiar enough with XML to know whether
we can improve on the definition of properties or constant
types. At the bottom level I just used "prop", 
which is defined somewhere else,
and #PCDATA, which seems like it needs further
syntactic definition.

What do you think?

I'll be out of town at least until Thursday, 4/23/98,
and I'm already behind on my e-mail. I haven't even
read all the DASL e-mail yet. So, I won't be able
to respond before 4/23 to comments.

Alan Babich

Received on Friday, 17 April 1998 20:41:55 UTC