Three Valued Logic and Elimination Defined

I don't think all the readers on the DASL mailing list
are up to the same level of understanding of three 
valued logic and three valued elimination. Therefore,
I have written this e-mail to explain what they are.
Sorry, but that takes more than a few words. If you 
already fully understand three valued logic and three 
valued elimination, you can skip this e-mail (<180 lines). 
No proposals are discussed. No reply is necessary.


THREE VALUED LOGIC

Imagine you are the implementor of an ad hoc query
program that runs against a DBMS of any type. Let's
assume an RDBMS to make the exposition more concrete.
Your query program obviously can not blow up no
matter what row it is given to evaluate. 

Problem 1: You will discover that you have to deal with 
columns in the row under scan being null. For 
example, if the query condition is ( Author = "Smith" ),
what would you do if the current row under scan
does not have a value for the "Author" column
(i.e., the Author column is null)?

Problem 2: You will also discover that you have to deal 
with undefined expressions, e.g., division by zero, the
square root or logarithm of a negative number, floating
point number overflow, etc. For example, 
suppose the query condition is ( X / Y > Z ).
What do you do if X, Y, Z are all non null, but the value
of Y is zero? Obviously you can't take a divide by
zero instruction fault and blow up your program.

These are generic problems for any ad hoc query program.
In particular, they are generic problems for ad hoc
SQL query programs. So, the ANSI standard SQL committee
has defined three valued logic to deal with such problems.

If you have composed a SQL query, YOU HAVE USED three valued 
logic. Most people DON'T REALIZE THIS. Three valued logic is 
intuitive enough that it doesn't normally intrude.

These are the rules of three valued logic:

(1) For numbers: If a column in the arithmetic expression
is NULL, its value is treated as undefined. If any undefined
operation occurs (e.g., division by zero), the value of
that operation is treated as undefined. If any operand of
an arithmetic operator is undefined, the result is undefined.
Therefore, if any subpart of an arithmetic expression is 
undefined, the entire arithmetic expression is treated as 
undefined. Note that "undefined" is a completely different
concept from "null". For example, assuming X is not null,
X/0 is not defined but is not null either.

(2) For datatypes other than numbers, the same approach is
taken as for numbers: Any undefined subexpression 
contaminates the whole expression and causes it to be 
undefined. 

(3) For the relational operators >, >=, <, <=, =, !=,
if either or both operands are undefined, the result is
the truth value UNKNOWN. Note that UNKNOWN is a completely
different concept than undefined. UNKNOW is a well known
truth value, and, is in fact, very precisely defined.

(4) The logical operators, AND, OR, and NOT are evaluated 
as follows: The three truth values, TRUE, FALSE, and
UNKNOWN. If any operand determines the value of the 
expression, then return that value, because you don't care 
about the values of the other operands. Otherwise, cancel
the operands that don't matter, and the value is determined
by what's left, if anything.

Thus, for AND, if any operand is FALSE,
you know that the value is FALSE. Otherwise, cancel the
operands that are TRUE. If you cancel them all, the value
is TRUE, Otherwise, the value is UNKNOWN (no operands are
FALSE, and at least one operand is UNKNOWN). 

For OR, if any operand is TRUE, you know that the value is 
TRUE. Otherwise, cancel all operands that are FALSE. If 
they're all FALSE, the value is FALSE. Otherwise, the value 
is UNKNOWN (no operand is TRUE, and at least one operand is
FALSE). 

For NOT, NOT TRUE is FALSE, NOT FALSE is TRUE, and 
NOT UNKNOWN is still UNKWNOWN.

NOTE: SQL doesn't have a Boolean datatype. So, it doesn't
really have literals for truth values. (Although you can
test whether a Boolean expression "IS TRUE", "IS FALSE", or
"IS UNDEFINED" in SQL 92). One way to materialize the Boolean
truth value UNKNOWN in SQL is to choose a numeric column, X,
and write ( X / 0 = X / 0 ).

(5) The current row under scan is included in the
result set if and only if the query condition evaluates to
TRUE. (Thus, if the result is FALSE or UNKNOWN, the row under
scan is not included in the result set.)


THREE VALUED ELMINATION

Suppose you are given the task of writing software to query
across a set of disjoint heterogeneous collections. The 
collections support somewhat different sets of properties 
and operators. Yet, you wish to treat querying across the 
set of collections the same way as querying a single 
collection.

In general, therefore, a query may be partially undefined
relative to each collection in the set of collections. 
Let us supposed that the collections are picky -- they give
a fatal error to a query if any property or operator is
not defined in their metadata. Yet, you want queries on
the set of collections to return reasonable results, and
not run afoul of the errors returned by an individual 
collection if something is not defined relative to its 
metadata.

What do you do?

First, you advertise a merged set of metadata to represent
the set of collections: You take the set union of the 
properties and the operators supported by the individual
collections and advertise that to be the metadata of the set
of collections.

Second, when you are given a query that is valid against the
merged metadata, you make a (possibly) massaged copy of the
query and send it to each collection. (If the query is not 
valid against the merged metadata, you give a fatal error.)
Then you merge the results, and present the merged results 
as a single set of results. In order to produce the massaged 
copies of the query, you use three valued elimination (which
is an obvious extension of three valued logic) as follows.

Suppose you have collections A and B. Collection A has
properties Author and loan_number, and the query 
operators in the draft except for "contains". Collection B 
has properties Author and purchase_order_number, and all 
the query operators in the draft, including "contains".

Then, the merged metadata is the properties Author, 
loan_number, and purchase_order_number, and all the query
operators in the draft, including "contains".

Suppose the query condition is 
    Author = "Smith" OR loan_number = 12345678

Then the massaged query condition for collection A is
    Author = "Smith" OR loan_number = 12345678
and the massaged query condition for collection B is
    Author = "Smith" OR UNKNOWN

Thus, three valued elimination eliminates the
parts of the query that are undefined relative to a
particular collection by replacing the undefined
Boolean subexpressions by the truth value UNKNOWN.

In this example, collection A will return all resources
whose Author is "Smith" or whose loan_number is 12345678,
and collection B will return all resources whose
Author is "Smith", since it has no resources with
the loan_number property. The two answer sets will be
merged into one answer set.

If the query condition were
    Author = "Smith" OR contains("blue moon")
the query passed to collection A would be
    Author = "Smith" OR UNKNOWN
and the query passed to collection B would be
    Author = "Smith" OR contains("blue moon")

Alan Babich

Received on Wednesday, 1 July 1998 19:48:00 UTC