Comments on requirements

(1) "3.2.3 Paged Search Results"

My intuition on this topic is that this is probably not a good
idea. Because there are typically dozens to thousands of clients
per server, the bulk of the CPU power and disk space is on the
client side of the network. That is where scrolling around in the
answer set should happen. Putting it on the server makes
the overall architecture inefficient by wasting precious
server resources. I don't like inefficient architectures,
especially in large distributed systems, because then
lots of people suffer all at once. 

The client must be able to ask for the "next N results"
to make one sequential pass through the answer set. The
client side can cache the whole thing and allow the user
to scroll around in it at will. 

What first occurs to me is to make
the protocol as stateless on the server side as possible.
If the server is stateless, server crashes can be made
transparent to the client. NFS is that way, for example.
When they take my server down and reboot it for backup,
my diskless workstation hangs. When they are all done,
my diskless workstation simply resumes without
missing a beat. 

Another advantage of the server being stateless is that
it simplifies things.
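
To make this concrete, here is a minimal Python sketch of a "next N results" exchange in which the server keeps no per-client state. It assumes the client sends its current position with every request; that offset parameter, and the names ANSWER_SET, server_next_n, and client_fetch_all, are illustrative assumptions of mine and not part of the requirements draft.

```python
from typing import List, Sequence

# Hypothetical answer set; a real server would look it up or recompute it
# for each request instead of holding a per-client cursor.
ANSWER_SET: Sequence[str] = [f"resource-{i}" for i in range(7)]


def server_next_n(offset: int, n: int) -> List[str]:
    """Stateless server side: the reply depends only on the request."""
    return list(ANSWER_SET[offset:offset + n])


def client_fetch_all(n: int) -> List[str]:
    """One sequential pass; the client keeps the position and the cache."""
    cache: List[str] = []
    while True:
        batch = server_next_n(len(cache), n)
        if not batch:  # an empty batch marks the end of the answer set
            break
        cache.extend(batch)
    return cache


cached = client_fetch_all(3)
# Scrolling now happens in the client-side cache, with no server round trip.
third_from_end = cached[-3]
```

Because each request carries its own offset, a server that reboots between two requests can answer the second one without knowing anything about the first.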

(2) "3.3.1 Search Scope
     It must be possible for the client to specify a number of
     different, unrelated URIs over which the search is to range."

I agree with subsequent comments in the e-mail thread
that these "unrelated URIs" should actually be related
by being on the same server. The client side could use
multi-threading to query multiple different servers
in parallel.
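
A rough sketch of the client-side fan-out, using a thread pool from the Python standard library. The function query_server is a hypothetical placeholder for whatever call actually sends the search to one server; its name and return shape are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def query_server(server: str, query: str) -> List[str]:
    """Hypothetical placeholder: a real client would send the search
    to `server` over the network and parse the reply."""
    return [f"{server}/hit-for-{query}"]


def query_all(servers: List[str], query: str) -> Dict[str, List[str]]:
    """Send the same query to every server in parallel, one thread each."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        # pool.map yields results in the same order as `servers`.
        results = pool.map(lambda s: query_server(s, query), servers)
        return dict(zip(servers, results))


hits = query_all(["server-a", "server-b"], "dasl")
```

Each individual search still ranges over URIs on a single server; combining the per-server answers is left to the client.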

(3) "3.1.1 Boolean Expressions
     It must be possible to use Boolean operators
     (AND, OR, NOT) in the search criteria."

Yes. Boolean expressions are a good thing. Personally, I like
them a lot. However, traditional Boolean logic has only two
truth values. I like the simplicity of that, but when it comes to
queries, that probably isn't enough truth values. You probably
need three: TRUE, FALSE, and UNKNOWN. Furthermore, if you allow
arithmetic expressions or string expressions, you will probably
need to add another value for numbers, UNDEFINED, and another
value for strings, UNDEFINED. Let me try to motivate this.

Suppose you allowed arithmetic expressions in your query conditions.
Suppose division was one of the arithmetic operators you
defined. Then you might have some condition like "X/Y > 3",
where X and Y are some properties. That's all well and good
until you encounter a resource where Y = 0. The way out of this
problem is to (1) let X/Y equal the UNDEFINED number you added
to the set of all numbers, (2) define any arithmetic expression
with a variable or constant equal to UNDEFINED to evaluate
to UNDEFINED, (3) define the relational operators
(>, >=, <, <=, =, !=) to evaluate to UNKNOWN if either operand
is UNDEFINED, and (4) define the truth tables
for AND, OR, and NOT such that the resulting truth
value is UNKNOWN if one of the operands is not well defined
AND it matters to the final outcome. Otherwise, the operand
that is not well defined doesn't matter to the final outcome,
and the truth value is what you would expect. For example,
for OR (logical inclusive OR), in "X OR Y", if X is TRUE,
you don't care whether Y is well defined or not, you 
consider the value of "X OR Y" to be TRUE. Similarly, if
X were FALSE, the value of "X OR Y" is the value of Y,
be it TRUE, FALSE, or UNKNOWN. Similarly, for "A AND B",
if A is FALSE, the expression evaluates to FALSE regardless
of the value of B. If A is TRUE, the expression has the
value of B. For the NOT operator, NOT TRUE is FALSE,
NOT FALSE is TRUE, and NOT UNKNOWN is UNKNOWN.
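
The four steps above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: Python's None stands in for the UNDEFINED number, and the names Truth, divide, greater, t_or, t_and, and t_not are made up for the example.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    UNKNOWN = "UNKNOWN"


UNDEFINED = None  # assumption: None plays the role of the UNDEFINED number


def divide(x, y):
    """Steps (1) and (2): division by zero, or any UNDEFINED operand,
    yields UNDEFINED."""
    if x is UNDEFINED or y is UNDEFINED or y == 0:
        return UNDEFINED
    return x / y


def greater(a, b):
    """Step (3): a relational operator is UNKNOWN if either operand
    is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return Truth.UNKNOWN
    return Truth.TRUE if a > b else Truth.FALSE


def t_or(a, b):
    """Step (4): a TRUE operand decides an OR regardless of the other."""
    if a is Truth.TRUE or b is Truth.TRUE:
        return Truth.TRUE
    if a is Truth.FALSE and b is Truth.FALSE:
        return Truth.FALSE
    return Truth.UNKNOWN


def t_and(a, b):
    """Step (4): a FALSE operand decides an AND regardless of the other."""
    if a is Truth.FALSE or b is Truth.FALSE:
        return Truth.FALSE
    if a is Truth.TRUE and b is Truth.TRUE:
        return Truth.TRUE
    return Truth.UNKNOWN


def t_not(a):
    """NOT swaps TRUE and FALSE and leaves UNKNOWN alone."""
    return {Truth.TRUE: Truth.FALSE,
            Truth.FALSE: Truth.TRUE,
            Truth.UNKNOWN: Truth.UNKNOWN}[a]


# "X/Y > 3" on a resource where Y = 0 evaluates to Truth.UNKNOWN.
condition = greater(divide(6, 0), 3)
```

Note that t_or(Truth.TRUE, condition) is still Truth.TRUE: the operand that is not well defined does not matter to the outcome.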

You might think you avoid this problem if you don't allow
arithmetic or string expressions, but that turns out not
to be the case. You cannot escape the null value
problem. Suppose you are querying a set of resources
with the condition that "Author=Joe OR Size>10".
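What about the resources in that set that
have no value for the property "Author"? Are they
included in the result set or not? With the scheme
I have given above, the answer is clear: "Author=Joe"
evaluates to UNKNOWN for those resources, so the condition
is TRUE when "Size>10" is TRUE and UNKNOWN otherwise, and,
as in SQL, only resources for which the condition is TRUE
are included.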
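
Here is a small Python sketch of that query run against resources with a missing property. It assumes that None stands for both a missing property value and the truth value UNKNOWN, and that, as with an SQL WHERE clause, a resource is selected only when the condition is TRUE. The helper names eq, gt, or3, and matches are hypothetical.

```python
from typing import Optional


def eq(value, constant) -> Optional[bool]:
    """Equality test; None (UNKNOWN) if the property has no value."""
    return None if value is None else value == constant


def gt(value, constant) -> Optional[bool]:
    """Greater-than test; None (UNKNOWN) if the property has no value."""
    return None if value is None else value > constant


def or3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued OR over True, False, and None (UNKNOWN)."""
    if a is True or b is True:
        return True
    if a is False and b is False:
        return False
    return None


def matches(resource: dict) -> bool:
    """Evaluate "Author=Joe OR Size>10"; only a TRUE verdict selects."""
    verdict = or3(eq(resource.get("Author"), "Joe"),
                  gt(resource.get("Size"), 10))
    return verdict is True


resources = [
    {"Author": "Joe", "Size": 3},
    {"Size": 20},                  # no Author, but Size>10 is TRUE
    {"Size": 5},                   # no Author, Size>10 is FALSE: UNKNOWN
    {"Author": "Ann", "Size": 5},
]
selected = [r for r in resources if matches(r)]
```

Under these assumptions the first two resources are selected and the last two are not.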

The above scheme is not new to anyone familiar with
SQL or DMA. It is simply ANSI standard SQL three-valued
logic. I suggest that you probably want to use it. 
It solves the problems, it is standard, and it is widely
adopted.

I also suggest that you probably want operators like
"A = NULL" or "A IS NULL" to provide clients with
a tool to help deal with the null value problem more
effectively.

(4) "3.4.2 Extensible Query Syntax
     DASL extensions must support the extensible use of
     alternate query syntax."

This is a really interesting topic. I don't have time to
go into any depth here, but let me at least say this.
It seems to me that any software-related thing that
is not dead gets enhanced over time, so you had better
plan for it. Also, I believe you need to discover what
capabilities are provided by the servers, and you need
to deal gracefully with capabilities you don't understand.
I would observe that three-valued logic may be of use
in this regard, since we found it to be useful in DMA.
For example, if you design it right, you might be able to
execute a query and return a meaningful result even if
there is an operator the server doesn't understand.
Of course, you need an invariant way to distinguish
operators and group the operands that they apply to
in a recursive, i.e., hierarchical, manner. The use
of infix, postfix, or prefix notation and some type
of bracketing or parentheses is one way to approach
this. This overall organization of the syntax that
allows parsing out operators and their operands
can never be allowed to change.
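
As an illustration of the idea, here is a minimal Python sketch of a query in prefix notation, written as nested tuples of the form (operator, operand, ...). Any operator the evaluator does not recognize evaluates to UNKNOWN (None here), so the rest of the query can still produce a result. The operator names, including the deliberately unsupported "soundslike", and the tuple encoding are assumptions for the example, not a proposed syntax.

```python
from typing import Optional


def evaluate(expr: tuple, props: dict) -> Optional[bool]:
    """Evaluate a prefix-form expression against one resource's
    properties, using True, False, and None (UNKNOWN)."""
    op, *args = expr
    if op == "and":
        vals = [evaluate(a, props) for a in args]
        if any(v is False for v in vals):
            return False
        return True if all(v is True for v in vals) else None
    if op == "or":
        vals = [evaluate(a, props) for a in args]
        if any(v is True for v in vals):
            return True
        return False if all(v is False for v in vals) else None
    if op == "not":
        v = evaluate(args[0], props)
        return None if v is None else not v
    if op in ("eq", "gt"):
        name, constant = args
        value = props.get(name)
        if value is None:  # missing property: UNKNOWN
            return None
        return value == constant if op == "eq" else value > constant
    # The fixed (operator, operands...) shape lets us skip the operands
    # of an operator we do not understand and treat it as UNKNOWN.
    return None


query = ("or", ("eq", "Author", "Joe"), ("soundslike", "Title", "dasl"))
joe = evaluate(query, {"Author": "Joe", "Title": "DASL"})
ann = evaluate(query, {"Author": "Ann", "Title": "DASL"})
```

For the resource authored by Joe the query is TRUE despite the operator the evaluator does not understand; for the other resource it is UNKNOWN.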

(5) My intuition is that you probably want a simple
but somewhat general capabilities mechanism to describe
the query capabilities of a server.

Alan Babich

Received on Tuesday, 27 January 1998 20:51:26 UTC