- From: Babich, Alan <ABabich@filenet.com>
- Date: Tue, 27 Jan 1998 17:49:58 -0800
- To: "'www-webdav-dasl@w3.org'" <www-webdav-dasl@w3.org>
- Cc: "Babich, Alan" <ABabich@felix.filenet.com>
(1) "3.2.3 Paged Search Results"

My intuition on this topic is that this is probably not a good idea. Because there are typically dozens to thousands of clients per server, the bulk of the CPU power and disk space is on the client side of the network. That is where scrolling around in the answer set should happen. Putting it on the server makes the overall architecture inefficient by wasting precious server resources. I don't like inefficient architectures, especially in large distributed systems, because then lots of people suffer all at once.

The client must be able to ask for the "next N results" to make one sequential pass through the answer set. The client side can cache the whole thing and allow the user to scroll around in it at will.

What first occurs to me is to make the protocol as stateless on the server side as possible. If the server is stateless, server crashes can be made transparent to the client. NFS is that way, for example. When they take my server down and reboot it for backup, my diskless workstation hangs. When they are all done, my diskless workstation simply resumes without missing a beat. Another advantage of the server being stateless is that it simplifies things.

(2) "3.3.1 Search Scope: It must be possible for the client to specify a number of different, unrelated URIs over which the search is to range."

I agree with subsequent comments in the e-mail thread that these "unrelated URIs" should actually be related by being on the same server. The client side could use multi-threading to query multiple different servers in parallel.

(3) "3.1.1 Boolean Expressions: It must be possible to use Boolean operators (AND, OR, NOT) in the search criteria."

Yes. Boolean expressions are a good thing. Personally, I like them a lot. However, traditional Boolean logic has only two truth values. I like the simplicity of that, but when it comes to queries, that probably isn't enough truth values. You probably need three: TRUE, FALSE, and UNKNOWN.
Furthermore, if you allow arithmetic expressions or string expressions, you will probably need to add another value for numbers, UNDEFINED, and another value for strings, UNDEFINED.

Let me try to motivate this. Suppose you allowed arithmetic expressions in your query conditions. Suppose division was one of the arithmetic operators you defined. Then you might have some condition like "X/Y > 3", where X and Y are some properties. That's all well and good until you encounter a resource where Y = 0. The way out of this problem is to:

(1) Let X/Y equal the UNDEFINED number you added to the set of all numbers.
(2) Define any arithmetic expression with a variable or constant equal to UNDEFINED to evaluate to UNDEFINED.
(3) Define the relational operators (>, >=, <, <=, =, !=) to evaluate to UNKNOWN if either operand is UNDEFINED.
(4) Define the truth tables for AND, OR, and NOT such that the resulting truth value is UNKNOWN if one of the operands is not well defined AND it matters to the final outcome. Otherwise, the operand that is not well defined doesn't matter to the final outcome, and the truth value is what you would expect.

For example, for OR (logical inclusive OR), in "X OR Y", if X is TRUE, you don't care whether Y is well defined or not; you consider the value of "X OR Y" to be TRUE. Similarly, if X is FALSE, the value of "X OR Y" is the value of Y, be it TRUE, FALSE, or UNKNOWN. Similarly, for "A AND B", if A is FALSE, the expression evaluates to FALSE regardless of the value of B. If A is TRUE, the expression has the value of B. For the NOT operator, NOT TRUE is FALSE, NOT FALSE is TRUE, and NOT UNKNOWN is UNKNOWN.

You might think you avoid this problem if you don't allow arithmetic or string expressions, but that turns out not to be the case. You cannot escape the null value problem. Suppose you are querying a set of resources with the condition that "Author=Joe OR Size>10".
What about the resources in the set that have no value for the property "Author"? Are they included in the result set or not? With the scheme I have given above, the answer is clear.

The above scheme is not new to anyone familiar with SQL or DMA. It is simply ANSI standard SQL three-valued logic. I suggest that you probably want to use it. It solves the problems, it is standard, and it is widely adopted. I also suggest that you probably want operators like "A = NULL" or "A IS NULL" to provide clients with a tool to help deal with the null value problem more effectively.

(4) "3.4.2 Extensible Query Syntax: DASL extensions must support the extensible use of alternate query syntax."

This is a really interesting topic. I don't have time to go into any depth here, but let me at least say this. It seems to me that any software-related thing that is not dead gets enhanced over time, so you had better plan for it. Also, I believe you need to discover what capabilities are provided by the servers, and you need to deal gracefully with capabilities you don't understand.

I would observe that three-valued logic may be of use in this regard, since we found it to be useful in DMA. For example, if you design it right, you might be able to execute a query and return a meaningful result even if there is an operator the server doesn't understand. Of course, you need an invariant way to distinguish operators and group the operands that they apply to in a recursive, i.e., hierarchical, manner. The use of infix, postfix, or prefix notation and some type of bracketing or parentheses is one way to approach this. This overall organization of the syntax that allows parsing out operators and their operands can never be allowed to change.

(5) My intuition is that you probably want a simple but somewhat general capabilities mechanism to describe the query capabilities of a server.

Alan Babich
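
P.S. To make the three-valued scheme from point (3) concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not DASL, DMA, or SQL syntax: the names Truth, UNDEFINED, divide, compare, t_not, t_and, t_or, and matches are all hypothetical, a missing property is modeled as Python's None, and keeping only resources whose condition is TRUE is an assumption modeled on how an SQL WHERE clause behaves.

```python
import operator
from enum import Enum


class Truth(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    UNKNOWN = "UNKNOWN"


# Assumption: None stands in for the UNDEFINED number/string
# and for a property that has no value on a resource.
UNDEFINED = None


def divide(x, y):
    """Arithmetic rule: division by zero or an UNDEFINED operand gives UNDEFINED."""
    if x is UNDEFINED or y is UNDEFINED or y == 0:
        return UNDEFINED
    return x / y


def compare(op, a, b):
    """Relational rule: UNKNOWN if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return Truth.UNKNOWN
    return Truth.TRUE if op(a, b) else Truth.FALSE


def t_not(a):
    if a is Truth.UNKNOWN:
        return Truth.UNKNOWN
    return Truth.FALSE if a is Truth.TRUE else Truth.TRUE


def t_and(a, b):
    # A FALSE operand decides the result no matter what the other one is.
    if a is Truth.FALSE or b is Truth.FALSE:
        return Truth.FALSE
    if a is Truth.TRUE and b is Truth.TRUE:
        return Truth.TRUE
    return Truth.UNKNOWN


def t_or(a, b):
    # A TRUE operand decides the result no matter what the other one is.
    if a is Truth.TRUE or b is Truth.TRUE:
        return Truth.TRUE
    if a is Truth.FALSE and b is Truth.FALSE:
        return Truth.FALSE
    return Truth.UNKNOWN


def matches(resource):
    """Evaluate the condition: Author=Joe OR Size>10."""
    author_is_joe = compare(operator.eq, resource.get("Author"), "Joe")
    size_over_10 = compare(operator.gt, resource.get("Size"), 10)
    return t_or(author_is_joe, size_over_10)


resources = [
    {"Size": 20},                   # UNKNOWN OR TRUE    -> TRUE
    {"Size": 5},                    # UNKNOWN OR FALSE   -> UNKNOWN
    {"Author": "Joe"},              # TRUE    OR UNKNOWN -> TRUE
    {"Author": "Ann", "Size": 3},   # FALSE   OR FALSE   -> FALSE
]

# Assumption: only resources whose condition is TRUE are returned.
selected = [r for r in resources if matches(r) is Truth.TRUE]
```

Under these assumptions, a resource with no "Author" is returned only when "Size>10" is TRUE on its own; when the missing value matters to the outcome, the condition is UNKNOWN and the resource is left out.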
Received on Tuesday, 27 January 1998 20:51:26 UTC