- From: Babich, Alan <ABabich@filenet.com>
- Date: Tue, 30 Jun 1998 15:16:09 -0700
- To: "'DASL'" <www-webdav-dasl@w3.org>
The "contains" CBR operator as currently specified has a few problems. One problem is that it is a required operator, which raises the barrier of entry way too high for many existing systems. (See "Barrier of entry must be low", 6/30/98.)

Point 1: One system I am familiar with is a high volume document imaging system. Billions of dollars worth of this system have been sold, and it is still going strong. The documents are scanned in. There is no text version of the image documents, and there is no possibility of content based retrieval unless there is a text based version of the documents. The system has been shipping for eleven years, and there is so little demand for OCR of these documents that OCR and text based retrieval are still not provided. Content based retrieval is obviously not a significant benefit for applications using this system.

Point 2: I believe the situation is the same for other high volume imaging systems.

Point 3: Another system I am familiar with manages electronic documents, i.e., documents containing text, typed by humans, that software understands. This system is one of the top few in market share. Content retrieval is provided with this system. However, most customers of this system do not install the content retrieval feature. Of those that install it, most don't find it useful. Less than 10% of the customers of this system use content based retrieval. My supposition is that the state of the art of content based retrieval is not yet good enough for the other 90% of the customers.

Point 4: I believe that the situation is the same for other EDM systems.

Point 5: Consider what it would take for a system to offer content based retrieval if it is not already doing so. Implementing a CBR engine would be a lot of work. Considering the maturity of the CBR industry, implementing another CBR engine at this point in time would be a very poor business decision for almost all companies. Therefore, the company involved would probably look to an existing CBR company to provide this functionality. The first step is to evaluate the candidates. This takes time, money, and effort. The second step is to strike a deal with at least one of them. The third step is to integrate with the CBR software. There would have to be training and education for engineering, documentation, marketing, and support. There are also ongoing costs for support and release upgrades.

Point 6: We must be inclusive. (See "Barrier of entry must be low", 6/30/98.) Excluding whole classes of such systems from DASL would be a mistake.

Point 7: On the other hand, we should not drop full text retrieval functionality out of DASL for 1.0, because it is clearly extremely useful for some applications. This is true whether you believe in having a single poorly defined operator like "contains", a set of crisply defined CBR operators, or both.

Conclusion: By now you've figured it out. If a system isn't already supporting CBR, the costs outweigh the benefits, and such a system cannot implement it just for DASL. In other words, making "contains" required raises the barrier of entry way too high. Therefore, all CBR operators must be optional. (To be clear: I do NOT include string pattern match operators, e.g., the SQL LIKE operator, in the CBR operator category. String pattern match operators are a separate discussion altogether. I will discuss these in a separate e-mail.)

Immediate Consequence: An immediate consequence of all CBR operators being optional is that the query operators supported by a collection must be advertised by the collection. DASL can NOT take the position that, because all the operators that can and must be implemented are in the DASL 1.0 spec, they don't have to be advertised. Our charter requires us to advertise query capabilities, so we shouldn't have any reservations about advertising the query operators supported. In any case, we will have to do it in later versions of the spec, when additional optional query operators are introduced. It's really easy to do.

Alan Babich
Received on Tuesday, 30 June 1998 18:18:53 UTC