- From: Jim Davis <jrd3@alum.mit.edu>
- Date: Tue, 26 Oct 1999 16:05:26 +0200
- To: Niket Patwardhan <niket@verity.com>, www-webdav-dasl@w3.org
At 09:37 AM 9/16/99 -0700, Niket Patwardhan wrote:
>I have been reading the spec, and the discussion since. Here are a few
>things that need to be taken care of:-

Thanks for your comments. Sorry to be so slow to address them.

>1) I think the spec should say something about the language (like English,
>French, German, - SQL, VerityQL) of the query, at least for servers that
>support content-based retrieval(CBR?). The only thing I can find is 5.6.1
>and 6.

Are you asking for further editorial material (better explanations, examples) or additional functionality in the protocol itself? If the latter, are you asking for this for queries against property values or contents (5.13)? I would suppose you mean contents, since you cited 5.6.1 yourself. In this case, it would be very interesting if you made a concrete proposal for what you think the protocol should do.

If you do make such a proposal it would be good to address two points:

1) What happens if the client specifies a language, and the server does not know the language of the content document? Does the query fail?

2) It would also be useful to know if any existing implementation is actually able to use this information, and how. For example, if I have a document in Microsoft Word, I can certainly tag words with a language, so that Word could (in theory) distinguish French "chat" from English "chat". But if I then index this document with, e.g., Verity, is the information preserved? As you probably know, the IETF requires there be working implementations of all draft protocols, so there's no use defining a behavior that no one has actually implemented.

It would also be helpful to get some guidance on the following question, which has plagued us sorely: How can we meaningfully compare strings in two very closely related languages, for example EN_us and EN_gb (or whatever it is one calls English in the English Isles)?
While a "lift" in the UK may have a different meaning than a "lift" in the USA, it still seems bad to fail the query.

> I think that in 5.13 we
>should at least require that if they invoke the "CONTAINS" operator, they
>must have specified the natural language the client is using in his query.

So if no language is specified, the query *fails*? What is the advantage of this?

>Much of the useful value add of a text based search comes from
>things like:-
>
>Case In/sensitivity
>Phonemic In/sensitivity
>Stemming
>Thesaurii
>Part of speech analysis

I fully agree that much value comes from these features. Let me ask you though, if DASL 1.0 did not have them, would it be useless?

We deliberately did not define them because

1) No one on the design team had the expertise to create a specification that was simple, clear, and interoperable (remember it has to work the same way on two distinct implementations)

2) DASL must be easy to implement.

We knew that the very top of the line systems (such as Verity) supported such features, but we reasoned that it was better to leave room for expansion and/or vendor-specific extensions. We think features like stemming and thesauri are better treated in this way.

>2) To prevent buffer overflows don't you want to say "The server MUST honor
>the limit" in either 5.14 or 5.15?

How would this help? For those clients with fixed size buffers, the amount of data in even a single record is difficult to predict. (A property value could be arbitrarily large.) And as far as I know, all reasonable clients are able to handle arbitrarily large replies. And if they can't, they can always close the connection. If we say the server MUST honor the limit, it just raises the barrier of implementation.

>3) 5.16 is too limiting in that it allows only two values for case
>sensitivity (0 or 1). In real life, things are much more complicated - one
>could choose to ignore accents also for example.
It would be very excellent if someone with expertise (such as yourself) could provide a list of all the sensible values. Or perhaps you can tell us whether the definition, as it is, is worse than no definition at all. None of the editors has the expertise to decide clearly.

>4) The note in 5.17 should refer to query results, not queries, since that
>is what the score is associated with.

Thanks, I'll make this change in the next version.

>Also, I hope there is some indication
>somewhere about where the result came from, so that scores can be compared
>if it is valid to do so.

I don't understand. The client knows where it sent the query to. What other information did you have in mind?

best regards

Jim Davis
jrd3@alum.mit.edu
http://users.lanminds.com/contact/jdavis
Received on Tuesday, 26 October 1999 12:08:26 UTC