RE: comments on Requirements Draft of Nov 19, 1997

> ----------
> From: 	Jim Davis[SMTP:jdavis@parc.xerox.com]
> Sent: 	Tuesday, January 06, 1998 4:12 PM
> To: 	Saveen Reddy (Exchange); www-webdav-dasl@w3.org
> Cc: 	w3c-dist-auth@w3.org
> Subject: 	comments on Requirements Draft of Nov 19, 1997
> 
> Saveen this is a terrific beginning of the requirements. I agree with
> your
> overall sense of scope, particularly the things you excluded.
> 
> I must also say that there are a lot of requirements in this document
> whose
> rationale (or perhaps semantics) I do not understand.  Can you
> enlighten me?
> 
> Variants (3.1.4).  What WebDAV group is working on "mechanisms ... to
> use
> when submitting variants to the server"?  I must have missed this.
> 
This is the result of my awkward phrasing. Let me try saying this
another way ... Section 5.10 (entitled "Variants") of the WebDAV
requirements says "Detailed requirements for variants will be developed
in a separate document." So, I understand there is no separate variants
document at the moment, but I wanted to make explicit that whatever the
model for submitting variants turns out to be, DASL should work with
it.

> Regular Expressions (3.1.6) By "must" do you mean that every DASL
> server
> MUST support regex?  If so, why?  This seems too expensive to me.  As
> far
> as I know, the large search engines (e.g. Verity) do not support
> regex.
> 
> Likewise for NEAR (3.1.7)  Again, why is this mandatory?
> 
I believe DASL must *allow* servers to perform such queries (this is
what supporting multiple search syntaxes would allow), and DASL should
provide guidelines on what it means to do, for instance, a regex match
on a property value (there are, I believe, some internationalization
issues lurking there). But I'm not convinced every server MUST support
those operations as part of the minimum DASL capabilities.
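
Just to make this concrete, here is a sketch of how a regex match might
be expressed if DASL ends up with an XML query body carried by a SEARCH
method. This is purely illustrative -- none of the element names
(searchrequest, regex-match, pattern) are agreed DASL syntax:

   SEARCH /reports/ HTTP/1.1
   Host: www.example.com
   Content-Type: text/xml

   <?xml version="1.0"?>
   <D:searchrequest xmlns:D="DAV:">
     <D:regex-match>   <!-- hypothetical operator; a server that does
                            not implement it would reject the query -->
       <D:prop><D:displayname/></D:prop>
       <D:pattern>draft-.*-199[78]</D:pattern>
     </D:regex-match>
   </D:searchrequest>

A server advertising only the mandatory syntax could refuse this
request outright, which is exactly the "allow but not require" split I
have in mind.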

> Result Record Definition (3.2.1).  I can certainly see the value of
> supporting this - it improves performance by cutting round trips.
> (otherwise, you do a SEARCH to get the list of resources that match,
> then a
> PROPFIND for each one.)
> But is this the only reason, and should it be mandatory?
> 
The performance gains in terms of round trips and bytes on the wire
are certainly *the* big factors in pushing this to be a mandatory
feature. As the number of properties set on objects increases, the
impact just gets worse and worse. And from my experience with systems
that support properties, asking for a specific set of properties is
such a common operation that systems are very likely to support this
kind of feature anyway.
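
For illustration, the result record might be declared right in the
query, something like the following (again, hypothetical syntax; the
select element and the shape of the response are assumptions of mine):

   <D:searchrequest xmlns:D="DAV:">
     <D:select>   <!-- the "result record": which properties to
                       return for each matching resource -->
       <D:prop>
         <D:getcontentlength/>
         <D:getlastmodified/>
       </D:prop>
     </D:select>
     <D:where>
       <D:contains>whitepaper</D:contains>
     </D:where>
   </D:searchrequest>

The response could then carry those two properties inline for every
match, in one message, instead of a SEARCH followed by N PROPFINDs.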

> Paged Search Results (3.2.3).  I don't at all see why we need this.
> If the
> search results are returned in chunked Transfer Encoding, then the
> search
> engine can start returning results as soon as the first match occurs,
> and
> the client can certainly start displaying them as they arrive.  Or
> perhaps
> I don't know what this means.  I hope it does not mean that the server
> has
> to store the state of searches in progress, as in Z39.50.
> 
Chunked transfer encoding and paged results are related in the sense
that they both break up data over multiple messages, but there are some
differences.

I haven't gone through all the scenarios where paged table results are
going to be used, but in general I think the client wants a degree of
control that chunking does not give:

- The client should be able to specify an exact number of records to be
received. 

- The client might need to specify which region of the complete result
set it wants, and it could ask for regions completely out of order --
"give me the middle of the results, then the first, then the last".

- Also, clients may want relative positioning -- "give me ten records
starting 80% of the way into the result set".

And you are right: because multiple request messages from the client
concern the same result set, the server and client need to have some
form of state mechanism.
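
Here is a sketch of what a subsequent page request might look like,
with a hypothetical range element for positioning and a hypothetical
Result-Set header carrying a state token the server handed back on the
first request (all of this is invented for illustration):

   SEARCH /docs/ HTTP/1.1
   Host: www.example.com
   Result-Set: opaque-token-1234
   Content-Type: text/xml

   <?xml version="1.0"?>
   <D:searchrequest xmlns:D="DAV:">
     <D:range>
       <D:first>41</D:first>   <!-- start at record 41 ... -->
       <D:count>20</D:count>   <!-- ... and return twenty records -->
     </D:range>
   </D:searchrequest>

The token is what makes the server-side state explicit; a server that
does not implement paging would simply never issue one.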

I don't think paged table results are something that servers MUST
support, and I don't think the DASL protocol specification has to
define them (maybe a separate draft, if ever), but whatever a DASL
response looks like, it should be possible at some time in the future
to page the results back out.

Whether paged table results are being used or not, chunked is still
going to be valuable.

> Search Scope 3.3.1 - is a search scope a collection?  Why do we need
> this?
> It's a performance improvement, so one does not have to issue N
> searches? 
> 
Yes, it is generally going to be a collection. (I suppose someone could
try to search a single file, but that's certainly not very
interesting.) The ability to name multiple collections is (IMO)
basically a performance improvement.

I think it's a MUST that DASL show how to do a "distributed" search --
one that spans multiple collections (see the sketch below). But in the
end, this may just mean, as you point out, that the client must issue N
searches.
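
For example, the scope section of a query might name several
collections at once -- again, invented syntax:

   <D:from>
     <D:scope>
       <D:href>http://www.example.com/projects/</D:href>
       <D:depth>infinity</D:depth>   <!-- recurse into the collection -->
     </D:scope>
     <D:scope>
       <D:href>http://www.example.com/archive/1997/</D:href>
       <D:depth>1</D:depth>          <!-- immediate members only -->
     </D:scope>
   </D:from>

A server that cannot search both collections in one pass could still
accept this and run N internal searches, which is no worse than the
client issuing them itself.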

> Search Depth 3.3.2 - by "container" do you mean "collection"?
> 
Yes.

> Extensible Query Syntax (3.4.2) - I am leery of this.  Where does this
> requirement come from?  I challenge it in two ways
> 
> 1 - it's not needed, because generic query syntaxes are sufficient.
> Consider, for example, the  DMA (document management alliance) API.
> It
> provides only one syntax, albeit a powerful one, unless by
> "extensible" you
> mean that the client can discover the list of searchable properties
> and
> operators allowed on each one.  For this kind of discovery, you
> should
> look at DMA, which provides means of describing the operators, the
> required
> operands for each, the datatypes supported, the default values, and so
> on.
> I think that RDF is actually expressive enough to express all this.
> 
By "extensible" I meant "discoverable".

> 2 - it's not sufficient - I find it hard to believe that a client C
> can do
> discovery on server S and generate an effective query using a *syntax*
> it
> did not previously know, without the intervention of a human.  If all
> you
> are trying to say is that server S should be allowed to provide
> proprietary
> search interfaces so that client C (from the same vendor) can work
> with it,
> that's neither difficult nor worthy of the spec.
> 
I definitely agree with you that it's pretty darn unlikely for a client
to discover and use a syntax it didn't already know.

Do we both agree that discoverability is worthy of the spec?
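
One possible shape for that discovery, purely as a sketch: the client
asks the server which query syntaxes it accepts, and the server answers
with a list of URIs identifying them. The DASL response header and the
grammar URIs here are assumptions of mine, not agreed protocol:

   OPTIONS /docs/ HTTP/1.1
   Host: www.example.com

   HTTP/1.1 200 OK
   Allow: OPTIONS, GET, PROPFIND, SEARCH
   DASL: <DAV:basicsearch>, <http://www.example.com/vendor/sql-syntax>

A generic client would recognize the standard URI and ignore the vendor
one; a client from the same vendor as the server could use the second.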

> If you have something different in mind, please correct my
> misunderstanding.
> 
> Internationalization 3.7.  I strongly agree. I've certainly run into
> problems in searching against non-ascii data, e.g. names in German or
> Greek.  (Did you think I would have only disagreements?)
> 
> Finally, something needs to be said about full text variants - when a
> document is stored with N variants, it's not clear to me which one(s)
> a
> full text search applies to.  With some indexing systems, you don't
> get to
> choose.
> 
I believe that ideally this is handled by the criteria on the search.
For example, "give me all the resources that contain the word 'Maenner'
and that have Content-Language = 'DE'". And in this scenario, a system
that cannot qualify its search by those criteria should fail the
request.
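
In the hypothetical XML syntax sketched above, that query might read
(contains, eq, and friends all being assumptions, not agreed syntax):

   <D:where>
     <D:and>
       <D:contains>Maenner</D:contains>   <!-- full-text criterion -->
       <D:eq>                             <!-- property criterion -->
         <D:prop><D:getcontentlanguage/></D:prop>
         <D:literal>de</D:literal>
       </D:eq>
     </D:and>
   </D:where>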

Actually, Jim, you point out what is probably our most vexing problem
... that DASL is going to have to live with the searching capabilities
of the underlying storage, and sometimes capabilities present in one
storage just are not going to be there in others.

> Likewise, if versioning is still in WebDAV (or should I say 'WebDA'?)
> then
> there must be some interaction with search.
> 
Same problem (multiplied by 10, yikes!) here.

> I look forward to discussing these and become more enlightened.
> 
> PS we should carry followup discussion on www-webdav-dasl only, I
> think.
> 
Thanks,
Saveen

Received on Thursday, 8 January 1998 13:12:45 UTC