Re: Model question from Alan Kent on 2003-07-25 (www-zig@w3.org from July 2003)

From: Alan Kent <ajk@mds.rmit.edu.au>
Date: Fri, 25 Jul 2003 10:38:46 +1000
To: Kevin Gamiel <kgamiel@cnidr.org>
Cc: www-zig@w3.org
Message-ID: <20030725103846.E27588@io.mds.rmit.edu.au>

On Thu, Jul 24, 2003 at 09:22:45AM -0400, Kevin Gamiel wrote:
> The search service provides a method for applying a query expression 
> "directly" against a set of databases (DB), generating a logical, 
> server-side result set (RS).  The present service provides a method for 
> retrieval of records from a RS.  The RS is *not* modelled directly as a 
> DB, that is, the search service may not use an RS directly as if it were 
> a DB.  However, type-1 and type-101 queries (at least), allow one to 
> include an RS reference as an operand, *effectively* turning an RS into 
> a DB, though with some (minor?) loss of functionality.
> 
> Was including an RS as a query operand a clever afterthought to avoid 
> changing the underlying data model, or is there a more significant 
> reason to maintain the DB/RS model distinction?

I am not sure of the original reasoning, but here are a few observations
in case they are useful.

To me, result sets are not databases - they identify a subset of records
in a database. I can combine sets with AND, OR, and NOT operators, and
still have references to the same underlying records. There are no
dedup issues, for example.
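
A minimal sketch of that idea, assuming a result set is modelled as a
set of (database name, record id) references rather than as copies of
records. The sample records and the search() helper are hypothetical
and only illustrate the point; they are not taken from any Z39.50
implementation.

```python
# Hypothetical toy database: record id -> text.
books = {
    1: "z39.50 protocol overview",
    2: "sql join tutorial",
    3: "z39.50 result set model",
}

def search(database_name, records, term):
    """Return a result set: references to matching records, not copies."""
    return {
        (database_name, record_id)
        for record_id, text in records.items()
        if term in text.split()
    }

set_a = search("books", books, "z39.50")   # records 1 and 3
set_b = search("books", books, "model")    # record 3

# AND, OR and NOT are plain set operations on the references, so a
# record that matches both searches still appears only once.
and_set = set_a & set_b
or_set = set_a | set_b
not_set = set_a - set_b
```

Because every member is a reference to the same underlying record, the
OR of the two sets above holds record 3 once, with no separate
de-duplication step.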

In SQL, the world is different. Queries in effect create new logical
databases. They can change the structure of records (the SELECT line,
joins, etc.). So in SQL it makes sense to be able to treat query
results (in a view) as a database to be queried. But the records
are different from those in the original database. You cannot then
update the result of the query to change the original database.

Further, database names are used for more than doing queries in Z39.50.
You can scan indexes given a database name. This gets harder on a result
set as you should only return terms used in that result set.
(We support this, by the way, by allowing 'special' result sets to
be created that limit the view of records in the current database - but
this is a limiting filter on the existing database, not a new database.)
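
As a rough sketch of what "scan limited by a result set" could look
like, assuming a simple in-memory inverted index: the build_index()
and scan() functions and the sample records are hypothetical, and this
is not a description of how our server does it.

```python
from collections import defaultdict

def build_index(records):
    """Build a toy inverted index: term -> set of record ids."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in text.split():
            index[term].add(record_id)
    return index

def scan(index, result_set=None):
    """List (term, count) pairs in term order.

    With a result set given, the scan acts as a filter on the existing
    index: only terms used by records in that set are returned, and the
    counts reflect only those records.
    """
    terms = []
    for term in sorted(index):
        ids = index[term] if result_set is None else index[term] & result_set
        if ids:
            terms.append((term, len(ids)))
    return terms

records = {1: "apple banana", 2: "banana cherry", 3: "cherry date"}
index = build_index(records)

full_scan = scan(index)             # every term in the database
limited_scan = scan(index, {1, 2})  # only terms used in records 1 and 2
```

The extra work is visible in scan(): each term's posting list has to be
intersected with the result set before the term can be reported.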

Records when returned in Z39.50 also identify the database they came
from. I guess you could return the result set name instead, but this
may be less useful when searching across three databases to form your
result sets. You might want to know which source database is the most
useful for follow-up searches.
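
To illustrate why the source database name is worth keeping, here is a
small sketch assuming each returned record carries the name of the
database it came from. The database names and the hits_by_database()
helper are made up for the example.

```python
from collections import Counter

# Hypothetical result set built from a search across three databases.
# Each entry is (source database name, record id).
result_set = [
    ("books", 1),
    ("articles", 7),
    ("books", 3),
    ("theses", 2),
    ("books", 9),
]

def hits_by_database(results):
    """Count how many records each source database contributed."""
    return Counter(database for database, _record_id in results)

counts = hits_by_database(result_set)

# The database with the most hits is a reasonable first candidate for
# follow-up searches.
best_database = counts.most_common(1)[0][0]
```

If the records carried only the result set name, this per-database
breakdown could not be computed by the client.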

I guess there may be some history in it too. Originally people did searches
by typing in one word at a time, seeing how many matches there were, then
combining the sets created for each word to try and come up with a good
final set of results to look at.

In summary, I think the result set model for Z39.50 fits in with its purpose
of searching for interesting items of data in a collection of records.
The result set model is, I think, less useful for query languages that
are designed to return information massaged/synthesized as the result
of a query (such as SQL with joins and the SELECT line).

Alan
Received on Thursday, 24 July 2003 20:38:56 UTC