- From: <Joel.Crisp@bristol.ac.uk>
- Date: Wed, 27 Nov 1996 16:27:27 +0000 (GMT)
- To: www-jigsaw@w3.org
- Cc: b.kelly@bath.ac.uk
Hi

Well, if you want me to produce an IRIX and Jigsaw page I'll be more than happy to ;-)

As for the records views stuff, I'll mirror my source tree soon for people to have a look at... with the understanding that all of it is work-in-progress!

**** SEARCHING ****

For some time now, I have been working on the following problem:

There are good database search engines.
There are good WWW search engines.
There is not a good WWW *and* database search engine.

Here are some thoughts (disorganised and rambling!) and some suggestions for a possible implementation.

Why do we need one?

OO databases are very good for some queries, and Jigsaw is a very interesting OO database. Given the CVS stuff mentioned in Anselm's last reply, it is exactly what we want for our user submission/annotation work. However, OO databases tend to be poor at card-file type data.

MIDRIB, my current project (well, the one that pays me!), is a database of medical images. Principally, it consists of a number of image collections from various donors. It is targeted at education, and hence the ability for teachers to make their own arbitrary collections is very important. Hence, an OO database sounds really cool.

We want complex searching over field-restricted data as well, though. We need complex database admin tools. We do semantic (UMLS) and thesaurus expansion. We have had Indexplus thrust upon us as the database to use. (Fair enough, it recently got a BCS medal for the sparse leaf B-tree variant it uses, and it is blindingly fast at free text searches.)

I would like MIDRIB to be more than just a card file and arbitrary collection system. I want to put in hypertext, and my own side project, Tutorial Markup Language (a meta HTML DTD supporting question semantics).

So, my ideal system is an OO database with card file capabilities and a sophisticated semantic search engine....
Let us take Jigsaw, add a defined search API to its resources, and ensure that the search API can hand searches off to underlying databases with high-efficiency search engines. In addition, let us define a search result mechanism by which we can flexibly but comprehensively report search results from possibly foreign databases.

How do we do this?

Well, there are some existing standards out there which we can use. Dublin Core (is it my imagination, or does the meta data document on w3c use this as an example? ;-) ) is a meta data standard which is evolving. Whois++ gives a cross-database search potential. The document 'draft-ietf-asid-whois-schema-00.txt' defines a set of templates for meta data over Whois++. WAIS defines some complex result reporting. All of these will prove useful. None of them may be used directly.

I have a four layer model for each collection to use (remote sites are just remote collections; a collection is a searchable set of records). It comprises the following replies to search requests (to be discussed below):

1) I exist
2) I exist and may contain x many relevant objects
3) I exist and contain x many relevant objects - here is the meta data
4) I exist, here is the hit meta data, and I can respond with the objects in protocol x, y or z

Obviously, the TTL, degree of confidence etc. need to be transmitted as well. All replies should also contain the modifications to the search - i.e. the difference between the requested search and what is actually performed.

This reply structure allows systems to interoperate both at a very minimal level, and at a level where one unified user interface provides all the search results from a wide range of different resources.

Search term description, I think, needs to be defined very carefully, again with different levels of compliance, from a simple 'containing this word' to a more complex expression evaluator (I have the skeleton of one in development).
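Going back to the four reply levels for a moment, here is a minimal Python sketch of how such a reply might be carried around. None of this comes from Jigsaw, Whois++ or WAIS: the names `ReplyLevel` and `SearchReply`, and all of the field names, are assumptions made up for illustration. The only things taken from the description above are the four levels, the TTL and confidence, and the rule that every reply reports both the requested and the actually performed search.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class ReplyLevel(IntEnum):
    """The four reply levels; the names are hypothetical."""
    EXISTS = 1        # 1) I exist
    ESTIMATE = 2      # 2) I exist and may contain x many relevant objects
    METADATA = 3      # 3) ... here is the meta data
    RETRIEVABLE = 4   # 4) ... and I can respond in protocol x, y or z


@dataclass
class SearchReply:
    """One collection's answer to a search request (illustrative only)."""
    collection: str
    requested_search: str          # what the client asked for
    actual_search: str             # what the collection actually ran
    ttl_seconds: int               # how long the reply may be cached
    confidence: float              # 0.0 .. 1.0, scale is an assumption
    hit_count: Optional[int] = None
    hit_metadata: List[dict] = field(default_factory=list)
    protocols: List[str] = field(default_factory=list)

    @property
    def level(self) -> ReplyLevel:
        # Simplification: the level is inferred from which fields are
        # filled in, so a reply with zero hits never rises above ESTIMATE.
        if self.hit_metadata and self.protocols:
            return ReplyLevel.RETRIEVABLE
        if self.hit_metadata:
            return ReplyLevel.METADATA
        if self.hit_count is not None:
            return ReplyLevel.ESTIMATE
        return ReplyLevel.EXISTS

    @property
    def search_was_modified(self) -> bool:
        # True when the collection changed the search before running it.
        return self.requested_search != self.actual_search


# A level 2 reply: a hit estimate, but no meta data and no protocols.
example = SearchReply(
    collection="example-collection",
    requested_search="'equine' AND 'heart'",
    actual_search="'equine' AND ( 'CARDIAC' OR 'HEART' )",
    ttl_seconds=3600,
    confidence=0.8,
    hit_count=4000,
)
```

A unified front end could then sort replies by `level` and only ask the level 4 collections for the objects themselves, while still showing the level 1 and 2 collections as places worth searching directly.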
I suggest using a text string representation and a common parser core which turns it into a tree representation, which may then generate e.g. an SQL query to an underlying database. So we could see something like this:

    'Title=~Mobile Code and Network Applets' AND 'JAVA' AND NOT 'JIGSAW';
    CaseSensitive=YES; Expand=Synonyms,Stemming ; Hints=NO

Search results need to be carefully thought out too - we are reporting results like this (not real syntax!):

    Results: 4000 ( best is 80% )
    Original Search: 'equine' AND 'heart' ; CaseSensitive=no; Hints=YES
    Actual search : 'equine' and ( 'CARDIAC' OR 'HEART' ) ; CaseSensitive=NO; Hints=YES
    Operations performed : Synonym expansion 'HEART' -> CARDIAC; UMLS_vocabulary.
    Hints : Synonym expansion 'EQUINE' -> HORSE,FOAL,GELDING ; UMLS vocabulary,
            Database OMNI HTTP://omni.ac.uk/ , Semantic net : VASCULAR; UMLS_vocabulary

Note that the results rating needs to be formalised, as do the other functions such as thesaurus lookup etc. Note that I return the source used for lookups as well.

Other issues are caching search lookups, mirroring indices, caching search results, parallel searching etc.

I'd like to experiment with adding some of these facilities to the basic Jigsaw Resource class... particularly as it has a lot of the service code already installed ;-)

Some of these ideas had been implemented in my 'pre-jigsaw' clone... now replaced by the real thing ;-) Its structure had the same 'container' tree look, but also had a link database.

Caveat - until recently I was the only person working on this - and part time at that..

Hopefully out of this message comes some interesting discussion....

Apologies for English, spelling, daftness of ideas, duplication of other work as appropriate

Joel
--
Joel.Crisp@bris.ac.uk | ets-webmaster@bris.ac.uk  | "I remember Babylon" -
Software Engineer, Institute of Learning and      | Arthur C Clarke
Research Technology, University of Bristol, UK    | http://www.ets.bris.ac.uk/
Received on Wednesday, 27 November 1996 11:32:53 UTC