Re: Demo script and vocabulary mapping from Kevin Smathers on 2003-09-16 (www-rdf-dspace@w3.org from September 2003)

From: Kevin Smathers <kevin.smathers@hp.com>
Date: Tue, 16 Sep 2003 08:14:36 -0700
To: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>
Cc: www-rdf-dspace@w3.org
Message-ID: <3F6728DC.3040102@hp.com>

Hi Mark,

I've been working on this problem for the last couple of days, and think 
I have a partial solution.

Fundamentally I think we need search index objects to index the 
collection along interesting facets.   An index object could be 
represented in RDF as simply as:

<#index> <#index-term-1> <#document1>  ;
                 <#index-term-1> <#documentN> ;
                 <#index-term-2> <#documentN+1> ;
                 <#index-term-M> <#documentM> .

What is crucial isn't the structure of the Index, but that we have to 
construct the Index as a distinct object from the collection metadata.  
That offers a chance for optimizing the index, and it allows multiple 
literals to be collected under a single index-term if needed.  In your 
example we could either index 'Wright', 'Frank', and 'Lloyd' (i.e. 
tokenize the strings) to support keyword search, or put the string 
literals into a canonical format (e.g. "Wright, Frank L.").

<#nameIndex> [ <#indexedName> "Wright, Frank L." ] <#vra-document>, 
                                                   <#ims-document> .

Then whether a record appears in a particular index is dependent on 
whether there is a mapping for the literal value of that field into the 
index term. 
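A minimal Python sketch of such an index object, kept separate from the 
collection metadata. Everything named here (Index, canonical_terms, 
keyword_terms, the document ids) is hypothetical and only illustrates 
the idea; the name normalizer is a deliberately naive assumption, not a 
real authority-control routine.

```python
import re
from collections import defaultdict
from typing import Callable, List, Set


def canonical_terms(literal: str) -> List[str]:
    """Map a name literal to 'Family, Given I.' form, or to no term at all."""
    text = re.sub(r"\s*\(.*?\)\s*", " ", literal).strip()  # drop "(1867-1959)"
    if "," in text:
        family, given = [part.strip() for part in text.split(",", 1)]
    else:
        parts = text.split()
        if len(parts) < 2:
            return []  # no mapping: the record stays out of this index
        family, given = parts[-1], " ".join(parts[:-1])
    names = given.split()
    if not family or not names:
        return []
    initials = [name[0] + "." for name in names[1:]]
    return [f"{family}, {' '.join([names[0]] + initials)}"]


def keyword_terms(literal: str) -> List[str]:
    """Alternative mapping: tokenize the literal for keyword search."""
    return [token.lower() for token in re.findall(r"[A-Za-z]+", literal)]


class Index:
    """Index terms mapped to document ids, held outside the collection."""

    def __init__(self, terms_for: Callable[[str], List[str]]) -> None:
        self._terms_for = terms_for
        self._postings = defaultdict(set)

    def add(self, doc_id: str, literal: str) -> None:
        # A record appears in the index only if its literal maps to a term.
        for term in self._terms_for(literal):
            self._postings[term].add(doc_id)

    def terms(self) -> List[str]:
        return sorted(self._postings)

    def count(self, term: str) -> int:
        return len(self._postings.get(term, ()))

    def lookup(self, term: str) -> Set[str]:
        return set(self._postings.get(term, ()))


name_index = Index(canonical_terms)
name_index.add("vra-document", "Wright, Frank L. (1867-1959)")
name_index.add("ims-document", "Frank Lloyd Wright")
name_index.add("other-document", "Wright")  # too short to canonicalize

keyword_index = Index(keyword_terms)
keyword_index.add("ims-document", "Frank Lloyd Wright")
```

With the canonical mapping both example records land under the single 
term "Wright, Frank L.", while the record with no mapping is simply 
absent from the index. Swapping in keyword_terms gives the tokenized 
variant without touching the collection records.
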

Further, I propose that we do the same thing with restricted 
vocabularies.  I think a Vocabulary is essentially a subclass of Index 
that adds methods for navigation (up/down) to the enumeration and 
counting methods of Index.  The Vocabulary terms themselves may live 
outside the collection, but the mapping from collection objects to 
Vocabulary terms is something that I think needs to live in its own 
object.  The alternative, mapping vocabulary terms directly into the 
collection objects, will tend to pollute the collection object metadata 
with lots of apparently duplicate metadata, with no corresponding object 
to represent the lifecycle of that duplicated data, and little 
opportunity for optimizing the indexing function.

<#document> vra:PersonalName "Wright, Frank L. (1867-1959)" ;
                        <#indexedName> "Wright, Frank L." .
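The Vocabulary-as-subclass-of-Index idea can be sketched in Python as 
follows. This is only an illustration under stated assumptions: the 
class and method names (Vocabulary, up, down, lookup_subtree), the term 
hierarchy, and the document ids are all invented here, and the hierarchy 
is assumed to be an acyclic tree with at most one broader term per term.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class Index:
    """Terms mapped to document ids, stored apart from the collection."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)

    def add(self, term: str, doc_id: str) -> None:
        self._postings[term].add(doc_id)

    def terms(self) -> List[str]:
        return sorted(self._postings)

    def count(self, term: str) -> int:
        return len(self._postings.get(term, ()))

    def lookup(self, term: str) -> Set[str]:
        return set(self._postings.get(term, ()))


class Vocabulary(Index):
    """An Index whose terms also support up/down navigation."""

    def __init__(self, broader: Dict[str, str]) -> None:
        super().__init__()
        self._broader = dict(broader)       # narrower term -> broader term
        self._narrower = defaultdict(set)   # broader term -> narrower terms
        for child, parent in broader.items():
            self._narrower[parent].add(child)

    def up(self, term: str) -> Optional[str]:
        return self._broader.get(term)

    def down(self, term: str) -> List[str]:
        return sorted(self._narrower.get(term, ()))

    def lookup_subtree(self, term: str) -> Set[str]:
        # Documents under the term itself plus all of its narrower terms.
        docs = self.lookup(term)
        for child in self.down(term):
            docs |= self.lookup_subtree(child)
        return docs


# Hypothetical two-level vocabulary; the terms themselves could be external.
subjects = Vocabulary({"20th century designers": "designers"})
subjects.add("20th century designers", "vra-document")
subjects.add("20th century designers", "ims-document")
```

Here the mapping from documents to vocabulary terms lives entirely in 
the subjects object, so it can be rebuilt or discarded without editing 
the collection records themselves.
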


Should I write this up more fully?

Cheers,
-kls

Please excuse my N3, no doubt it is particularly ugly.

Butler, Mark wrote:

>Hi team,
>
>I have been thinking about the examples from the demo script. After looking
>at the examples, I'm not sure if mapping between vocabularies is the ideal
>solution? 
>
>An alternative approach would be similar to FRBR, as outlined here:
>
>http://www.oclc.org/research/projects/frbr/default.htm
>http://www.oclc.org/research/projects/frbr/algorithm.htm
>
>FRBR means reorganizing a collection to better reflect its conceptual
>structure. For details, see the OCLC page but very roughly this involves
>sorting records into three groups: 
>
>Group 1 consists of the products of intellectual or artistic endeavor (e.g.,
>publications). 
>Group 2 comprises those entities responsible for intellectual or artistic
>content (a person or corporate body). 
>Group 3 includes the entities that serve as subjects of intellectual or
>artistic endeavor (concept, object, event, and place). 
>
>So what we really want to do when mapping between IMS and VRA is
>1. First identify all the example records that are about Frank Lloyd
>Wright, group them together and place them in group 1, i.e. the example
>metadata in the demo script. 
>2. Extract some information about FLW e.g. populate a group 2 item. 
>
>then if we search for "Frank Lloyd Wright" then we get all three records
>because they have all been grouped together. Alternatively if we search for
>"20th century designers", then from the group 2 item we determine that FLW
>is a designer, and based on the sheer number of records about FLW in the
>content databases determine that he is important, then use the FLW search
>term to return the IMS and VRA records?
>
>So instead of doing some mapping at the query stage, the important bit is
>doing the FRBR restructuring at the beginning. Once we've established the
>relationships, we can use different "viewers" for the IMS and VRA records,
>so in a way the vocabulary mapping doesn't matter. What is harder is how we
>map 
>
>vra.Creator.Personal Name=Wright, Frank L. (1867-1959)
>
>onto
>
>ims.general.title = Frank Lloyd Wright
>
>Any comments?
>
>Dr Mark H. Butler
>Research Scientist                HP Labs Bristol
>mark-h_butler@hp.com
>Internet: http://www-uk.hpl.hp.com/people/marbut/
>


-- 
========================================================
   Kevin Smathers                kevin.smathers@hp.com    
   Hewlett-Packard               kevin@ank.com            
   Palo Alto Research Lab                                 
   1501 Page Mill Rd.            650-857-4477 work        
   M/S 1135                      650-852-8186 fax         
   Palo Alto, CA 94304           510-247-1031 home        
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");
Received on Tuesday, 16 September 2003 11:14:48 UTC