- From: Kevin Smathers <kevin.smathers@hp.com>
- Date: Tue, 16 Sep 2003 08:14:36 -0700
- To: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>
- Cc: www-rdf-dspace@w3.org
Hi Mark,

I've been working on this problem for the last couple of days, and I think I have a partial solution. Fundamentally, I think we need search index objects that index the collection along interesting facets. An index object could be represented in RDF as simply as:

    <#index> <#index-term-1> <#document1> ;
             <#index-term-1> <#documentN> ;
             <#index-term-2> <#documentN+1> ;
             <#index-term-M> <#documentM> .

What is crucial isn't the structure of the Index, but that we construct the Index as an object distinct from the collection metadata. That gives us a chance to optimize the index, and it allows multiple literals to be collected under a single index term if needed. In your example we could either index 'Wright', 'Frank', and 'Lloyd' (i.e. tokenize the strings) to do keyword search, or we could put the string literals into a canonical format (e.g. "Wright, Frank L."):

    <#nameIndex> [ <#indexedName> "Wright, Frank L." ] <#vra-document>, <#ims-document> .

Whether a record then appears in a particular index depends on whether there is a mapping from the literal value of that field to an index term.

Further, I propose that we do the same thing with restricted vocabularies. I think a Vocabulary is essentially a subclass of Index, with methods for navigation (up/down) in addition to the enumeration and counting methods of Index. The Vocabulary terms themselves may be able to live outside the collection, but the mapping from collection objects to Vocabulary terms is something that I think needs to live in its own object. The alternative, mapping vocabulary terms directly into the collection objects, will tend to pollute the collection object metadata with lots of apparently duplicate metadata, with no corresponding object to represent the lifecycle of that duplicated data, and little opportunity for optimizing the indexing function. Each record would end up carrying something like:

    <#document> vra:PersonalName "Wright, Frank L. (1867-1959)" ;
                <#indexedName> "Wright, Frank L." .
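A rough Python sketch of the Index/Vocabulary split, offered as an illustration rather than an implementation. Every name here is hypothetical: `Index`, `Vocabulary`, `add`, `terms`, `lookup`, `count`, `up`, `down`, `canonical_name`, and the "Architects" / "20th-century designers" terms. The name rule (surname first, first given name in full, other given names as initials, trailing life dates dropped) is an assumption tuned to the two Wright strings in this thread, not a general cataloguing rule.

```python
import re
from collections import defaultdict


class Index:
    """An index kept as its own object, separate from the collection metadata.

    Each index term maps to the set of document URIs filed under it, in the
    spirit of <#index> <#index-term> <#document> triples.
    """

    def __init__(self, key_fn):
        # key_fn turns a field's literal value into an index term, or returns
        # None when there is no mapping (the record is then not in the index).
        self._key_fn = key_fn
        self._postings = defaultdict(set)

    def add(self, doc_uri, literal):
        term = self._key_fn(literal)
        if term is not None:
            self._postings[term].add(doc_uri)

    def terms(self):
        # Enumeration: all index terms, sorted.
        return sorted(self._postings)

    def lookup(self, term):
        # Documents filed under a term, sorted for stable output.
        return sorted(self._postings.get(term, ()))

    def count(self, term):
        # Counting: number of documents filed under a term.
        return len(self._postings.get(term, ()))


class Vocabulary(Index):
    """An Index whose terms also form a broader/narrower hierarchy."""

    def __init__(self, key_fn, broader):
        super().__init__(key_fn)
        self._broader = dict(broader)  # narrower term -> broader term

    def up(self, term):
        return self._broader.get(term)

    def down(self, term):
        return sorted(t for t, parent in self._broader.items() if parent == term)


def canonical_name(literal):
    """Hypothetical rule: 'Surname, First I.' with any life dates dropped."""
    name = re.sub(r"\s*\(\d{4}-\d{4}\)\s*$", "", literal).strip()
    if "," in name:
        # Already inverted: "Surname, Given Names".
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        # Assume natural order: "Given Names Surname".
        parts = name.split()
        if len(parts) < 2:
            return None
        surname, given = parts[-1], " ".join(parts[:-1])
    given_parts = given.split()
    if not given_parts:
        return None
    # Keep the first given name in full; reduce the rest to initials.
    initials = [p[0] + "." for p in given_parts[1:]]
    return surname + ", " + " ".join([given_parts[0]] + initials)


# Both spellings from this thread land on one index term.
names = Index(canonical_name)
names.add("#vra-document", "Wright, Frank L. (1867-1959)")
names.add("#ims-document", "Frank Lloyd Wright")
print(names.terms())                     # ['Wright, Frank L.']
print(names.lookup("Wright, Frank L."))  # ['#ims-document', '#vra-document']

# Hypothetical two-level vocabulary for navigation up and down.
roles = Vocabulary({"architect": "Architects"}.get,
                   broader={"Architects": "20th-century designers"})
roles.add("#vra-document", "architect")
print(roles.up("Architects"))                # 20th-century designers
print(roles.down("20th-century designers"))  # ['Architects']
```

Because the mapping lives in the Index, the VRA and IMS records are left untouched (no extra <#indexedName> on each record), and the index can be rebuilt or optimized on its own schedule.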
Should I write this up more fully?

Cheers,
-kls

Please excuse my N3; no doubt it is particularly ugly.

Butler, Mark wrote:
>Hi team,
>
>I have been thinking about the examples from the demo script. After looking
>at the examples, I'm not sure if mapping between vocabularies is the ideal
>solution?
>
>An alternative approach would be similar to FRBR as outlined here
>
>http://www.oclc.org/research/projects/frbr/default.htm
>http://www.oclc.org/research/projects/frbr/algorithm.htm
>
>FRBR means reorganizing a collection to better reflect its conceptual
>structure. For details, see the OCLC page but very roughly this involves
>sorting records into three groups:
>
>Group 1 consists of the products of intellectual or artistic endeavor (e.g.,
>publications).
>Group 2 comprises those entities responsible for intellectual or artistic
>content (a person or corporate body).
>Group 3 includes the entities that serve as subjects of intellectual or
>artistic endeavor (concept, object, event, and place).
>
>So what we really want to do when mapping between IMS and VRA is
>1. First identify all the example records are about Frank Lloyd Wright,
>group them together and place in group 1 i.e. the example metadata in the
>demo script.
>2. Extract some information about FLW e.g. populate a group 2 item.
>
>then if we search for "Frank Lloyd Wright" then we get all three records
>because they have all been grouped together. Alternatively if we search for
>"20th century designers", then from the group 2 item we determine that FLW
>is a designer, and based on the sheer number of records about FLW in the
>content databases determine that he is important, then use the FLW search
>term to return the IMS and VRA records?
>
>So instead of doing some mapping at the query stage, the important bit is
>doing the FRBR restructuring at the beginning. Once we've established the
>relationships, we can use different "viewers" for the IMS and VRA records,
>so in a way the vocabulary mapping doesn't matter. What is harder is how we
>map
>
>vra.Creator.Personal Name=Wright, Frank L. (1867-1959)
>
>onto
>
>ims.general.title = Frank Lloyd Wright
>
>Any comments?
>
>Dr Mark H. Butler
>Research Scientist HP Labs Bristol
>mark-h_butler@hp.com
>Internet: http://www-uk.hpl.hp.com/people/marbut/

-- 
========================================================
Kevin Smathers               kevin.smathers@hp.com
Hewlett-Packard              kevin@ank.com
Palo Alto Research Lab
1501 Page Mill Rd.           650-857-4477 work
M/S 1135                     650-852-8186 fax
Palo Alto, CA 94304          510-247-1031 home
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");
Received on Tuesday, 16 September 2003 11:14:48 UTC