RE: Demo script and vocabulary mapping

Hi team,

First, thanks to Kevin for the suggestions. Yes, more detail would help.

I've been trying to brainstorm around these concepts. Feedback or other
suggestions are welcome.

1: MAP VOCABULARIES USING INFERENCE AT QUERY TIME

Here we use schemas to map between different properties and classes in
different vocabularies at query time. This is the proposal that SIMILE has
been considering. One of the points of this email is to enumerate some other
approaches.
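
As a rough illustration of what mapping at query time could look like, here
is a minimal Python sketch. It is not SIMILE's implementation: the property
names, the SUB_PROPERTY_OF table and the sample triples are all made up, and
the only inference rule assumed is a sub-property relation in the spirit of
rdfs:subPropertyOf. The stored data is left untouched and the query is
widened instead.

```python
# Hypothetical schema: each key is declared a sub-property of its value.
SUB_PROPERTY_OF = {
    "vra:creator": "dc:creator",
    "ims:title": "dc:title",
}

# Hypothetical repository contents, as (subject, property, value) triples.
TRIPLES = [
    ("rec1", "vra:creator", "Frank Lloyd Wright"),
    ("rec2", "dc:creator", "Frank Lloyd Wright"),
    ("rec3", "ims:title", "Fallingwater"),
]


def equivalent_properties(prop):
    """Return prop plus every property that is (transitively) a sub-property of it."""
    result = {prop}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUB_PROPERTY_OF.items():
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result


def query(prop, value):
    """Answer the query by expanding the property at query time."""
    props = equivalent_properties(prop)
    return sorted(s for s, p, o in TRIPLES if p in props and o == value)


matches = query("dc:creator", "Frank Lloyd Wright")
```

The cost of the expansion is paid on every query, which is what approach 2
tries to avoid.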

2: MAP VOCABULARIES USING INFERENCE IN ADVANCE

This is similar to 1, only we save the inferred information back to the
repository to reduce query time. Whether this is practical depends on the
time / memory tradeoff, i.e. whether the reduction in query time is worth
the memory required to cache the inferred information.
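
To make the tradeoff concrete, here is a minimal Python sketch of inferring
in advance. The sub-property table and the triples are hypothetical, and the
single rule applied (copy a statement up to its super-property) is only an
assumption about what the inference might involve. The number of extra
triples stored is the memory cost we would be paying.

```python
# Hypothetical schema: each key is declared a sub-property of its value.
SUB_PROPERTY_OF = {"vra:creator": "dc:creator"}


def materialise(triples, sub_property_of):
    """Return the asserted triples plus every triple inferred from them."""
    store = set(triples)
    frontier = list(store)
    while frontier:
        s, p, o = frontier.pop()
        sup = sub_property_of.get(p)
        if sup is not None:
            inferred = (s, sup, o)
            if inferred not in store:
                store.add(inferred)
                # An inferred triple may itself trigger further inferences.
                frontier.append(inferred)
    return store


asserted = [
    ("rec1", "vra:creator", "Frank Lloyd Wright"),
    ("rec2", "dc:creator", "Frank Lloyd Wright"),
]
stored = materialise(asserted, SUB_PROPERTY_OF)
extra = len(stored) - len(asserted)  # triples added purely for faster queries
```

After this step a query for dc:creator is a plain lookup with no inference,
at the price of the extra stored triples.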

3: MAPPING VOCABULARIES TO FRBR GROUPS

One way to think of FRBR is as a highly simplified classification system
with just three types (groups) of property, so we can map the properties of
other vocabularies on to these three groups:

group 1 = dc:title, dc:identifier
group 2 = dc:publisher, dc:creator, dc:contributor
group 3 = dc:description, dc:relation, dc:subject, dc:coverage

Note that we don't have to map every term in a vocabulary; some terms (for
example technical metadata) won't map at all.
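
A minimal Python sketch of this mapping follows. The group table is taken
from the list above; the sample record and the use of dc:format as an
example of an unmapped technical term are my own assumptions.

```python
# Property-to-group table, copied from the mapping listed above.
FRBR_GROUP = {
    "dc:title": 1, "dc:identifier": 1,
    "dc:publisher": 2, "dc:creator": 2, "dc:contributor": 2,
    "dc:description": 3, "dc:relation": 3, "dc:subject": 3, "dc:coverage": 3,
}


def group_record(record):
    """Sort a record's values into FRBR groups; unmapped terms go under None."""
    groups = {1: [], 2: [], 3: [], None: []}
    for prop, value in record:
        groups[FRBR_GROUP.get(prop)].append(value)
    return groups


# Hypothetical record; dc:format stands in for technical metadata.
record = [
    ("dc:title", "Fallingwater"),
    ("dc:creator", "Frank Lloyd Wright"),
    ("dc:subject", "Architecture"),
    ("dc:format", "image/jpeg"),
]
grouped = group_record(record)
```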

How do we use this and what advantage does it provide over free text search?
Well we can:
- use authority files to automatically identify group 2 items or clean up
manually identified group 2 items
- use thesauri to identify group 3 items and map between synonyms
- I suspect it's harder to come up with automatic ways of dealing with group
1 items, as there is going to be very little repetition in comparison to
group 2 or 3. If we are dealing with digital objects then, as David Karger
has noted, we could use hashing to come up with a unique identifier for an
object and then merge the various other group 1 descriptors used for it to
create a synonym description. However, this requires that we either
determine which properties supply group 1 information in a given vocabulary,
or require the user to identify them manually.
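
The hashing idea for group 1 items could be sketched as below. This is only
an illustration under assumptions of mine: SHA-256 is an arbitrary choice of
hash, the byte strings stand in for the content of digital objects, and the
titles are made-up sample data. It also assumes we already know which
property supplies the group 1 descriptor.

```python
import hashlib
from collections import defaultdict


def content_id(data: bytes) -> str:
    """Derive an identifier from the object's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def merge_group1(descriptions):
    """Collect all group 1 descriptors that refer to byte-identical objects."""
    merged = defaultdict(set)
    for data, title in descriptions:
        merged[content_id(data)].add(title)
    return dict(merged)


# Hypothetical (object bytes, title) pairs from different collections.
descriptions = [
    (b"same image bytes", "Fallingwater"),
    (b"same image bytes", "Kaufmann Residence"),
    (b"other image bytes", "Robie House"),
]
synonyms = merge_group1(descriptions)
```

Note this only merges exact copies; two different scans of the same picture
would hash differently and stay separate.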

4: EXTRACTING AND TAGGING FRBR GROUPS IN METADATA

However, approach 3 is based on the assumption that each property maps onto
one and only one group. This is probably not true: for example, consider the
title "Frank Lloyd Wright: A Biography". This is a reference to a group 1
surrogate, but it contains a reference to a group 2 surrogate. Therefore an
alternative is to think about the groups as data types rather than
properties, e.g.

vra.creator.personal name = <group2>Frank Lloyd Wright</group2>

dc.title = <group1>A biography of <group2>Frank Lloyd
Wright</group2></group1>

ims.general.title = <group2>Frank Lloyd Wright</group2>
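
If values were tagged like this, pulling the group spans back out is
straightforward. Here is a minimal Python sketch, assuming the tagged value
is well-formed XML (so no stray "&" or "<" characters) and that the tags are
named group1, group2 and group3 as in the examples above. The wrapping
<value> element is just a device to give the parser a single root.

```python
import xml.etree.ElementTree as ET


def extract_groups(tagged_value):
    """Return (tag, text) pairs for every group span, outermost first."""
    root = ET.fromstring("<value>" + tagged_value + "</value>")
    found = []
    for element in root.iter():
        if element.tag.startswith("group"):
            # Join nested text and normalise whitespace from line wrapping.
            text = " ".join("".join(element.itertext()).split())
            found.append((element.tag, text))
    return found


title = "<group1>A biography of <group2>Frank Lloyd Wright</group2></group1>"
spans = extract_groups(title)
```

For the dc.title example this yields both the group 1 span for the whole
title and the group 2 span nested inside it.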

5: ADDING FRBR GROUP INFORMATION

Rather than mapping between vocabularies, we could try to create FRBR
information in a new vocabulary when we ingest a record. We could use some
algorithms to try to do this automatically, but in a user-supervised way.

6: SYNTHESISE AND ANNOTATE DC INFORMATION 

Here, when we ingest records, we use automatic mapping rules to do a
crosswalk that creates DC information from the metadata. The results of this
crosswalk are then presented to a user for inspection, either one at a time
or in a batch view. The user is then given the chance to alter this
information, after which it is saved to the repository. (However, this
breaks on the examples in the demo script.)
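
A minimal Python sketch of the crosswalk-then-review step, under assumptions
of mine: the CROSSWALK rules, the sample record and the user's correction
are all hypothetical, and saving to the repository is left out.

```python
# Hypothetical automatic mapping rules from source properties to DC.
CROSSWALK = {
    "vra.creator": "dc.creator",
    "ims.general.title": "dc.title",
}


def propose_dc(record):
    """Apply the crosswalk rules; properties without a rule are skipped."""
    proposed = []
    for prop, value in record:
        target = CROSSWALK.get(prop)
        if target is not None:
            proposed.append((target, value))
    return proposed


def apply_review(proposed, corrections):
    """Replace proposed statements with the user's corrections, keyed by position."""
    return [corrections.get(index, pair) for index, pair in enumerate(proposed)]


record = [
    ("ims.general.title", "Frank Lloyd Wright"),
    ("vra.creator", "Frank Lloyd Wright"),
]
proposed = propose_dc(record)
# The reviewer decides the first statement is not really a title.
reviewed = apply_review(proposed, {0: ("dc.subject", "Frank Lloyd Wright")})
```

The sample record shows why review is needed: a purely property-based rule
turns a title that is really a person's name into dc.title.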

Comments and feedback, please.

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Tuesday, 16 September 2003 13:22:31 UTC