Mismatch issue

Hi team, 

I'd like to have a go at concisely explaining the "mismatch problem", as I
see it, that was raised by the demo.

(I realise now this is something MacKenzie and Eric have alluded to in the
past, so apologies in advance that it has taken me this long to understand
it, and in case I am grossly misusing the terminology)

One of the problems with heterogeneous metadata is that, although we have a
lot of metadata about resources (e.g. digital images, learning objects),
that metadata alone may be insufficient for users to locate the information
they need, so we need a richer conceptual model. Specifically, as FRBR
proposes (although I'm grossly simplifying here), the people & organisations
that have some relationship with the resource, and the concepts embodied by
the resource, are also potential first class citizens of our conceptual
model. For example, consider the following user scenarios:

"I want images connected with Frank Lloyd Wright"
In fact, this query is possible via free text search, even if we only have
the resource metadata. The only problem arises when the identifier we are
looking for is encoded in different ways, e.g. "Wright, F. L.", "Wright,
Frank L.", "Wright, Frank L. (1867-1959)".
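
To make the encoding problem concrete, here is a minimal sketch. The
records, the "creator" field and the free_text_search function are all
invented for illustration and are not taken from the demo; only the name
variants come from the example above.

```python
# Hypothetical resource records; only the creator strings come from the
# variants quoted above, everything else is made up for illustration.
records = [
    {"id": "img1", "creator": "Wright, F. L."},
    {"id": "img2", "creator": "Wright, Frank L."},
    {"id": "img3", "creator": "Wright, Frank L. (1867-1959)"},
    {"id": "img4", "creator": "Frank Lloyd Wright"},
]


def free_text_search(query, records):
    """Return ids of records whose creator field contains the query.

    This is a plain case-insensitive substring match, i.e. the simplest
    possible form of free text search.
    """
    q = query.lower()
    return [r["id"] for r in records if q in r["creator"].lower()]


# The full name only matches the record that spells it the same way.
print(free_text_search("Frank Lloyd Wright", records))  # ['img4']

# The surname alone matches every variant, but it would equally match
# any other person called Wright.
print(free_text_search("Wright", records))  # ['img1', 'img2', 'img3', 'img4']
```

So free text search either misses variants or over-matches, depending on
how much of the name we search for.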

"I want images connected with 20th century architects"
There are a couple of ways of doing this. We could just search the
description and keyword fields of the resources. An alternative is to
search people & organisations to identify individuals who are classified
under both 20th century and architect. One of these individuals is Frank
Lloyd Wright, so we then search for resources connected with him; as he was
influential, we are likely to have a large number of them. Alternatively,
we may look up 20th century and architect, determine that architect is
related to architecture, then search for resources connected with that
concept.
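
The second approach (search people first, then resources) can be sketched
roughly as follows. This assumes people are stored as separate entities
with classification terms and that resources point at them by id; the
identifiers, field names and sample data are all hypothetical.

```python
# Hypothetical people entities, each with a set of classification terms.
people = {
    "person:flw": {
        "name": "Frank Lloyd Wright",
        "classified_as": {"20th century", "architect"},
    },
    "person:monet": {
        "name": "Claude Monet",
        "classified_as": {"impressionist", "painter"},
    },
}

# Hypothetical resource records that refer to people by id.
resources = [
    {"id": "img1", "title": "Fallingwater", "related_people": ["person:flw"]},
    {"id": "img2", "title": "Water Lilies", "related_people": ["person:monet"]},
    {"id": "img3", "title": "Guggenheim Museum", "related_people": ["person:flw"]},
]


def find_people(required_terms, people):
    """Step 1: ids of people classified under all of the required terms."""
    return {
        pid
        for pid, person in people.items()
        if required_terms <= person["classified_as"]
    }


def resources_for_people(person_ids, resources):
    """Step 2: ids of resources connected with any of the given people."""
    return [
        r["id"] for r in resources if person_ids & set(r["related_people"])
    ]


architects = find_people({"20th century", "architect"}, people)
print(resources_for_people(architects, resources))  # ['img1', 'img3']
```

Note that neither image needs to mention "architect" or "20th century" in
its own metadata; the connection is made entirely through the person
entity.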

1. So one question here is that some of the metadata we are using is so
close to free text that parts of it are simply not machine processable, so
the only thing we can do with it is free text search. If that's the case,
what advantage are we gaining from our complex metadata tagging schemes? We
might still want to tag the information to generate views, e.g. some users
don't need technical metadata, but can we simplify the schema? Do schemas
like the LOM schema, with fields like "SemanticDensity" and
"IntendedUserRole", really make sense?


Now I think we can relate this to the way libraries currently work, right?
Libraries use authority files and thesauri, which can be thought of as data
collections that in effect make people & organisations and concepts into
first class citizens. In particular they provide synonym information, which
is important for dealing with different identifiers used for the same
entity.
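
As a rough illustration of the synonym role of an authority file, here is
a minimal sketch that maps the variant encodings of Frank Lloyd Wright's
name mentioned earlier onto a single entity. The data layout, the entity
id and the function names are my own assumptions; real authority files are
considerably richer than this.

```python
# A toy "authority file": one entry per entity, with a preferred label
# and known variant labels. The structure and the id are hypothetical.
authority = {
    "person:flw": {
        "preferred": "Wright, Frank Lloyd (1867-1959)",
        "variants": [
            "Wright, F. L.",
            "Wright, Frank L.",
            "Wright, Frank L. (1867-1959)",
        ],
    },
}


def build_lookup(authority):
    """Index every preferred and variant label by its case-folded form."""
    lookup = {}
    for entity_id, entry in authority.items():
        for label in [entry["preferred"], *entry["variants"]]:
            lookup[label.casefold()] = entity_id
    return lookup


def resolve(label, lookup):
    """Return the entity id for a label, or None if the label is unknown."""
    return lookup.get(label.strip().casefold())


lookup = build_lookup(authority)
print(resolve("Wright, F. L.", lookup))     # person:flw
print(resolve("Wright, Frank L.", lookup))  # person:flw
print(resolve("Wright, Orville", lookup))   # None
```

This only handles variants that are already listed; it does nothing for
spellings the authority file has never seen.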

The library community has already thought about this, e.g. FRBR, so
currently I'm trying to learn a bit more about it. MacKenzie, Eric, you've
mentioned FRBR before - can you give some background?

In addition I have some questions I'm thinking about:

2. Can we do meaningful cross-collection searches just from the resource
metadata, or do we need to make other things first class citizens also?

Then if we do need other things to be first class citizens, 

3. Can we extract that information from the resource metadata or import it
from other sources (e.g. authority files, thesauri)?

4. How do we annotate the resource metadata to make its relationship with
the other first class citizens explicit?

5. In fact the IMS, DC and VRA records do have similarities, so they do seem
to fit the FRBR group 1 - 3 classification. However, if we pick some
resources which are very different (perhaps biomedical images), does it work
for them? To put it another way, are we going to need to introduce other
first class citizen data objects for other vocabularies?

As Kevin S pointed out to me yesterday, systems like NetFlix make people
(e.g. directors, actors) first class citizens in addition to films. 

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Wednesday, 17 September 2003 13:11:48 UTC