- From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
- Date: Wed, 17 Sep 2003 17:49:52 +0100
- To: SIMILE public list <www-rdf-dspace@w3.org>
Hi team,

I'd like to have a go at explaining concisely the "mismatch problem" as I see it raised by the demo. (I realise now this is something MacKenzie and Eric have alluded to in the past, so apologies in advance that it has taken me this long to understand it, and in case I am grossly misusing the terminology.)

One of the problems with dealing with heterogeneous metadata is that we have a lot of metadata about resources, e.g. digital images, learning objects etc. However, for users to locate the information they need, metadata about the resources alone may be insufficient, so we need a richer conceptual model. Specifically, as FRBR proposes (although I'm grossly simplifying here), people & organisations that have some relationship with the resource, and concepts embodied by the resource, are also potential first class citizens of our conceptual model.

For example, consider the following user scenarios:

"I want images connected with Frank Lloyd Wright"

In fact, this query is possible via free text search, even if we only have the resource metadata. The only problem occurs if the identifier we are looking for is encoded in different ways, e.g. "Wright, F. L.", "Wright, Frank L.", "Wright, Frank L. (1867-1959)".

"I want images connected with 20th century architects"

There are a couple of ways of doing this. We could just search the description and keyword fields of the resources. An alternative is to search people & organisations, to try to identify individuals who are classified under 20th century and architect. One of these individuals is Frank Lloyd Wright, so we then search for resources connected with him; we are likely to have a large number of these, as he was influential. Alternatively, we may look up 20th century and architect, determine that architect is related to architecture, then search for resources connected with architecture.

1. Some of the metadata we are using is so close to free text that parts of it are simply not machine processable, so the only thing we can do with it is free text search. If that's the case, what advantage are we gaining from our complex metadata tagging schemes? We might want to tag the information to generate views, e.g. some users don't need technical metadata, but can we simplify the schema? Do schemas like the LOM schema, with fields like "SemanticDensity" and "IntendedUserRole", really make sense?

Now I think we can relate this to the way libraries currently work, right? Libraries use authority files and thesauri, which can be thought of as data collections that in effect make people & organisations and concepts into first class citizens. In particular they provide synonym information, which is important for dealing with different identifiers used for the same entity. The library community has thought about this already, e.g. FRBR, so currently I'm trying to learn a bit more about it. MacKenzie, Eric, you've mentioned FRBR here - can you give some background?

In addition I have some questions I'm thinking about:

2. Can we do meaningful cross-collection searches just from the resource metadata, or do we need to make other things first class citizens also?

Then, if we do need other things to be first class citizens:

3. Can we extract that information from the resource metadata, or import it from other sources (e.g. authority files, thesauri)?

4. How do we annotate the resource metadata to make its relationship with the other first class citizens explicit?

5. In fact the IMS, DC and VRA records do have similarities, so they do seem to fit into the group 1 - 3 classification for FRBR. However, if we pick some resources which are very different (perhaps biomedical images), does it work for them? To put it another way, are we going to need to introduce other first class citizen data objects for other vocabularies?
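
To make the authority-file idea concrete, here is a minimal Python sketch of the two-step "20th century architects" lookup: first find the people who carry the wanted classifications, then find the resources whose creator string resolves to one of those people. Everything in it is an assumption for illustration only: the Person class, the AUTHORITY and RESOURCES data, the identifiers, and the function names are hypothetical and are not taken from the demo or from any real authority file.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """A person modelled as a first class citizen rather than a string in a record."""
    person_id: str
    preferred_name: str
    variants: set = field(default_factory=set)
    classifications: set = field(default_factory=set)


# Hypothetical authority file: one entry per person, listing known name variants.
AUTHORITY = [
    Person(
        person_id="person:wright",
        preferred_name="Frank Lloyd Wright",
        variants={"Wright, F. L.", "Wright, Frank L.", "Wright, Frank L. (1867-1959)"},
        classifications={"20th century", "architect"},
    ),
    Person(
        person_id="person:example-painter",
        preferred_name="Example Painter",
        variants={"Painter, E."},
        classifications={"19th century", "painter"},
    ),
]

# Hypothetical resource metadata: the creator field holds whatever string was catalogued.
RESOURCES = [
    {"id": "img-1", "creator": "Wright, F. L."},
    {"id": "img-2", "creator": "Wright, Frank L."},
    {"id": "img-3", "creator": "Wright, Frank L. (1867-1959)"},
    {"id": "img-4", "creator": "Painter, E."},
]


def build_variant_index(authority):
    """Map every known spelling of a name (case-insensitively) to its person id."""
    index = {}
    for person in authority:
        for name in person.variants | {person.preferred_name}:
            index[name.casefold()] = person.person_id
    return index


def resources_for_classifications(wanted, authority, resources):
    """Two-step lookup: classifications -> people -> resources."""
    index = build_variant_index(authority)
    # Step 1: people who carry all of the wanted classifications.
    people = {p.person_id for p in authority if wanted <= p.classifications}
    # Step 2: resources whose creator string resolves to one of those people.
    return [r["id"] for r in resources if index.get(r["creator"].casefold()) in people]


if __name__ == "__main__":
    print(resources_for_classifications({"20th century", "architect"}, AUTHORITY, RESOURCES))
```

A free text search for "Frank Lloyd Wright" would miss all three of these records, because none of them contains that exact string; the variant index is what ties them to one entity. The sketch only handles exact matches against listed variants, so it says nothing about questions 3 to 5, i.e. where the variants and classifications come from in the first place.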

As Kevin S pointed out to me yesterday, systems like NetFlix make people (e.g. directors, actors) first class citizens in addition to films.

Dr Mark H. Butler
Research Scientist
HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Wednesday, 17 September 2003 13:11:48 UTC