RE: Investigating the Artstor and VRA schemas from Butler, Mark on 2003-10-23 (www-rdf-dspace@w3.org from October 2003)

From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
Date: Thu, 23 Oct 2003 11:18:21 +0100
To: "'www-rdf-dspace@w3.org'" <www-rdf-dspace@w3.org>
Message-ID: <E864E95CB35C1C46B72FEA0626A2E8082061F4@0-mail-br1.hpl.hp.com>

Hi Kevin

> If we are going to try to collapse all versions of Person to a single
> Person reference then presumably there also must be a way to collapse
> the individual records that come from converting our XML sources to
> RDF.  Arguably then my XSLT script should first look up a person in the
> common database before creating a new person so that the Person records
> aren't massively duplicated.

There are two levels to this:

1. removing duplicates within the collection.

2. removing duplicates between collections.

The XSLT script I've created for Artstor creates URIs for people from their
Artstor identifiers. One of the nice side effects of using RDF is that,
because different instances of the same person have the same URI, duplicates
of type (1) are removed if we de-serialize and re-serialize the model, e.g.
convert it to N3.
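
To illustrate why type (1) duplicates disappear, here is a minimal sketch in
Python that treats an RDF model as a plain set of (subject, predicate, object)
triples. The base URI, the record fields and the property names are all
hypothetical; they are not taken from the actual Artstor data or from my XSLT.

```python
# Sketch only: an RDF model behaves like a set of triples, so stating the
# same triple twice has no effect. All names below are hypothetical.
PERSON_BASE = "http://example.org/artstor/person/"


def person_triples(record):
    """Build the triples describing the person mentioned in one source record."""
    # The URI is derived from the collection's own identifier, so every
    # record that mentions the same person yields the same subject URI.
    uri = PERSON_BASE + record["person_id"]
    return [
        (uri, "rdf:type", "ex:Person"),
        (uri, "ex:name", record["name"]),
    ]


def build_graph(records):
    """Load all records into one model; identical triples collapse."""
    graph = set()
    for record in records:
        graph.update(person_triples(record))
    return graph


# Three source records, two of which mention the same person.
records = [
    {"work": "w1", "person_id": "p42", "name": "Claude Monet"},
    {"work": "w2", "person_id": "p42", "name": "Claude Monet"},
    {"work": "w3", "person_id": "p7", "name": "Berthe Morisot"},
]
graph = build_graph(records)
```

Here the two records for p42 contribute identical triples, so the model ends
up holding one description of that person rather than two. This only works
because both records produce exactly the same URI.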

However, removing duplicates of type (2) is more complicated, because the
way we construct the unique URIs will generally vary between collections,
so the same person ends up with a different URI in each collection.

> Anyone have recommendations on how to set up a global table of Person
> records to reference in XSLT?  Or perhaps it would be easier to put in
> a temporary reference using XSLT and replace that with a global
> reference using a bit of Perl?

I wouldn't attempt to do this in XSLT. Removing duplicates of type (2) is an
important part of mapping between collections, so I think it needs to be
done in RDF using semantic web tools, as that's a key area SIMILE is
investigating.
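
As a rough sketch of what doing this in RDF could look like, the Python below
merges two collections once equivalence links between their person URIs are
available. Using owl:sameAs for those links is my assumption for illustration,
as are all the URIs and property names; how the links get discovered in the
first place is the hard part and is not addressed here.

```python
# Sketch only: rewrite every URI that is linked by owl:sameAs to a single
# representative, then drop the links. All names below are hypothetical.
OWL_SAME_AS = "owl:sameAs"


def canonical_map(graph):
    """Map each node involved in a sameAs link to one representative URI."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for s, p, o in graph:
        if p == OWL_SAME_AS:
            a, b = find(s), find(o)
            if a != b:
                # Attach the larger root to the smaller one so the result
                # does not depend on set iteration order.
                low, high = sorted((a, b))
                parent[high] = low
    return {node: find(node) for node in list(parent)}


def smush(graph):
    """Return a copy of the graph with equivalent URIs collapsed."""
    canon = canonical_map(graph)
    merged = set()
    for s, p, o in graph:
        if p == OWL_SAME_AS:
            continue
        merged.add((canon.get(s, s), p, canon.get(o, o)))
    return merged


ARTSTOR_MONET = "http://example.org/artstor/person/p42"
VRA_MONET = "http://example.org/vra/agent/monet"

artstor = {
    (ARTSTOR_MONET, "ex:name", "Claude Monet"),
    ("http://example.org/artstor/work/w1", "ex:creator", ARTSTOR_MONET),
}
vra = {
    (VRA_MONET, "ex:name", "Claude Monet"),
    ("http://example.org/vra/work/9", "ex:creator", VRA_MONET),
}
links = {(VRA_MONET, OWL_SAME_AS, ARTSTOR_MONET)}

merged = smush(artstor | vra | links)
```

After merging, both works point at one person URI and the name is stated
once. Because triples are plain strings in this sketch, a literal that
happened to equal a linked URI would also be rewritten; a real RDF toolkit
distinguishes literals from resources.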

kind regards

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Thursday, 23 October 2003 06:19:34 UTC