RE: Mapping between schemas from Butler, Mark on 2003-10-08 (www-rdf-dspace@w3.org from October 2003)

From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
Date: Wed, 8 Oct 2003 15:12:42 +0100
To: www-rdf-dspace@w3.org
Message-ID: <E864E95CB35C1C46B72FEA0626A2E8082061A8@0-mail-br1.hpl.hp.com>
Hi team,

Thanks for all your feedback, hope you don't mind if I reply in one go.

First of all Kevin expressed surprise that I was translating the Artstor XML
to RDF, rather than translating it to the VRA schema I had created before. I
was assuming the following approach: create an "artstor namespace". Style
the Artstor XML to Artstor RDF using the artstor namespace. Then it should
(in theory) be possible to map to the "vra namespace" I created previously
using RDFS or OWL.

However, by accident this exercise turned out to be pretty interesting,
because the way I interpreted VRA in the "vra namespace" was quite different
to the interpretation that came from the "artstor namespace". This is
because I made modelling errors in my "vra namespace" schema, as Mick and
Dave point out.  

However, I think it is important to remember that schemas are social /
political things, not just technical things. If you don't own the schema, it
doesn't matter how many errors you can spot, it may be impossible to get
them fixed. If the owner is using a schema in a different way, they may
not encounter problems resulting from those errors. So there may be
situations where a schema is wrong, but we still need to map to it somehow.

I've been reading the literature on mapping between thesauri. Martin Doerr's
paper [1] points out that in thesauri, difficulties in mapping stem from
having ground terms that do not correspond, and from having different
relationships between those ground terms. In schemas/ontologies, first we
ideally need correspondence between properties and property values, but
second, a property might have a literal as its range in one schema but an
object as its range in another. If properties don't correspond, we just omit
them from the map (or the "crosswalk"), but if our object structure is
different, then this is a much more complicated problem.
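
To make the two cases concrete, here is a rough sketch in Python. Everything
in it is hypothetical: the schemaA/schemaB namespaces, the property names and
the "shelfmark" property are made up for illustration. It only handles the
easy direction (literal in the source, object in the target) by wrapping the
literal in a new blank node; the reverse direction and deeper structural
differences are the hard part and are not attempted.

```python
# Hypothetical crosswalk between two made-up schemas, for illustration only.
SRC = "http://example.org/schemaA#"
DST = "http://example.org/schemaB#"

# Property-level correspondences; source properties not listed are omitted.
CROSSWALK = {SRC + "title": DST + "title", SRC + "creator": DST + "creator"}

# Target properties whose range is an object rather than a literal,
# mapped to the property that holds the literal on that object.
OBJECT_RANGED = {DST + "creator": DST + "name"}


def crosswalk(triples):
    """Map (subject, property, value) triples from schema A to schema B."""
    out = []
    counter = 0
    for s, p, o in triples:
        target = CROSSWALK.get(p)
        if target is None:
            continue  # no corresponding property: leave it out of the map
        if target in OBJECT_RANGED:
            # Structural mismatch: wrap the literal in a new object node.
            counter += 1
            node = f"_:b{counter}"
            out.append((s, target, node))
            out.append((node, OBJECT_RANGED[target], o))
        else:
            out.append((s, target, o))
    return out


source = [
    ("urn:x:image1", SRC + "title", "Portrait"),
    ("urn:x:image1", SRC + "creator", "Raphael"),
    ("urn:x:image1", SRC + "shelfmark", "A12"),
]
mapped = crosswalk(source)
```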

Kevin also commented about the use of URLs in the RDF. Here it's very
important to remember that values assigned to RDF resources are URIs, not
URLs, so they are just unique identifiers for objects, and there is no
guarantee that they correspond directly to a retrievable resource, although
RDF has had some criticism for taking this approach. 

However the point Kevin makes about IP addresses for URL and ServerURL is
valid, as these are obviously URLs due to their names, as is the point about
the use of unqualified hostnames (which I copied from the XML), although I'm
not sure how we can convert the IP addresses back to hostnames.   
 
I'm using the Saxon processor, http://saxon.sourceforge.net/ BTW.

Kevin also makes a good point about the personalName and corporateName
properties: I had assumed that there is a relationship between them, but
there is not necessarily one, apart from the fact that they may be referring
to the same object.

However I will explain my thinking here, apologies in advance if I am
pointing out the blatantly obvious. 

One of the problems with converting XML to RDF is how to deal with nested
elements. This is because there are three different situations for using
nesting in XML:

1. Grouping related properties together e.g. subproperties

<Image>
	<MetaData>
		<Record_Type>work</Record_Type>
		<Subject>Portraits</Subject>
		<Description>Copy by Raphael</Description>
	</MetaData>
</Image>

because the properties record_type, subject and description are all
metadata, i.e. they are subproperties of metadata. So in RDF this can be done
by making record_type, subject and description subproperties of metadata.
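
A minimal sketch of case 1 in Python (standard library only), assuming a
hypothetical http://example.org/artstor# namespace and a hypothetical
identifier for the image. It is not the stylesheet I am actually using, just
an illustration: the MetaData wrapper disappears from the data and survives
only as a superproperty in the schema.

```python
import xml.etree.ElementTree as ET

XML = """<Image>
  <MetaData>
    <Record_Type>work</Record_Type>
    <Subject>Portraits</Subject>
    <Description>Copy by Raphael</Description>
  </MetaData>
</Image>"""

EX = "http://example.org/artstor#"  # hypothetical namespace
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def group_to_triples(xml_text, subject):
    """Flatten the MetaData grouping into properties of the image itself."""
    root = ET.fromstring(xml_text)
    schema, data = [], []
    for child in root.find("MetaData"):
        prop = EX + child.tag.lower()
        # Schema triple: each grouped property is a subproperty of metadata.
        schema.append((prop, RDFS_SUBPROP, EX + "metadata"))
        # Data triple: the value hangs directly off the image.
        data.append((subject, prop, child.text))
    return schema, data


schema, data = group_to_triples(XML, EX + "image1")
```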

2. Creating related objects

<Image>
	<MediaFile>
		<resolution>0</resolution>
		<filename>41822001474533.jpg</filename>
	</MediaFile>
</Image>

because there is a relationship between the Image object and a MediaFile
object, and this may potentially be a one to many relationship. So in RDF we
need to make two objects, an Image object and a MediaFile object, and create
a relationship between them.
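
A minimal sketch of case 2 in Python (standard library only). The namespace,
the hasMediaFile property and the way MediaFile identifiers are minted are all
assumptions made for illustration. Each nested MediaFile becomes its own
object, so the one to many case is handled by the loop.

```python
import xml.etree.ElementTree as ET

XML = """<Image>
  <MediaFile>
    <resolution>0</resolution>
    <filename>41822001474533.jpg</filename>
  </MediaFile>
</Image>"""

EX = "http://example.org/artstor#"  # hypothetical namespace


def related_objects(xml_text, image_uri):
    """Turn each nested MediaFile into a separate object linked to the image."""
    root = ET.fromstring(xml_text)
    triples = []
    for i, media in enumerate(root.findall("MediaFile"), start=1):
        media_uri = f"{image_uri}/mediafile{i}"  # hypothetical identifier
        # The relationship between the two objects.
        triples.append((image_uri, EX + "hasMediaFile", media_uri))
        # The nested elements become properties of the MediaFile object.
        for child in media:
            triples.append((media_uri, EX + child.tag, child.text))
    return triples


triples = related_objects(XML, EX + "image1")
```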

3. Distinguishing context

<Image>
	<Source>Artstor team</Source>
	<MetaData>
		<Description>Copy by Raphael</Description>
		<Source>British Museum</Source>
	</MetaData>
</Image>

the nesting is to indicate that the Source "British Museum" refers to the
source of the descriptive metadata, whereas the Source "Artstor team" refers
to the Source of this record. 

So how do we represent this in RDF? Well there are two ways: we can either
create objects to distinguish between our contexts, e.g. Image refers to two
objects, TechnicalMetadata and DescriptiveMetadata. Alternatively we can
create two properties, technicalMetadataSource and descriptiveMetadataSource,
which are both subproperties of source. We can only use the second method
when we have a fixed number of contexts that we have identified in advance,
otherwise we have to use the first method. Querying with the first method is
slightly more complicated, as we are not just interested in the value of
source, we are also interested in its relationship with the Image object so
we can determine its context.

For example, creation_date and update_date are examples of the second
approach. However if we have a large number of events with dates that we
wished to associate with the Image, we might be better creating two events,
one of type creation, the other of type update, that both had date
properties. 
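
The two ways of representing context can be sketched side by side in Python,
using plain tuples as triples. The namespace, the image identifier and the
linking properties technicalMetadata and descriptiveMetadata are assumptions
for illustration. The point is the difference in querying: the first method
needs two hops, the second only one.

```python
EX = "http://example.org/artstor#"  # hypothetical namespace
IMG = EX + "image1"                 # hypothetical Image identifier

# Method 1: one object per context, each with its own source property.
by_object = [
    (IMG, EX + "technicalMetadata", IMG + "/technical"),
    (IMG + "/technical", EX + "source", "Artstor team"),
    (IMG, EX + "descriptiveMetadata", IMG + "/descriptive"),
    (IMG + "/descriptive", EX + "source", "British Museum"),
]

# Method 2: one property per context, both subproperties of source.
by_property = [
    (EX + "technicalMetadataSource", "rdfs:subPropertyOf", EX + "source"),
    (EX + "descriptiveMetadataSource", "rdfs:subPropertyOf", EX + "source"),
    (IMG, EX + "technicalMetadataSource", "Artstor team"),
    (IMG, EX + "descriptiveMetadataSource", "British Museum"),
]


def objects_of(triples, subject, prop):
    """Return every value of prop on subject."""
    return [o for s, p, o in triples if s == subject and p == prop]


def source_via_object(triples, image, context_prop):
    """Method 1 needs two hops: image -> context object -> source."""
    return [
        src
        for ctx in objects_of(triples, image, context_prop)
        for src in objects_of(triples, ctx, EX + "source")
    ]


def source_via_property(triples, image, source_prop):
    """Method 2 needs a single hop on the context-specific property."""
    return objects_of(triples, image, source_prop)
```

Adding a new context under the first method only means adding more data,
whereas under the second method it means adding a new property to the schema.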
 
Dave notes that for creator, approach b is preferable to approach a, but
proposes a new approach using a custom schema property to represent the
relationship between creator and corporate name. It would help if we had a
definition of qualifier, although from Andy's examples it seems to be a
superproperty of subProperty in that sometimes it means subProperty, but
there are also examples where this interpretation doesn't work. 

The project looking at thesauri in RDF also takes the custom schema property
approach, although it's not clear to me whether we can define custom schema
properties using RDFS or OWL. If not, we are back to defining custom
processors, right?

[1] "Semantic Problems of Thesaurus Mapping." Martin Doerr.
http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Doerr/

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/
Received on Wednesday, 8 October 2003 10:19:45 UTC