Re: Mapping between schemas

Hi Mark,

Sorry I ignored the real problem (as described in your reply); I got too 
caught up in whether the example made sense to see what you were actually 
asking.  In a previous project, when I had to map a 1:N relationship to a 
1:1 relationship, I did so by keeping only the first element of the N.  
So if ArtStor had only one creator, and VRA had multiple creators, then 
the mapping (in pseudocode) would look like this:

VRA to ArtStor:
ArtStor.creator = VRA.creator[0];

ArtStor to VRA:
VRA.creator = new Creator[1];
VRA.creator[0] = ArtStor.creator;
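
For what it's worth, here is the same first-element mapping as a small 
runnable Python sketch.  The dict-based records, the function names and 
the placeholder creator values are my own assumptions for illustration, 
not part of either schema.  I've also assumed that an empty creator list 
maps to "no creator", which the pseudocode above doesn't address.

```python
def vra_to_artstor(vra):
    """Map a VRA-style record (many creators) to an ArtStor-style one (single creator)."""
    creators = vra.get("creator", [])
    # Keep only the first creator; any others are dropped (lossy).
    return {"creator": creators[0] if creators else None}


def artstor_to_vra(artstor):
    """Map an ArtStor-style record (single creator) to a VRA-style one (list of creators)."""
    creator = artstor.get("creator")
    # Wrap the single creator in a one-element list.
    return {"creator": [creator] if creator is not None else []}


# Hypothetical record with two placeholder creators.
vra = {"creator": ["creator A", "creator B"]}
artstor = vra_to_artstor(vra)
print(artstor)
print(artstor_to_vra(artstor))
```

Note that going VRA to ArtStor and back loses "creator B", which is why 
this is only acceptable when round-tripping doesn't matter.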

In my previous project this was assumed to be a valid translation, since 
the lower expressivity of the ArtStor schema required the limitation and 
round-trip conversion wasn't a concern.  If you want to keep the full 
data set in the limited schema, then the only options are to fix the 
schema or to repeat the records (the same way you would unroll a loop in 
compiler optimization), once for each element that needs to be 
expressed.  This can explode numerically, though:

VRA to ArtStor:

for each c in VRA.creator {
     ArtStor = new ArtStor;      // one ArtStor record per creator
     ArtStor.creator = c;
}

The numerical explosion comes from trying to flatten several 1:N 
relationships at once.  I wouldn't recommend trying it with more than two.
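
To put a number on it: the record count is the product of the lengths of 
all the lists being flattened.  Here is a minimal Python sketch of that, 
assuming dict-based records.  The flatten() helper, the field names and 
the placeholder values are hypothetical, just for illustration.

```python
from itertools import product


def flatten(record, multi_fields):
    """Repeat a record once per combination of values of its multi-valued fields."""
    # Single-valued fields are copied unchanged into every output record.
    fixed = {k: v for k, v in record.items() if k not in multi_fields}
    value_lists = [record[f] for f in multi_fields]
    rows = []
    for combo in product(*value_lists):
        row = dict(fixed)
        row.update(zip(multi_fields, combo))
        rows.append(row)
    return rows


# Hypothetical record with two multi-valued fields.
vra = {
    "title": "some title",
    "creator": ["c1", "c2", "c3"],
    "subject": ["s1", "s2"],
}

one_field = flatten(vra, ["creator"])
two_fields = flatten(vra, ["creator", "subject"])
print(len(one_field))   # 3 records, but each still carries the subject list
print(len(two_fields))  # 3 * 2 = 6 records
```

One caveat with this sketch: if any of the flattened lists is empty, the 
product is empty and the record disappears entirely, so a real converter 
would need to decide what to do in that case.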

Cheers,
-kls

Butler, Mark wrote:

>Hi team,
>
>Thanks for all your feedback, hope you don't mind if I reply in one go.
>
>First of all Kevin expressed surprise that I was translating the Artstor XML
>to RDF, rather than translating it to the VRA schema I had created before. I
>was assuming the following approach: create an "artstor namespace". Style
>the ArtStor XML to Artstor RDF using the artstor namespace. Then it should
>(in theory) be possible to map to the "vra namespace" I created previously
>using RDFS or OWL.
>
>However, by accident this exercise turned out to be pretty interesting,
>because the way I interpreted VRA in the "vra namespace" was quite different
>to the interpretation that came from the "artstor namespace". This is
>because I made modelling errors in my "vra namespace" schema, as Mick and
>Dave point out.  
>
>However, I think it is important to remember that schemas are social /
>political things, not just technical things. If you don't own the schema, it
>doesn't matter how many errors you can spot, it may be impossible to get
>them fixed. If the owner is using a schema in a different way, they may
>not encounter problems resulting from those errors. So there may be
>situations where a schema is wrong, but we still need to map to it somehow. 
>
>I've been reading the literature on mapping between thesauri. Martin Doerr's
>paper [1] points out that in thesauri, difficulties in mapping stem from
>having ground terms that do not correspond, and from having different
>relationships between those ground terms. In schemas/ontologies, first
>ideally we need correspondence between properties and property values, but
>second a property might have a literal as a range in one schema but an
>object as a range in another. If properties don't correspond, we just omit
>them from the map (or the "crosswalk"), but if our object structure is
>different, then this is a much more complicated problem.
>
>Kevin also commented about the use of URLs in the RDF. Here it's very
>important to remember that values assigned to RDF resources are URIs, not
>URLs, so they are just unique identifiers for objects, and there is no
>guarantee that they correspond directly to a retrievable resource, although
>RDF has had some criticism for taking this approach. 
>
>However the point Kevin makes about IP addresses for URL and ServerURL is
>valid, as these are obviously URLs due to their names, as is the point about
>the use of unqualified hostnames (which I copied from the XML), although I'm
>not sure how we can convert the IP addresses back to hostnames.   
> 
>I'm using the Saxon processor, http://saxon.sourceforge.net/ BTW.
>
>Kevin also makes a good point that there is not necessarily a relationship
>between the personalName and corporateName properties, apart from the fact
>they may be referring to the same object, as I had assumed that there is a
>relationship between them. 
>
>However I will explain my thinking here, apologies in advance if I am
>pointing out the blatantly obvious. 
>
>One of the problems with converting XML to RDF is how to deal with nested
>elements. This is because there are three different situations for using
>nesting in XML:
>
>1. Grouping related properties together e.g. subproperties
>
><Image>
>    <MetaData>
>        <Record_Type>work</Record_Type>
>        <Subject>Portraits</Subject>
>        <Description>Copy by Raphael</Description>
>    </MetaData>
></Image>
>
>because the properties record_type, subject and description are all
>metadata, i.e. they are subproperties of metadata. So in RDF this can be
>done by making record_type, subject and description subproperties of
>metadata.
>
>2. Creating related objects
>
><Image>
>    <MediaFile>
>        <resolution>0</resolution>
>        <filename>41822001474533.jpg</filename>
>    </MediaFile>
></Image>
>
>because there is a relationship between the Image object and a MediaFile
>object, and this may potentially be a one to many relationship. So in RDF we
>need to make two objects, an Image object and a Mediafile object, and create
>a relationship between them.
>
>3. Distinguishing context
>
><Image>
>    <Source>Artstor team</Source>
>    <MetaData>
>        <Description>Copy by Raphael</Description>
>        <Source>British Museum</Source>
>    </MetaData>
></Image>
>
>the nesting is to indicate that the Source "British Museum" refers to the
>source of the descriptive metadata, whereas the Source "Artstor team" refers
>to the Source of this record. 
>
>So how do we represent this in RDF? Well there are two ways: we can either
>create objects to distinguish between our contexts, e.g. Image refers to two
>objects, TechnicalMetadata and DescriptiveMetadata. Alternatively we can
>create two properties, technicalMetadataSource and descriptiveMetadataSource
>which are both subproperties of source. The second method is preferable,
>and only usable, when we have a fixed number of contexts that we have
>identified in advance; otherwise we have to use the first method. Querying
>with the first method is slightly more complicated, as we are not just
>interested in the value of source, we are also interested in its
>relationship with the Image object so we can determine its context.
>
>For example, creation_date and update_date are examples of the second
>approach. However if we have a large number of events with dates that we
>wished to associate with the Image, we might be better creating two events,
>one of type creation, the other of type update, that both had date
>properties. 
> 
>Dave notes that for creator, approach b is preferable to approach a, but
>proposes a new approach using a custom schema property to represent the
>relationship between creator and corporate name. It would help if we had a
>definition of qualifier, although from Andy's examples it seems to be a
>superproperty of subProperty in that sometimes it means subProperty, but
>there are also examples where this interpretation doesn't work. 
>
>The project looking at thesauri in RDF also takes the custom schema property
>approach, although it's not clear to me whether we can define custom schema
>properties using RDFS or OWL. If not, we are back to defining custom
>processors, right?
>
>[1] "Semantic Problems of Thesaurus Mapping." Martin Doerr.
>http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Doerr/
>
>Dr Mark H. Butler
>Research Scientist                HP Labs Bristol
>mark-h_butler@hp.com
>Internet: http://www-uk.hpl.hp.com/people/marbut/
>
>
>  
>


-- 
========================================================
   Kevin Smathers                kevin.smathers@hp.com    
   Hewlett-Packard               kevin@ank.com            
   Palo Alto Research Lab                                 
   1501 Page Mill Rd.            650-857-4477 work        
   M/S 1135                      650-852-8186 fax         
   Palo Alto, CA 94304           510-247-1031 home        
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");

Received on Wednesday, 8 October 2003 11:42:06 UTC