- From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
- Date: Wed, 8 Oct 2003 15:12:42 +0100
- To: www-rdf-dspace@w3.org
Hi team,

Thanks for all your feedback; I hope you don't mind if I reply in one go.

First of all, Kevin expressed surprise that I was translating the Artstor XML to RDF rather than translating it to the VRA schema I had created before. I was assuming the following approach: create an "artstor namespace", then style the Artstor XML to Artstor RDF using that namespace. It should then (in theory) be possible to map to the "vra namespace" I created previously using RDFS or OWL.

However, by accident this exercise turned out to be pretty interesting, because the way I interpreted VRA in the "vra namespace" was quite different to the interpretation that came from the "artstor namespace". This is because I made modelling errors in my "vra namespace" schema, as Mick and Dave point out. However, I think it is important to remember that schemas are social / political things, not just technical things. If you don't own the schema, it doesn't matter how many errors you can spot; it may be impossible to get them fixed. If the owner is using a schema in a different way, they may not encounter problems resulting from those errors. So there may be situations where a schema is wrong, but we still need to map to it somehow.

I've been reading the literature on mapping between thesauri. Martin Doerr's paper [1] points out that in thesauri, difficulties in mapping stem from having ground terms that do not correspond, and from having different relationships between those ground terms. In schemas/ontologies, first, ideally we need correspondence between properties and property values; second, a property might have a literal as its range in one schema but an object as its range in another. If properties don't correspond, we just omit them from the map (or the "crosswalk"), but if our object structure is different, then this is a much more complicated problem.

Kevin also commented about the use of URLs in the RDF.
Here it's very important to remember that the values assigned to RDF resources are URIs, not URLs, so they are just unique identifiers for objects, and there is no guarantee that they correspond directly to a retrievable resource, although RDF has had some criticism for taking this approach. However, the point Kevin makes about IP addresses for URL and ServerURL is valid, as these are obviously URLs given their names, as is the point about the use of unqualified hostnames (which I copied from the XML), although I'm not sure how we can convert the IP addresses back to hostnames. I'm using the Saxon processor, http://saxon.sourceforge.net/ BTW.

Kevin also makes a good point that there is not necessarily a relationship between the personalName and corporateName properties, apart from the fact that they may be referring to the same object; I had assumed that there was a relationship between them. However, I will explain my thinking here; apologies in advance if I am pointing out the blatantly obvious.

One of the problems with converting XML to RDF is how to deal with nested elements. This is because there are three different situations for using nesting in XML:

1. Grouping related properties together, e.g. subproperties

    <Image>
      <MetaData>
        <Record_Type>work</Record_Type>
        <Subject>Portraits</Subject>
        <Description>Copy by Raphael</Description>
      </MetaData>
    </Image>

because the properties record_type, subject and description are all metadata, i.e. they are subproperties of metadata. So in RDF this can be done by making record_type, subject and description subproperties of metadata.

2. Creating related objects

    <Image>
      <MediaFile>
        <resolution>0</resolution>
        <filename>41822001474533.jpg</filename>
      </MediaFile>
    </Image>

because there is a relationship between the Image object and a MediaFile object, and this may potentially be a one-to-many relationship. So in RDF we need to make two objects, an Image object and a MediaFile object, and create a relationship between them.

3. Distinguishing context

    <Image>
      <Source>Artstor team</Source>
      <MetaData>
        <Description>Copy by Raphael</Description>
        <Source>British Museum</Source>
      </MetaData>
    </Image>

Here the nesting indicates that the Source "British Museum" refers to the source of the descriptive metadata, whereas the Source "Artstor team" refers to the source of this record.

So how do we represent this in RDF? Well, there are two ways: we can either create objects to distinguish between our contexts, e.g. Image refers to two objects, TechnicalMetadata and DescriptiveMetadata. Alternatively, we can create two properties, technicalMetadataSource and descriptiveMetadataSource, which are both subproperties of source. We can only use the second method when we have a fixed number of contexts that we have identified in advance; otherwise we have to use the first method. Querying with the first method is slightly more complicated, as we are not just interested in the value of source, we are also interested in its relationship with the Image object so we can determine its context.

For example, creation_date and update_date are examples of the second approach. However, if we had a large number of events with dates that we wished to associate with the Image, we might be better off creating two events, one of type creation, the other of type update, that both have date properties.

Dave notes that for creator, approach b is preferable to approach a, but proposes a new approach using a custom schema property to represent the relationship between creator and corporate name. It would help if we had a definition of qualifier, although from Andy's examples it seems to be a superproperty of subProperty, in that sometimes it means subProperty, but there are also examples where this interpretation doesn't work. The project looking at thesauri in RDF also takes the custom schema property approach, although it's not clear to me whether we can define custom schema properties using RDFS or OWL.
If not, we are back to defining custom processors, right?

[1] "Semantic Problems of Thesaurus Mapping." Martin Doerr.
http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Doerr/

Dr Mark H. Butler
Research Scientist
HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/
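
The two ways of representing context discussed under case 3 above can be sketched roughly as follows. This is only an illustration in Python, with triples held as plain tuples rather than in an RDF toolkit. The identifiers img1, tech1 and desc1, the linking properties technicalMetadata and descriptiveMetadata, and the SUBPROPERTY_OF table are hypothetical names made up for the sketch, and treating the record-level Source as the "technical" one is an assumption.

```python
import xml.etree.ElementTree as ET

# The nested-context XML from case 3.
XML = """
<Image>
  <Source>Artstor team</Source>
  <MetaData>
    <Description>Copy by Raphael</Description>
    <Source>British Museum</Source>
  </MetaData>
</Image>
"""

def as_objects(image):
    """Method 1: one intermediate object per context.

    Assumption: the record-level Source goes on a TechnicalMetadata
    object (tech1) and the nested one on a DescriptiveMetadata object (desc1).
    """
    return [
        ("img1", "technicalMetadata", "tech1"),
        ("tech1", "source", image.findtext("Source")),
        ("img1", "descriptiveMetadata", "desc1"),
        ("desc1", "source", image.findtext("MetaData/Source")),
        ("desc1", "description", image.findtext("MetaData/Description")),
    ]

# Method 2: a fixed set of context-specific properties, each declared
# (in this hypothetical table) to be a subproperty of "source".
SUBPROPERTY_OF = {
    "technicalMetadataSource": "source",
    "descriptiveMetadataSource": "source",
}

def as_subproperties(image):
    """Method 2: hang context-specific properties directly on the image."""
    return [
        ("img1", "technicalMetadataSource", image.findtext("Source")),
        ("img1", "descriptiveMetadataSource", image.findtext("MetaData/Source")),
        ("img1", "description", image.findtext("MetaData/Description")),
    ]

def sources_via_objects(triples):
    """Query for method 1: the context of each source is the property
    that links the image to the object carrying that source."""
    links = {o: p for (s, p, o) in triples if s == "img1"}
    return {links[s]: o for (s, p, o) in triples
            if p == "source" and s in links}

def sources_via_subproperties(triples):
    """Query for method 2: any property that is a subproperty of source."""
    return {p: o for (s, p, o) in triples
            if SUBPROPERTY_OF.get(p) == "source"}

if __name__ == "__main__":
    image = ET.fromstring(XML.strip())
    print(sources_via_objects(as_objects(image)))
    print(sources_via_subproperties(as_subproperties(image)))
```

The query for the first method needs an extra step (following the link from the image to the intermediate object) to recover the context, whereas the second method reads the context straight off the property name but only works for contexts listed in advance.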
Received on Wednesday, 8 October 2003 10:19:45 UTC