- From: Danny Ayers <danny.ayers@gmail.com>
- Date: Sun, 22 Jan 2006 13:04:20 +0100
- To: David Pratt <fairwinds@eastlink.ca>
- Cc: semantic-web@w3.org
On 1/22/06, David Pratt <fairwinds@eastlink.ca> wrote:
> Let's say I have a schema for automobiles where there are a number of
> properties that describe each car including model, manufacturer etc. I
> also have a schema for car parts also.

Ok, I'll have a quick crack going down into a bit of detail (probably errors here but they usually get caught quickly on this list ;-)

Seems to me there are two strategies that stand out: direct and indirect mapping between the different data schemes. I don't think this depends too much on the particular modelling language; an RDBMS schema or UML might also do the same job (though not really XML Schema, as that tells you about the syntax, not the things the language describes).

Anyhow, let's say two companies have data about a particular model of car. Corgi's id for a Lamborghini Marzal is "clm" and Matchbox's is "mlm". So Corgi's vocabulary might be translatable to something like this:

    [ a rdfs:Class ;
      c:id "clm" ;
      c:name "Lamborghini Marzal" ] .

and Matchbox's:

    [ a rdfs:Class ;
      m:id "mlm" ;
      m:manufacturer "Lamborghini" ;
      m:model "Marzal" ] .

(Apologies if my n3/Turtle is out, but hopefully it's clear what's intended. Not sure about bnodes for classes either; assume an arbitrary URI if necessary.)

Now say a particular car known to Corgi is a Lamborghini Marzal. For simplicity's sake let's give it a URI:

    <http://example.org/this-car> c:id "clm" .

A direct mapping here would be to say that the two classes of car are somehow the same. Within this there are still a few options. There's owl:sameAs, which is generally used to say two individuals are the same, but we're talking classes here (it could be used this way in OWL Full, but it doesn't really capture the right idea).

If one class were more general than the other (e.g. Matchbox's was only by manufacturer) then one might say:

    [ a rdfs:Class ;
      c:id "clm" ;
      c:name "Lamborghini Marzal" ]
        rdfs:subClassOf
    [ a rdfs:Class ;
      m:id "mlm" ;
      m:manufacturer "Lamborghini" ] .

(It would be likely that both c:id and m:id were inverse functional properties, i.e. that they did disambiguate the class identification.)

This would allow the inference that if a particular car was a Lamborghini Marzal according to Corgi, it would also be a Lamborghini according to Matchbox.

But in this particular case a car that is a Lamborghini Marzal according to Corgi will also be one according to Matchbox. So one is a subClassOf the other and vice versa. A pair of rdfs:subClassOf statements would say this, though there's shorthand in the form of owl:equivalentClass.

So that's what I'll call direct mapping. Indirect mapping would be through other terms/vocabularies, and one general solution would be to use a term which subsumes those that need mapping, i.e. each vendor's class becomes a subclass of the shared term. So:

    :LM a rdfs:Class .

    [ a rdfs:Class ;
      c:id "clm" ] rdfs:subClassOf :LM .

    [ a rdfs:Class ;
      m:id "mlm" ;
      m:manufacturer "Lamborghini" ;
      m:model "Marzal" ] rdfs:subClassOf :LM .

In some respects this may be an easier approach than direct mapping: there's a bit less commitment involved in adding another subclass when another car manufacturer comes along. But there is a cost to the reduced commitment, in that

    <http://example.org/this-car> c:id "clm" .

would give us

    <http://example.org/this-car> rdf:type :LM .

but wouldn't give

    <http://example.org/this-car> m:id "mlm" .

Of course it might still be possible to make the subClass relationship symmetrical again, i.e. make :LM owl:equivalentClass to the other classes.

On top of the simple equivalence mapping there could be constraints etc. that may or may not be expressible in RDF/OWL.

I think it's probably true that mapping between entities is usually easier than mapping between relationships, since there seems to be more that can vary, e.g.

    <http://example.org/this-car> c:color <http://corgi.com/colors#Green> .
    <http://example.org/this-car> m:color "#00FF00" .
    <http://example.org/this-car> x:colour "Green" .

etc.
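
To make the difference between the direct and indirect mappings a bit more concrete, here is a minimal Python sketch (standard library only, no RDF toolkit). It is only an illustration under some assumptions: the class names c:LamborghiniMarzal and m:LamborghiniMarzal are hypothetical URIs standing in for the bnode classes, the car is typed directly with rdf:type rather than identified through c:id, owl:equivalentClass is expanded by hand into a pair of rdfs:subClassOf statements, and the indirect case makes each vendor class a subclass of the shared :LM term. The only inference applied is RDFS rules rdfs9 (type propagation) and rdfs11 (subClassOf transitivity).

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Saturate a set of (s, p, o) triples under rdfs9 and rdfs11.

    rdfs9:  (x rdf:type A), (A rdfs:subClassOf B)        => (x rdf:type B)
    rdfs11: (A rdfs:subClassOf B), (B rdfs:subClassOf C) => (A rdfs:subClassOf C)
    """
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p not in (TYPE, SUBCLASS):
                continue
            for s2, p2, o2 in inferred:
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, p, o2))
        if new <= inferred:  # fixpoint reached, nothing new to add
            return inferred
        inferred |= new


def equivalent(a, b):
    """owl:equivalentClass written out as two subClassOf statements."""
    return {(a, SUBCLASS, b), (b, SUBCLASS, a)}


CAR = "http://example.org/this-car"
CORGI_LM = "c:LamborghiniMarzal"     # hypothetical URI for Corgi's class
MATCHBOX_LM = "m:LamborghiniMarzal"  # hypothetical URI for Matchbox's class
HUB_LM = ":LM"                       # the shared, subsuming class

# Assumption: the car is stated to be a member of Corgi's class.
fact = {(CAR, TYPE, CORGI_LM)}

# Direct mapping: the two vendor classes are declared equivalent.
direct = rdfs_closure(fact | equivalent(CORGI_LM, MATCHBOX_LM))

# Indirect mapping: both vendor classes are subclasses of the hub class.
indirect = rdfs_closure(
    fact | {(CORGI_LM, SUBCLASS, HUB_LM), (MATCHBOX_LM, SUBCLASS, HUB_LM)}
)

print((CAR, TYPE, MATCHBOX_LM) in direct)    # True
print((CAR, TYPE, HUB_LM) in indirect)       # True
print((CAR, TYPE, MATCHBOX_LM) in indirect)  # False
```

The last line is the cost of the reduced commitment: with only one-way subclassing the car reaches :LM but never Matchbox's class. Swapping the two subClassOf statements for equivalent(CORGI_LM, HUB_LM) and equivalent(MATCHBOX_LM, HUB_LM) should restore that inference.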
> Given this scenario, what would be the best approach for consolidating
> the information as much as possible.

I'm not sure there is enough information here to determine "best"; in fact it might not be possible to tell until you've done a full implementation of all the promising approaches and evaluated them. Even then there may not be a clear best ;-)

> I would appreciate comments on how
> one might accomplish this in a way that may not produce much unnecessary
> duplication in the data store.

That could make a significant difference, although I would imagine it would depend a lot on the particular store implementation. My guess is that to optimise on that aspect you'd need complete RDF/OWL reasoning plus any extra rules to infer syntax and other 'hidden' mappings, like:

    _:x m:color "#00FF00" .
    =>
    _:x x:colour "Green" .
    =>
    _:x x:color "green" .

Once you'd got all that, it should be possible in principle to keep the graph lean (not sure what there is implementation-wise; Reto's got a leanifier at http://gmuer.ch/2005/11/24/making-graphs-lean).

Cheers,
Danny.

--
http://dannyayers.com
Received on Sunday, 22 January 2006 12:04:33 UTC