- From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
- Date: Mon, 13 Oct 2003 12:02:04 +0100
- To: "'www-rdf-dspace@w3.org'" <www-rdf-dspace@w3.org>
Hi Kevin,

Kevin writes in regard to Andy's suggestion:

> Creating a class for Person is fine, but combining multiple schemas
> into the same Person object I think is an error.

then later you say

> in other words, instead of extending Person with the contents
> of each new corpus, each new corpus can maintain its own Person class,
> each with its own meaning,

I don't think this is a problem, because RDF supports multiple inheritance, so each new corpus can still maintain its own Person class. We have a single URI that represents the concept of Leonardo da Vinci, and this URI can be an instance of several different classes concurrently, with the properties necessary to be a member of each class. The important point is identifying that these instances apply to the same individual, and indicating that via the URI. This is what your SoundExSimilarPerson and GettyULANPerson classes are doing, right? We also get deconfliction automatically, because the properties are in different namespaces.

To put it another way,

objectA [ rdf:type typeB
          rdf:type typeC
          b:propertyD "value1"
          c:propertyE "value2" ]

is equivalent to

objectD [ rdf:type typeB
          b:propertyD "value1"
          owl:sameAs objectE ]

objectE [ rdf:type typeC
          c:propertyE "value2"
          owl:sameAs objectD ]

right?

> Rather than replace the original meaning, what you
> need is to apply an adaptor pattern to adjust the meaning to a new
> context;

By adaptor pattern, do you envisage an ontology (OWL) or RDFS document, or do you mean a programmatic description? One reason we might want to use an adaptor pattern is that it allows us to normalize the data. We are used to the idea of normalizing data in relational databases, but the idea is also applicable to XML - see [1] and [2] - and, I hypothesise, to RDF.
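To make the equivalence above concrete, here is a minimal sketch in plain Python, modelling triples as (subject, predicate, object) tuples rather than using an RDF toolkit. The node and property names are the illustrative ones from the example; the merge function is my own sketch of how an owl:sameAs link can be collapsed, not any particular library's smushing algorithm.

```python
def merge_same_as(triples, a, b, merged):
    """Collapse two owl:sameAs-linked nodes a and b into a single node,
    dropping the owl:sameAs links themselves."""
    out = set()
    for s, p, o in triples:
        if p == "owl:sameAs" and {s, o} <= {a, b}:
            continue  # the identity link is absorbed by the merge
        s = merged if s in (a, b) else s
        o = merged if o in (a, b) else o
        out.add((s, p, o))
    return out

# The two-node form from the example...
split = {
    ("objectD", "rdf:type", "typeB"),
    ("objectD", "b:propertyD", "value1"),
    ("objectD", "owl:sameAs", "objectE"),
    ("objectE", "rdf:type", "typeC"),
    ("objectE", "c:propertyE", "value2"),
    ("objectE", "owl:sameAs", "objectD"),
}

# ...collapses to the single multi-typed node, with properties from
# both namespaces coexisting without conflict.
merged = merge_same_as(split, "objectD", "objectE", "objectA")
assert merged == {
    ("objectA", "rdf:type", "typeB"),
    ("objectA", "rdf:type", "typeC"),
    ("objectA", "b:propertyD", "value1"),
    ("objectA", "c:propertyE", "value2"),
}
```

Because b:propertyD and c:propertyE live in different namespaces, the merged node keeps both values with no clash, which is the "deconfliction for free" point made above.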
It seems counterintuitive to talk about normalization in RDF, because if we pick our first-class entities correctly we get normalization for free, but I guess that by thinking about (RDF) models from a normalization perspective we can check how well designed a model is. When we map between corpora, and come up with representations of individuals that combine multiple vocabularies similar to those above, we can consider normalization as well. Clearly it is a bad idea for an instance to have multiple properties, associated with different namespaces, that duplicate the same value. Where there is consistent duplication, we could omit properties and use inference and subproperty relations instead.

However, compound relations are more complicated. For example, in Andy's example there is a relation between artstorID and familyName, givenName, dateOfBirth and dateOfDeath. In the subsequent discussion, let's call the latter the galleryCard representation (because it's similar to vCard, but with DOB/DOD as well). The relationship between artstorID and the galleryCard representation is more complicated in one direction than the other: to go from artstorID to galleryCard we have to do some kind of tokenization, which is potentially unreliable, whereas to go from galleryCard to artstorID is easier because we just aggregate. Therefore, to perform normalization, it seems attractive to take artstorID at ingest, break it into galleryCard, and then implement some kind of viewer to aggregate back to the artstorID representation. We can represent both relations between the galleryCard properties and artstorID programmatically, but I don't think we can indicate such relations using languages like OWL - perhaps an OWL expert can correct me here if I'm wrong? However, I think there is another design principle here that overrides the need for normalization.
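The tokenize/aggregate asymmetry can be sketched as follows. This is plain Python, and the parsing rule is my own guess at the layout of the artstorID string ("familyName,givenName,dateOfBirth-dateOfDeath"); the field-to-token mapping is illustrative, which is exactly why the tokenization direction is the unreliable one.

```python
def split_artstor_id(artstor_id):
    """Tokenize an artstorID into galleryCard properties.
    Assumes the layout "familyName,givenName,DOB-DOD" - guesswork,
    hence potentially unreliable."""
    family, given, dates = artstor_id.split(",")
    dob, dod = dates.split("-")
    return {"familyName": family, "givenName": given,
            "dateOfBirth": dob, "dateOfDeath": dod}

def aggregate_gallery_card(card):
    """Rebuild the artstorID form from galleryCard properties.
    Pure aggregation, so this direction is easy and reliable."""
    return "%s,%s,%s-%s" % (card["familyName"], card["givenName"],
                            card["dateOfBirth"], card["dateOfDeath"])

card = split_artstor_id("Leonardo,da Vinci,1452-1519")
# Round-tripping only works while the tokenization guess holds.
assert aggregate_gallery_card(card) == "Leonardo,da Vinci,1452-1519"
```

The asymmetry is visible in the code: split_artstor_id encodes assumptions about delimiters and field order, while aggregate_gallery_card is a mechanical join.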
Historians talk about primary and secondary sources, and the problem with the split-at-ingest / reaggregate approach is that we have thrown away a primary source and are rebuilding it from a secondary source. Despite the appeal of normalization, this seems a bad idea. So I think it is okay to split to galleryCard at ingest, but I'm keen for us to keep the original "Leonardo,da Vinci,1452-1519" as well.

[1] Normalizing XML, Part 1, Will Provost, XML.com,
http://www.xml.com/pub/a/2002/11/13/normalizing.html
[2] Normalizing XML, Part 2, Will Provost, XML.com,
http://www.xml.com/pub/a/2002/12/04/normalizing.html

Dr Mark H. Butler
Research Scientist
HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/
Received on Monday, 13 October 2003 07:02:36 UTC