- From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
- Date: Tue, 14 Oct 2003 11:14:51 +0100
- To: "'Kevin Smathers'" <kevin.smathers@hp.com>
- Cc: "'www-rdf-dspace@w3.org'" <www-rdf-dspace@w3.org>
Kevin wrote:
> objectA
> [
> rdf:type typeA
> b:propertyB "Yin"
> c:propertyC "valuec"
> ]
>
> objectB
> [
> rdf:type typeB
> b:propertyB "Yang"
> d:propertyD "valued"
> ]
>
> objectC
> [
> rdf:type someEquivalenceType
> equivalent <objectA>
> equivalent <objectB>
> ]

But this is a bit different. It is talking about the metadata records, not
the thing being modelled, because if <a> is equivalent to <b> then the
statements about <a> are true about <b>:

    <a> <prop> "v" .
    <a> owl:sameAs <b>
    =>
    <b> <prop> "v" .

In saying that the concept "Leonardo da Vinci" is the same in two corpora, we
are saying that statements in one are true in the other. This is both
desirable and a real nuisance. Sometimes you want provenance, sometimes not.
It is where the logical nature of RDF means we cannot think of it as a data
structure with local meaning only.

This is one use of quads: data management and one-level provenance tracking.
There are several different ways to use the fourth slot, and they are not all
the same either.

In the specific demo scenario we are focusing on, if we have (an image of) a
work of art by "Leonardo da Vinci" and a biography about him, we do want to
say they are the same person. That increases the value of the information by
making such relationships explicit.

Recording all the equivalences means that at a single point on the web,
someone (something) is recording all these mappings. But if A = B and B = C,
then stating this through a concept implies that A = C. If that concept is
what your objectC is, then we are doing the same thing, and a consequence is

    objectA b:propertyB "Yang"

because objectA equivalent objectB means it is the same thing.

If objectC is instead recording lists of pairwise mappings, this does not
happen unless A and C are brought together somewhere. In a global federated
system, this isn't practical.
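A minimal sketch of the inference described above, assuming triples are plain Python tuples and that equivalence behaves like owl:sameAs (symmetric and transitive). The function name `same_as_closure` is made up for illustration and is not part of any RDF toolkit. Only subjects are rewritten here; a full treatment would also substitute equivalent objects.

```python
from collections import defaultdict


def same_as_closure(triples, same_as):
    """Copy each statement onto every node equivalent to its subject."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge the equivalence classes named by the sameAs pairs.
    for a, b in same_as:
        parent[find(a)] = find(b)

    # Group every known node under its class representative.
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)

    inferred = set(triples)
    for s, p, o in triples:
        for alias in groups.get(find(s), {s}):
            inferred.add((alias, p, o))
    return inferred


# Equivalence stated through one shared concept (objectC) merges A and B.
records = {
    ("objectA", "b:propertyB", "Yin"),
    ("objectB", "b:propertyB", "Yang"),
}
via_concept = same_as_closure(
    records, [("objectC", "objectA"), ("objectC", "objectB")]
)
assert ("objectA", "b:propertyB", "Yang") in via_concept

# A store that only holds the pair A = B never concludes anything about C.
local_only = same_as_closure({("A", "p", "v")}, [("A", "B")])
assert ("C", "p", "v") not in local_only
```

The second call illustrates the federated case: the pair B = C may exist in some other store, but unless both pairs are loaded into the same place, no statement about A reaches C.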
Andy

-----Original Message-----
From: Kevin Smathers [mailto:kevin.smathers@hp.com]
Sent: 13 October 2003 18:36
To: Butler, Mark
Cc: 'www-rdf-dspace@w3.org'
Subject: Re: Overview of design decisions made creating stylesheet and schema for Artstor data

Butler, Mark wrote:

>Hi Kevin,
>
>>>Kevin writes in regard to Andy's suggestion
>>
>>Creating a class for Person is fine, but combining multiple schemas
>>into the same Person object I think is an error.
>
>then later you say
>
>>in other words, instead of extending Person with the contents
>>of each new corpus, each new corpus can maintain its own Person class,
>>each with its own meaning,
>
>I don't think this is a problem, because RDF supports multiple inheritance,
>so each new corpus can still maintain its own Person class. We have a single
>URI that represents the concept of Leonardo da Vinci, and this can be an
>instance of several different classes concurrently, with the properties
>necessary to be members of each class. The important point is identifying
>that these instances apply to the same individual, and indicating that via
>the URI. This is what your SoundExSimilarPerson and GettyULANPerson classes
>are doing, right? We also get deconfliction automatically as the properties
>are in different namespaces.
>
>To put it another way,
>
>objectA
>[
>rdf:type typeB
>rdf:type typeC
>b:propertyD "value1"
>c:propertyE "value2"
>]
>
>is equivalent to
>
>objectD
>[
>rdf:type typeB
>b:propertyD "value1"
>b:sameAs objectE
>]
>
>objectE
>[
>rdf:type typeC
>c:propertyE "value2"
>c:sameAs objectD
>]
>
>right?
>

I agree that the two cases that you show are equivalently expressive, but I
wasn't talking about multiple classification. In cases where objectA and
objectB are independently developed, the semantic value of some propertyB is
likely to vary even when referring to the same property.
Andy proposes moving the discordant element into a new property that is a
schema-specific identifier, but the way I would model it is that the
instances remain separate, in other words:

objectA
[
rdf:type typeA
b:propertyB "Yin"
c:propertyC "valuec"
]

objectB
[
rdf:type typeB
b:propertyB "Yang"
d:propertyD "valued"
]

objectC
[
rdf:type someEquivalenceType
:equivalent <objectA>
:equivalent <objectB>
]

In your example objectA is inextricably both typeB and typeC. Thus in your
example instances of typeB can be equivalent to instances of typeC for only
one sense of equivalence -- there can't be any conflicts (one references
Getty, another references some homebrew canonical transformation), nor can
objectA take one equivalence with different objects depending on the context
of the equivalence.

>>Rather than replace the original meaning, what you
>>need is to apply an adaptor pattern to adjust the meaning to a new
>>context;
>
>By adaptor pattern, do you envisage an ontology (OWL) or RDFS document, or
>do you mean a programmatic description?
>

Here I'm trying to develop a theory for handling opposing theories of
classification. Again, Andy's approach, if I understand correctly, is to
rationalize the opposing views -- that is, to choose a dominant view and
relegate sub-dominant views to historical references. By using an adaptor
pattern, what I propose is that each data source should be able to maintain
its own dominant view, with adaptive extensions to allow it to be queried in
the opposing domain.

In other words, a library that, for example, indexes its collections by
Library of Congress identifiers should continue to see the Library of
Congress identifier as the primary identifier of its records, but those
records could be mapped for interlibrary use to a library that indexes using
Dewey Decimal identifiers by an adaptive wrapper around the original instance.
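The wrapper idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `LocRecord` and `DeweyAdaptor`, the mapping table, and the identifier strings are all hypothetical placeholders, not real Library of Congress or Dewey values and not anything from the SIMILE code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocRecord:
    """A record as its home library sees it: the LoC identifier is primary."""
    loc_identifier: str
    title: str


class DeweyAdaptor:
    """Presents a LocRecord to a peer that queries by Dewey identifier."""

    def __init__(self, record, loc_to_dewey):
        self._record = record              # the original instance, untouched
        self._loc_to_dewey = loc_to_dewey  # mapping supplied per peer (hypothetical)

    @property
    def identifier(self):
        # None means this peer's mapping has no entry for the record.
        return self._loc_to_dewey.get(self._record.loc_identifier)

    @property
    def title(self):
        return self._record.title


record = LocRecord("LOC-PLACEHOLDER-1", "Notebooks")

# Two peers can wrap the same record with different mappings.
peer_one = DeweyAdaptor(record, {"LOC-PLACEHOLDER-1": "DEWEY-PLACEHOLDER-A"})
peer_two = DeweyAdaptor(record, {"LOC-PLACEHOLDER-1": "DEWEY-PLACEHOLDER-B"})

assert peer_one.identifier != peer_two.identifier
assert record.loc_identifier == "LOC-PLACEHOLDER-1"  # home view is unchanged
```

The point of the design is that the mapping lives in the wrapper rather than in the record, so a disagreement between peers never alters the source library's own view.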
The adaptive wrapper adds flexibility in the mapping and can conceivably be
instantiated differently for each peer that would like to see Dewey Decimal
numbers. (Feel free to replace LOC or Dewey with, e.g., URLs, ISBNs, or UPC
numbers.)

>
>One reason we might want to use an adaptor pattern is it allows us to
>normalize the data. We are used to the idea of normalizing data in
>relational databases, but the idea is also applicable to XML - see [1] and
>[2] - and I hypothesise RDF. It seems counterintuitive to talk about
>normalization in RDF, because if we pick our first class entities correctly,
>we get normalization for free, but I guess by thinking about (RDF) models
>from a normalization perspective we can check how well designed a model is.
>

I'm not sure that there is any 'correct' set of first class entities that can
be determined a priori. Philosophically this is a question of episteme; the
root assumptions provide the context within which to select the first class
entities, but those first class entities will of necessity be different from
the classes chosen by people operating in a distinct paradigm. Certain
epistemological systems have shown great durability in the face of change,
but specialized contexts will always require specialized classification,
which can be of value to the users of that system even when its
classifications seem absurd or nonsensical in the context of one of the
common durable systems.

>When we map between corpora, and come up with representations of individuals
>that combine multiple vocabularies similar to those above, we can consider
>normalization also. Clearly an instance having multiple properties,
>associated with different namespaces, that contain duplicates of the same
>value is a bad idea. Where there is a consistent duplication, we could omit
>properties and use inference and subproperty relations instead.
>
>However compound relations are more complicated e.g.
>in Andy's example there
>is a relation between artstorID and familyName, givenName, dateOfBirth,
>dateOfDeath. In the subsequent discussion, let's call the latter the
>galleryCard representation (because it's similar to vCard but we have DOB/DOD
>also). The relationship between artstorID and the galleryCard representation
>is more complicated one way than the other: to go from artstorID to
>galleryCard we have to do some kind of tokenization, which is potentially
>unreliable. However to move from the galleryCard to artstorID is easier
>because we just aggregate.
>
>Therefore to perform normalization, it seems attractive to take artstorID at
>ingest, break it into galleryCard, and then implement some kind of viewer
>to aggregate back to the artstorID representation. We can represent both
>relations between the galleryCard properties and artstorID programmatically,
>but I don't think we can indicate such relations using languages like OWL -
>perhaps an OWL expert can correct me here if I'm wrong?
>
>However I think there is another design principle here that overrides the
>need for normalization. Historians talk about primary and secondary sources,
>so the problem with using the split at ingest / reaggregate is we have
>thrown away a primary source and are rebuilding it using a secondary source.
>Despite the need for normalization, this seems a bad idea. So I think it is
>okay to split to galleryCard at ingest, but I'm keen for us to keep the
>original "Leonardo,da Vinci,1452-1519" as well.
>

It is sometimes very difficult to talk about this without sounding absurd,
but consider the following if you can. Suppose there is a school of the
occult that teaches that every soul goes through multiple incarnations, and
just for the sake of argument, let's suppose that they had through some
divine means determined that J.S. Bach and Elvis happened to be the same
person (qua soul). So they diligently enter that 'fact' into their database.
While that representation undoubtedly might have value to the school of the
occult, it is unlikely that most other schools would have any use for that
information. Clearly, even though the epistemological systems interact, they
must not inadvertently pollute the other systems. The decision of the occult
school to join together those records should be available but ignored unless
you are working in the context of the occult.

My argument is that things like this occur to a lesser degree all the time.
Equivalence shouldn't be expressed by multiple classification because it is
too final; rather, equivalence should be expressed by indexing, where the
index can be maintained by the organizations that are interested.

>
>[1] Normalizing XML, part 1, Will Provost, XML.com
>http://www.xml.com/pub/a/2002/11/13/normalizing.html
>
>[2] Normalizing XML, part 2, Will Provost, XML.com,
>http://www.xml.com/pub/a/2002/12/04/normalizing.html
>
>Dr Mark H. Butler
>Research Scientist HP Labs Bristol
>mark-h_butler@hp.com
>Internet: http://www-uk.hpl.hp.com/people/marbut/
>

--
========================================================
Kevin Smathers                kevin.smathers@hp.com
Hewlett-Packard               kevin@ank.com
Palo Alto Research Lab
1501 Page Mill Rd.            650-857-4477 work
M/S 1135                      650-852-8186 fax
Palo Alto, CA 94304           510-247-1031 home
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");
Received on Tuesday, 14 October 2003 06:17:53 UTC