- From: by way of <dallsopp@signal.qinetiq.com>
- Date: Mon, 20 Aug 2001 23:12:08 -0400
- To: www-rdf-interest@w3.org
[freed from spam trap -rrs]

Message-ID: <3B80D0AF.9D47899F@signal.qinetiq.com>
Date: Mon, 20 Aug 2001 04:57:05 -0400 (EDT)
From: David Allsopp <dallsopp@signal.qinetiq.com>
CC: www-rdf-logic@w3.org, www-rdf-interest@w3.org
References: <2BF0AD29BC31FE46B78877321144043114B4F0@trebe003.NOE.Nokia.com> <3B7A90E9.35F0EDAF@signal.dera.gov.uk> <008f01c127a5$c5a81a60$7cac1218@reston1.va.home.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

"Thomas B. Passin" wrote:
>
> [David Allsopp]
> >
> > Finally, this hinders you in identifying equivalent resources when you
> > merge data from two sources, because the two sources have to assign a
> > pseudo-unique URI to the resource; a third party can't then determine
> > that the nodes really refer to the same thing, e.g:
> >
> > John --hasFather--> [] --age--> 84
> >
> > John --hasFather--> [] --age--> 84
> >
> > compared with
> >
> > John --hasFather--> randomgenid0123456789 --age--> 84
> >
> > John --hasFather--> randomgenid9876543210 --age--> 84
> >
> > where [] represents an anonymous node.
> >
> > The point is that we don't know the name of John's father, so assigning
> > him a random name makes our life harder, not easier, since everybody
> > necessarily assigns him a _different_ random name.
>
> This apparently simple example actually illustrates how hard it can be to
> merge data from different sources. For this example to work the way David
> seems to want, the system that performs that merging of data has to know
> something more about the "hasFather" property.

Absolutely - I didn't want to complicate the example by mentioning it
though 8-) We would need cardinality constraints (e.g. from DAML).

> In the same way, in David's original version, it could be that John has two
> fathers of the same age, and each data source happens to know about one of
> them. The only way we could resolve this question is to know that there is a
> constraint such that a person can only have one father.
> But if we know that, then David's second (labeled) form can also be
> resolved.

No, I don't believe so; not without even more information. In the
labelled case we don't know whether to conclude that the two random IDs
are actually equivalent, or whether the data are just inconsistent/wrong
(always a possibility). Nor can we tell that the two IDs are in fact
just random stuff rather than real names. The triples give the
impression that we know the name of something when this is not the case.

> Now if John did indeed have two fathers after all, the situation would be
> more complex, because on merging the two data sources, our processor would
> have to convert the hasFather statements into some kind of a container,

Why? Why not just have two hasFather properties?

> which itself would apparently be an anonymous node. It would be something
> like this (with apologies for the undefined terms):
>
> subj                   property    object
> John                   hasFathers  []
> []                     typeOf      container
> []                     RDF_1       randomgenid0123456789
> randomgenid0123456789  typeOf      malePerson
> []                     RDF_2       randomgenid9876543210
> randomgenid9876543210  typeOf      malePerson
> randomgenid0123456789  age         84
> randomgenid9876543210  age         86
>
> Without the labels randomgenid0123456789, randomgenid9876543210 it would be
> much harder to make this set of assertions, if it were possible at all.

Locally, you can have whatever labels you like; I'm only arguing against
keeping those labels when the data are sent to someone else.

> Certainly we couldn't enter this data into a relational database, or write
> it down on paper in a table like this, since we would have to use three
> different "[]" symbols that looked the same but referred to three different
> resources. To be consistent, of course, I should have used a generated id
> instead of [] for the container as well.
>
> Indeed, if you think of a set of triples as being rows in a relational
> database table, then ask what would be the primary key of that table?
> The only sensible answer is that the primary key must be the combination
> of the subject and predicate.

This is surely ambiguous if an anonymous node is the object of two
statements?

> The "semantics" could also be considered to be a
> kind of "business rule", to use an expression from a different domain.
> Taking this relational database viewpoint, each node must necessarily have a
> value (or label), but it may be that a particular implementation could hide
> the label, or exclude it from serialization.

I think we agree there - I'm not suggesting that nodes should be kept
label-free in implementations.

Regards,

David Allsopp.

-- 
/d{def}def/u{dup}d[0 -185 u 0 300 u]concat/q 5e-3 d/m{mul}d/z{A u m B u
m}d/r{rlineto}d/X -2 q 1{d/Y -2 q 2{d/A 0 d/B 0 d 64 -1 1{/f exch d/B
A/A z sub X add d B 2 m m Y add d z add 4 gt{exit}if/f 64 d}for f 64 div
setgray X Y moveto 0 q neg u 0 0 q u 0 r r r r fill/Y}for/X}for showpage
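
The merge argument in the message can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
BlankNode class, the merge() function and the single_valued parameter
are hypothetical names, not part of any RDF toolkit, and the
"at most one hasFather" rule stands in for the kind of cardinality
constraint (e.g. from DAML) mentioned above. The merger is deliberately
conservative: it unifies anonymous nodes, but reports two different
names as a conflict, because it cannot tell generated IDs from real
names.

```python
class BlankNode:
    """Anonymous node: distinct identity, but no name to hand to others."""
    def __repr__(self):
        return "[]"


def merge(graph_a, graph_b, single_valued):
    """Merge two lists of (subject, predicate, object) triples.

    single_valued is the set of predicates assumed to allow at most one
    value per subject (an assumed cardinality constraint).
    Returns (merged_triples, conflicts).
    """
    triples = list(graph_a) + list(graph_b)
    rep = {}        # anonymous node -> node it has been identified with
    first = {}      # (subject, predicate) -> first object seen
    conflicts = []

    def resolve(node):
        # Follow identifications until we reach a representative node.
        while node in rep:
            node = rep[node]
        return node

    for s, p, o in triples:
        if p not in single_valued:
            continue
        s, o = resolve(s), resolve(o)
        known = resolve(first.setdefault((s, p), o))
        if known == o:
            continue
        if isinstance(o, BlankNode):
            rep[o] = known              # nameless: safe to identify
        elif isinstance(known, BlankNode):
            rep[known] = o
            first[(s, p)] = o
        else:
            # Two different names: equivalent, or inconsistent data?
            conflicts.append((s, p, known, o))

    merged = {(resolve(s), p, resolve(o)) for s, p, o in triples}
    return merged, conflicts


# Anonymous form: each source uses its own unnamed node for the father.
f1, f2 = BlankNode(), BlankNode()
anon_merged, anon_conflicts = merge(
    [("John", "hasFather", f1), (f1, "age", 84)],
    [("John", "hasFather", f2), (f2, "age", 84)],
    single_valued={"hasFather"},
)

# Labelled form: each source has invented a different random name.
id_a, id_b = "randomgenid0123456789", "randomgenid9876543210"
named_merged, named_conflicts = merge(
    [("John", "hasFather", id_a), (id_a, "age", 84)],
    [("John", "hasFather", id_b), (id_b, "age", 84)],
    single_valued={"hasFather"},
)
```

With the anonymous form the two father nodes collapse into one and no
conflict is reported. With the labelled form all four triples survive
and the merger can only flag the pair of names, which is the extra
information problem described in the reply.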
Received on Tuesday, 21 August 2001 08:05:03 UTC