Re: Syntax vs Semantics vs XML Schema vs RDF Schema vs QNames vs URIs (was RE: Using urn:publicid: for namespaces) from David Allsopp on 2001-08-21 (www-rdf-interest@w3.org from August 2001)

From: by way of <dallsopp@signal.qinetiq.com>
Date: Mon, 20 Aug 2001 23:12:08 -0400
To: www-rdf-interest@w3.org
Message-Id: <200108211205.IAA32145@tux.w3.org>
[freed from spam trap -rrs]

 Message-ID: <3B80D0AF.9D47899F@signal.qinetiq.com>
 Date: Mon, 20 Aug 2001 04:57:05 -0400 (EDT)
 From: David Allsopp <dallsopp@signal.qinetiq.com>
 CC: www-rdf-logic@w3.org, www-rdf-interest@w3.org
References: <2BF0AD29BC31FE46B78877321144043114B4F0@trebe003.NOE.Nokia.com>
<3B7A90E9.35F0EDAF@signal.dera.gov.uk>
<008f01c127a5$c5a81a60$7cac1218@reston1.va.home.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

"Thomas B. Passin" wrote:
> 
> [David Allsopp]
> >
> > Finally, this hinders you in identifying equivalent resources when you
> > merge data from two sources, because the two sources have to assign a
> > pseudo-unique URI to the resource; a third party can't then determine
> > that the nodes really refer to the same thing, e.g:
> >
> > John --hasFather--> [] --age--> 84
> >
> > John --hasFather--> [] --age--> 84
> >
> > compared with
> >
> > John --hasFather--> randomgenid0123456789 --age--> 84
> >
> > John --hasFather--> randomgenid9876543210 --age--> 84
> >
> > where [] represents an anonymous node.
> >
> > The point is that we don't know the name of John's father, so assigning
> > him a random name makes our life harder, not easier, since everybody
> > necessarily assigns him a _different_ random name.
> 
> This apparently simple example actually illustrates how hard it can be to
> merge data from different sources.  For this example to work the way David
> seems to want, the system that performs that merging of data has to know
> something more about the "hasFather" property.  

Absolutely - I didn't want to complicate the example by mentioning it
though 8-)
We would need cardinality constraints (e.g. from DAML).

> In the same way, in David's original version, it could be that John has two
> fathers of the same age, and each data source happens to know about one of
> them. The only way we could resolve this question is to know that there is a
> constraint such that a person can only have one father.  But if we know
> that, then David's second (labeled) form can also be resolved.

No, I don't believe so; not without even more information.  In the
labelled case we don't know whether to conclude that the two random IDs
are actually equivalent, or whether the data are just inconsistent/wrong
(always a possibility). Nor can we tell that the two IDs are in fact
just random stuff rather than real names. The triples give the
impression that we know the name of something when this is not the case.
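
To make the distinction concrete, here is a minimal sketch, not a real RDF
or DAML implementation. It assumes hasFather has a maximum cardinality of
one, and it assumes anonymous nodes can be recognised by a "_:" prefix; the
function names and that prefix convention are hypothetical.

```python
# Hypothetical sketch: what a merging system could conclude when a property
# is assumed to have a maximum cardinality of one.

BLANK_PREFIX = "_:"  # assumed marker for anonymous nodes


def is_blank(node):
    """Return True if the node is anonymous under the assumed convention."""
    return node.startswith(BLANK_PREFIX)


def classify_functional(triples, functional_props):
    """For each (subject, property) pair of a single-valued property,
    say what the merged data allow us to conclude."""
    groups = {}
    for s, p, o in triples:
        if p in functional_props:
            groups.setdefault((s, p), set()).add(o)

    report = {}
    for key, objects in groups.items():
        named = {o for o in objects if not is_blank(o)}
        if len(objects) == 1:
            report[key] = "single value"
        elif len(named) <= 1:
            # Only anonymous nodes (plus at most one name): safe to unify.
            report[key] = "unify"
        else:
            # Two different names: same thing, or just bad data? Can't tell.
            report[key] = "equivalent or inconsistent"
    return report


# Two sources, each with its own anonymous node for John's father.
anonymous = [("John", "hasFather", "_:a"), ("John", "hasFather", "_:b")]
# Two sources, each with its own generated name for John's father.
labelled = [
    ("John", "hasFather", "randomgenid0123456789"),
    ("John", "hasFather", "randomgenid9876543210"),
]

print(classify_functional(anonymous, {"hasFather"}))
print(classify_functional(labelled, {"hasFather"}))
```

In the anonymous case the constraint is enough to unify the two nodes; in
the labelled case the merger is left with two names and no way of knowing
that they are placeholders.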

> Now if John did indeed have two fathers after all, the situation would be
> more complex, because on merging the two data sources, our processor would
> have to convert the hasFather statements into some kind of a container,

Why? Why not just have two hasFather properties?

> which itself would apparently be an anonymous node.  It would be something
> like this (with apologies for the undefined terms):
> 
> subj                   property    object
> John                   hasFathers  []
> []                     typeOf      container
> []                     RDF_1       randomgenid0123456789
> randomgenid0123456789  typeOf      malePerson
> []                     RDF_2       randomgenid9876543210
> randomgenid9876543210  typeOf      malePerson
> randomgenid0123456789  age         84
> randomgenid9876543210  age         86
> 
> Without the labels randomgenid0123456789, randomgenid9876543210 it would be
> much harder to make this set of assertions, if it were possible at all.

Locally, you can have whatever labels you like; I'm only arguing against
keeping those labels when the data are sent to someone else.
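
A rough sketch of what I mean, assuming the implementation knows which of
its IDs were generated locally. The function name and the "_:b0" style of
placeholder (borrowed from the N-Triples convention for document-scoped
anonymous nodes) are my own choices for illustration.

```python
# Hypothetical sketch: keep generated labels internally, but replace them
# with document-scoped placeholders when the data are sent elsewhere.

def export_without_local_labels(triples, local_ids):
    """Return a copy of the triples in which every locally generated ID is
    replaced by a placeholder that is meaningful only within this export."""
    mapping = {}

    def rename(node):
        if node in local_ids:
            if node not in mapping:
                mapping[node] = f"_:b{len(mapping)}"
            return mapping[node]
        return node

    return [(rename(s), p, rename(o)) for s, p, o in triples]


internal = [
    ("John", "hasFather", "randomgenid0123456789"),
    ("randomgenid0123456789", "age", "84"),
]
exported = export_without_local_labels(internal, {"randomgenid0123456789"})
for triple in exported:
    print(triple)
```

The receiver still sees that the same node is John's father and is aged 84,
but is not led to believe that anyone knows the node's name.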

> Certainly we couldn't enter this data into a relational database, or write
> it down on paper in a table like this, since we would have to use three
> different "[]" symbols that looked the same but referred to three different
> resources.  To be consistent, of course, I should have used a generated id
> instead of [] for the container as well.
> 
> Indeed, if you think of a set of triples as being rows in a relational
> database table, then ask what would be the primary key of that table?  The
> only sensible answer is that the primary key must be the combination of the
> subject and predicate.  

This is surely ambiguous if an anonymous node is the object of two
statements?
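
One possible reading of the ambiguity, as a small sketch: two different
graphs give exactly the same table rows once anonymous nodes are written as
"[]", even though every (subject, predicate) key is unique. Mary and the
internal node names n1 and n2 are made up for the illustration.

```python
# Hypothetical sketch: a table keyed on (subject, predicate) cannot say
# whether two "[]" objects are the same anonymous node or different ones.

def to_rows(triples, anonymous):
    """Write each triple as a table row, showing anonymous nodes as '[]'."""
    def show(node):
        return "[]" if node in anonymous else node
    return sorted((show(s), p, show(o)) for s, p, o in triples)


# John and Mary share one (unnamed) father.
shared = [("John", "hasFather", "n1"), ("Mary", "hasFather", "n1")]
# John and Mary have two different (unnamed) fathers.
distinct = [("John", "hasFather", "n1"), ("Mary", "hasFather", "n2")]

rows_shared = to_rows(shared, {"n1"})
rows_distinct = to_rows(distinct, {"n1", "n2"})

# The keys are unique in both tables, yet the tables are indistinguishable.
print(rows_shared == rows_distinct)
```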

> The "semantics" could also be considered to be a
> kind of "business rule", to use an expression from a different domain.

> Taking this relational database viewpoint, each node must necessarily have a
> value (or label), but it may be that a particular implementation could hide
> the label, or exclude it from serialization.

I think we agree there - I'm not suggesting that nodes should be kept
label-free in implementations.

Regards,

David Allsopp.

-- 
/d{def}def/u{dup}d[0 -185 u 0 300 u]concat/q 5e-3 d/m{mul}d/z{A u m B u
m}d/r{rlineto}d/X -2 q 1{d/Y -2 q 2{d/A 0 d/B 0 d 64 -1 1{/f exch d/B
A/A z sub X add d B 2 m m Y add d z add 4 gt{exit}if/f 64 d}for f 64 div
setgray X Y moveto 0 q neg u 0 0 q u 0 r r r r fill/Y}for/X}for showpage
Received on Tuesday, 21 August 2001 08:05:03 UTC