RE: Syntax vs Semantics vs XML Schema vs RDF Schema vs QNames vs URIs (was RE: Using urn:publicid: for namespaces) from Patrick.Stickler@nokia.com on 2001-08-16 (www-rdf-logic@w3.org from August 2001)

From: <Patrick.Stickler@nokia.com>
Date: Thu, 16 Aug 2001 12:24:10 +0300
To: dallsopp@signal.dera.gov.uk, www-rdf-logic@w3.org
Cc: www-rdf-logic@w3.org, www-rdf-interest@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114BF8B@trebe003.NOE.Nokia.com>
> > rather than a single opaque URI identifier.
> 
> But this is just querying - you have to do that anyway to find out what
> the "opaque URI" actually is.

Why would you need to find out what a URI "is"? Do you
mean dereferencing it? Surely dereferencing of URIs is not
required for any kind of RDF-based inferencing.

Even if some application may wish to dereference a URI for
some purpose, that URI is not a "URI" per se to RDF, it is
simply an opaque universal identifier, no?

> John --hasFather--> [] --age--> 84
> 
> John --hasFather--> [] --age--> 84
> 
> compared with
> 
> John --hasFather--> randomgenid0123456789 --age--> 84
> 
> John --hasFather--> randomgenid9876543210 --age--> 84
> 
> where [] represents an anonymous node.
> 
> The point is that we don't know the name of John's father, so assigning
> him a random name makes our life harder, not easier, since everybody
> necessarily assigns him a _different_ random name.

But this is exactly my point. There is no such thing as an anonymous
node! It always gets a randomly generated system identifier!

So if I get the same statement twice (e.g. it happens to be defined
redundantly in two disparate sources) then a given system will
assign *different* system identities to each anonymous node
for each essentially equivalent statement. 

Would it not be far better to have a "variable" for an anonymous
node which is based on the fusion of the subject and predicate
identities? Thus rather than the current practice where

 John --hasFather--> [] --age--> 84
 John --hasFather--> [] --age--> 84

results in 

 [John, hasFather, gen123]
 [gen123, age, 84]
 [John, hasFather, gen456]
 [gen456, age, 84]

which is *not* what was intended; we instead could get

 [John, hasFather, rdf:anonymous:(John)(hasFather)]
 [rdf:anonymous:(John)(hasFather), age, 84]

with neither redundancy nor irreconcilable equivalence, and
where the implicit but regular (not system dependent) identity of
an anonymous node is defined in terms of a special RDF specific
URI scheme and sub-type for anonymous nodes.
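
To make the contrast concrete, here is a rough Python sketch of the two
naming strategies. The function names, the tuple representation of triples,
and the exact string layout of the derived identifier are assumptions made
for illustration only; the "rdf:anonymous:" scheme is just the idea floated
above, not anything that is actually defined.

```python
import itertools

# Each input stands for "subject --predicate--> [] --prop--> value".
STATEMENTS = [
    ("John", "hasFather", "age", 84),
    ("John", "hasFather", "age", 84),  # same statement from a second source
]

def load_with_genids(statements):
    """Current practice: every anonymous node gets a fresh system identifier."""
    counter = itertools.count(1)
    triples = set()
    for subj, pred, prop, value in statements:
        node = f"gen{next(counter)}"
        triples.add((subj, pred, node))
        triples.add((node, prop, value))
    return triples

def anonymous_id(subj, pred):
    """Hypothetical derived identifier, fused from subject and predicate."""
    return f"rdf:anonymous:({subj})({pred})"

def load_with_derived_ids(statements):
    """Proposed practice: the node identity depends only on its context."""
    triples = set()
    for subj, pred, prop, value in statements:
        node = anonymous_id(subj, pred)
        triples.add((subj, pred, node))
        triples.add((node, prop, value))
    return triples
```

With the two inputs above, the first loader ends up with four triples and
the second with two. Note that the sketch deliberately treats every
anonymous node sharing a subject and predicate as the same node, which is
the whole point of the proposal.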

The very same approach provides for system-independent and portable
reification of statements based on the statements themselves, without 
the need to assert those statements in a given knowledge base unless
an application specifically chooses to do so. E.g.

  rdf:statement:(subject)(predicate)(object)

  <rdf:Description
     about="http://some.org.com/some/url/path/personnel_data.html">
     <foo:asserts resource="rdf:statement:(John)(age)(32)" />
  </rdf:Description>
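
As a rough Python sketch of what that buys you: the identifier for the
statement can be computed from its three parts alone, so a store can record
who asserts a statement without holding the statement itself. The helper
name statement_id and the set-of-tuples store are assumptions for
illustration, and the sketch ignores any escaping of parentheses inside the
parts.

```python
def statement_id(subject, predicate, obj):
    """Build a hypothetical rdf:statement: identifier from the three parts."""
    return f"rdf:statement:({subject})({predicate})({obj})"

# A store that records who asserts what, without asserting the statement itself.
source = "http://some.org.com/some/url/path/personnel_data.html"
store = {(source, "foo:asserts", statement_id("John", "age", 32))}

# The reified statement is referenced by its derived identifier ...
referenced = (source, "foo:asserts", "rdf:statement:(John)(age)(32)") in store
# ... but the statement (John, age, 32) itself was never added to the store.
asserted = ("John", "age", 32) in store
```

Here referenced is true and asserted is false, and any other system
computing the identifier from the same three parts arrives at the same
string.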

Thus, the issue is not really so much about anonymous nodes but 
that they are in fact *not* anonymous within a given system, being
given unique and disjunct identities -- nor are they really anonymous
in the conceptual graph, as they represent a single actual resource
having an implicit identity based on their context within a statement 
(which all nodes have, even if given an explicit URI identity).

Interestingly, the same RDF specific URI scheme approach could be used 
for the QName to URI mapping problem, with rdf:qname:(namespace)(name).
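
A small Python sketch of why that might help, assuming the problem in
question is that plain concatenation of namespace and local name loses the
boundary between the two. The example namespaces and the helper names are
made up for illustration.

```python
def concat_uri(namespace, name):
    """Plain concatenation of namespace and local name."""
    return namespace + name

def qname_uri(namespace, name):
    """Hypothetical rdf:qname: identifier that keeps the two parts separate."""
    return f"rdf:qname:({namespace})({name})"

# Two different (namespace, name) pairs, chosen to collide when concatenated.
a = ("http://example.org/ns/ab", "c")
b = ("http://example.org/ns/a", "bc")

same_when_concatenated = concat_uri(*a) == concat_uri(*b)
same_as_qname_uri = qname_uri(*a) == qname_uri(*b)
```

The two pairs collapse to one URI under concatenation but stay distinct in
the parenthesized form, so the original QName can be recovered from it.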

But these are just ideas... (and I'm not sure I fully like them myself ;-)

> > Another is not knowing whether I will get back from a
> > query an anonymous node constituting the root of a collection,
> > containing resource nodes (or other collections) rather than
> > an actual resource node -- or possibly getting a set of results
> > having both resource nodes *and* collection root nodes -- because
> > in one case in the *serialization* the values of a property were
> > defined as a bag in the "same" statement and in another case
> > each was defined as a separate statement! Yuck!
> 
> I don't see how removing anonymous nodes assists here - the data can
> always be structured in different ways, and you have to know that in
> advance, or perform cleverness to deduce the structure.

In this particular case, which is essentially about removing collections
as distinct structures within the graph, doing so greatly simplifies
processing, since the set of values for a given query will be a flat/shallow
list of URIs, not a possibly mixed list of URIs and anonymous nodes.
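
For illustration, a rough Python sketch of the difference. The graph
contents (doc, dc:creator, the urn:x: names) and the function names are
assumptions; rdf:Bag and the rdf:_1, rdf:_2 membership properties are the
usual RDF container vocabulary. The sketch assumes collections do not
contain themselves.

```python
import re

RDF_BAG = "rdf:Bag"
MEMBER = re.compile(r"rdf:_\d+")  # container membership properties

# One source gave two values as a bag in a single statement;
# another gave a third value as a separate statement.
GRAPH = {
    ("doc", "dc:creator", "gen1"),
    ("gen1", "rdf:type", RDF_BAG),
    ("gen1", "rdf:_1", "urn:x:alice"),
    ("gen1", "rdf:_2", "urn:x:bob"),
    ("doc", "dc:creator", "urn:x:carol"),
}

def objects(graph, subj, pred):
    """Plain query: may return a mix of resources and collection roots."""
    return {o for s, p, o in graph if s == subj and p == pred}

def expand(graph, node):
    """Replace a bag root by its members, recursing into nested bags."""
    if (node, "rdf:type", RDF_BAG) not in graph:
        return {node}
    members = {o for s, p, o in graph if s == node and MEMBER.fullmatch(p)}
    result = set()
    for member in members:
        result |= expand(graph, member)
    return result

def flat_objects(graph, subj, pred):
    """Query that always yields a flat set of resource identifiers."""
    result = set()
    for obj in objects(graph, subj, pred):
        result |= expand(graph, obj)
    return result
```

The plain query hands back the collection root gen1 alongside urn:x:carol,
while the flattened query hands back the three resources directly. Without
collections in the graph, every consumer gets the second shape without
having to write the expansion step.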

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
Received on Thursday, 16 August 2001 05:24:18 UTC