Re: Syntax vs Semantics vs XML Schema vs RDF Schema vs QNames vs URIs (was RE: Using urn:publicid: for namespaces) from David Allsopp on 2001-08-16 (www-rdf-interest@w3.org from August 2001)

From: David Allsopp <dallsopp@signal.dera.gov.uk>
Date: Thu, 16 Aug 2001 11:24:14 +0100
To: Patrick.Stickler@nokia.com
CC: www-rdf-interest@w3.org
Message-ID: <3B7B9F4E.21F1E913@signal.dera.gov.uk>

Patrick.Stickler@nokia.com wrote:
> 
> > > rather than a single opaque URI identifier.
> >
> > But this is just querying - you have to do that anyway to
> > find out what
> > the "opaque URI" actually is.
> 
> Why would you need to find out what a URI "is". Do you
> mean dereferencing it? Surely dereferencing of URIs is not
> required for any kind of RDF based inferencing.

Ok, I'm confused as to what you meant - you appeared to be saying that
it was difficult to refer to the anonymous resource because you would
have to use its 'surrounding' related nodes to identify it.  My point
was that even if the node does have an opaque URL, if the data are at a
remote site or agent you have no idea what that URL string is, and have
to form a query in order to find out.  This query would of course use
the related nodes, so the situation is no different.

If on the other hand you have accessed and parsed that RDF locally, you
will have generated a local ID for that resource and can refer to it
using that ID.

> Even if some application may wish to dereference a URI for
> some purpose, that URI is not a "URI" per se to RDF, it is
> simply an opaque universal identifier, no?

Yes; I wasn't suggesting dereferencing.

> > John --hasFather--> [] --age--> 84
> >
> > John --hasFather--> [] --age--> 84
> >
> > compared with
> >
> > John --hasFather--> randomgenid0123456789 --age--> 84
> >
> > John --hasFather--> randomgenid9876543210 --age--> 84
> >
> > where [] represents an anonymous node.
> >
> > The point is that we don't know the name of John's father, so
> > assigning
> > him a random name makes our life harder, not easier, since everybody
> > necessarily assigns him a _different_ random name.
> 
> But this is exactly my point. There is no such thing as an anonymous
> node! It always gets a randomly generated system identifier!

So what? In principle, the system can keep track of which nodes are in
fact anonymous and distinguish them from the others.
If the RDF is then re-serialized, the anonymity of the nodes should be
preserved, so no system identifiers are exported, and the recipient
understands that these are anonymous nodes.  I just don't see the
problem.

Let's say that I implement a system where the anonymous resource is NOT
given a system name in the form of a URI string, but is only stored as a
distinct object in memory.  Would that be any different?  Or we could
randomly change the name of all anonymous nodes every second; it
wouldn't make any difference.
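
A minimal sketch of such a system in Python, to make the point concrete.
The class name Anon, the function serialize and the "_:aN" label shape
are illustrative assumptions, not part of any RDF specification or
toolkit. The anonymous node is just an object; a label is invented only
at output time and is never stored.

```python
class Anon:
    """An anonymous node: its identity is the object itself, no name is stored."""


def serialize(triples):
    """Render triples as text, labelling anonymous nodes only while writing."""
    labels = {}  # Anon object -> label that is local to this one call

    def term(t):
        if isinstance(t, Anon):
            # The label is made up here and thrown away afterwards.
            return labels.setdefault(t, f"_:a{len(labels)}")
        return str(t)

    return [f"{term(s)} {term(p)} {term(o)}" for s, p, o in triples]


father = Anon()
triples = [("John", "hasFather", father), (father, "age", 84)]
for line in serialize(triples):
    print(line)
```

Nothing in the stored triples changes if the labels come out differently
on the next call, which is the sense in which the name does not matter.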

[Aside: perhaps this is rather like the Robinson Crusoe story, where he
meets a foreigner on his island; not knowing his name, and having met
him on a Friday, he calls him "Man Friday". The man presumably has a
real name, but we don't know it - we have to call him _something_, but
we acknowledge it isn't really his name.]

> So if I get the same statement twice (e.g. it happens to be defined
> redundantly in two disparate sources) then a given system will
> assign *different* system identities to each anonymous node
> for each essentially equivalent statement.

Not necessarily - we have the option of keeping track of which resources
are anonymous and handling them specially if we want.

> Would it not be far better to have a "variable" for an anonymous
> node which is based on the fusion of the subject and predicate
> identities. Thus rather than the current practice where
> 
>  John --hasFather--> [] --age--> 84
>  John --hasFather--> [] --age--> 84
> 
> results in
> 
>  [John, hasFather, gen123]
>  [gen123, age, 84]
>  [John, hasFather, gen456]
>  [gen456, age, 84]

This is what tends to happen, but we can in principle detect the
anonymous nodes and do more intelligent merging.
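
One possible form of "more intelligent merging", sketched in Python
under strong simplifying assumptions: two anonymous nodes are collapsed
only when every triple touching them is identical apart from the
anonymous node itself, and anonymous nodes adjacent to other anonymous
nodes are left alone. The names Anon and merge_equivalent_anons are
hypothetical.

```python
from collections import defaultdict


class Anon:
    """Anonymous node; compared by object identity only."""


def merge_equivalent_anons(triples):
    """Collapse anonymous nodes whose surrounding triples are identical."""
    signature = defaultdict(set)  # Anon -> set of its incoming/outgoing arcs
    skip = set()                  # Anons linked to other Anons (not handled)
    for s, p, o in triples:
        s_anon, o_anon = isinstance(s, Anon), isinstance(o, Anon)
        if s_anon and o_anon:
            skip.update((s, o))
        elif s_anon:
            signature[s].add(("out", p, o))
        elif o_anon:
            signature[o].add(("in", p, s))

    canonical = {}  # frozen signature -> first node seen with that signature
    replace = {}    # node -> representative node
    for node, arcs in signature.items():
        if node not in skip:
            replace[node] = canonical.setdefault(frozenset(arcs), node)

    merged = []
    for s, p, o in triples:
        t = (replace.get(s, s), p, replace.get(o, o))
        if t not in merged:  # drop triples that became duplicates
            merged.append(t)
    return merged


a, b = Anon(), Anon()
graph = [
    ("John", "hasFather", a), (a, "age", 84),
    ("John", "hasFather", b), (b, "age", 84),
]
merged = merge_equivalent_anons(graph)
print(len(graph), "triples before,", len(merged), "after")
```

Whether such a merge is appropriate depends on the application; the
sketch only shows that it is possible once the system knows which nodes
are anonymous.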

> which is *not* what was intended; we instead could get
> 
>  [John, hasFather, rdf:anonymous:(John)(hasFather)]
>  [rdf:anonymous:(John)(hasFather), age, 84]
> 
> with neither redundancy nor irreconcilable equivalence, and
> where the implicit but regular (not system dependent) identity of
> an anonymous node is defined in terms of a special RDF specific
> URI scheme and sub-type for anonymous nodes.

I have no objection to explicit identification of anonymous nodes, but I
don't think your suggested scheme solves the problem yet (nice idea
though...):

John --hasFather--
                  |
                  [] --age--> 84
                  |
Jim --hasBrother--

What's the URI of the anonymous node here? If I add more triples
pointing to it, then what?
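
To spell the problem out, here is a small Python sketch. The function
context_name is hypothetical; it only copies the name shape from the
proposal quoted above. The single anonymous node in the diagram has two
incoming arcs, so it ends up with two candidate names.

```python
def context_name(subject, predicate):
    # Name shape taken from the quoted proposal:
    # rdf:anonymous:(subject)(predicate)
    return f"rdf:anonymous:({subject})({predicate})"


# Both arcs point at the same anonymous node.
incoming = [("John", "hasFather"), ("Jim", "hasBrother")]
names = {context_name(s, p) for s, p in incoming}
for name in sorted(names):
    print(name)
```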

[Actually there may be a wider problem here as I don't think that graph
can be serialized in XML RDF with an anonymous node 8-) So some explicit
identification scheme may be needed...]

[neat encoding of statements]

> Thus, the issue is not really so much about anonymous nodes but
> that they are in fact *not* anonymous within a given system, being
> given unique and disjunct identities -- nor are they really anonymous
> in the conceptual graph, as they represent a single actual resource
> having an implicit identity based on their context within a statement
> (which all nodes have, even if given an explicit URI identity).

They are anonymous in the syntax, and have a temporary name in
implementations (although one could probably come up with an
implementation where they were treated specially and so only really had
a memory address or something).

Does something have to have a name in order to be distinct? I don't see
that it does - as we said before, it can be identified by its
surroundings.  Generating a name such as "thingNextToFoo" is just a
convenience for this identification.
I do believe that 'anonymous' nodes are different to others in that the
name is _only_ a convenience, and could be changed at random without
affecting anything (in principle - provided the change is distributed
appropriately!).

I guess the difference is that the name can be removed from any given
graph WITHOUT LOSS OF INFORMATION (only loss of convenience). Removing
any other name changes the graph, by removing information.
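
A brute-force Python sketch of that distinction, assuming for
illustration that generated names start with "gen" (as in the gen123
examples quoted earlier). The function equivalent and the is_anon test
are hypothetical helpers, and trying every permutation is only
practical for tiny graphs.

```python
from itertools import permutations


def is_anon(term):
    # Assumption for this sketch: generated names look like "gen123".
    return isinstance(term, str) and term.startswith("gen")


def anon_terms(graph):
    return sorted({t for s, _, o in graph for t in (s, o) if is_anon(t)})


def equivalent(g1, g2):
    """True if g1 equals g2 under some renaming of the generated names."""
    a1, a2 = anon_terms(g1), anon_terms(g2)
    if len(a1) != len(a2):
        return False
    target = set(g2)
    for perm in permutations(a2):
        rename = dict(zip(a1, perm))
        renamed = {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in g1}
        if renamed == target:
            return True
    return False


g1 = [("John", "hasFather", "gen123"), ("gen123", "age", 84)]
g2 = [("John", "hasFather", "gen456"), ("gen456", "age", 84)]
g3 = [("Jack", "hasFather", "gen123"), ("gen123", "age", 84)]

print(equivalent(g1, g2))  # only a generated name differs
print(equivalent(g1, g3))  # a real name differs
```

Swapping gen123 for gen456 leaves the graphs equivalent, whereas
swapping John for Jack does not.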

> > I don't see how removing anonymous nodes assists here - the data can
> > always be structured in different ways, and you have to know that in
> > advance, or perform cleverness to deduce the structure.
> 
> In this particular case, which is essentially talking about removing
> collections as distinct structures within the graph, it greatly simplifies
> processing, since the set of values for a given query will be a flat/shallow
> list of URIs, not a possible list of mixed URIs and anonymous nodes.

OK.

Regards,

David.

-- 
/d{def}def/u{dup}d[0 -185 u 0 300 u]concat/q 5e-3 d/m{mul}d/z{A u m B u
m}d/r{rlineto}d/X -2 q 1{d/Y -2 q 2{d/A 0 d/B 0 d 64 -1 1{/f exch d/B
A/A z sub X add d B 2 m m Y add d z add 4 gt{exit}if/f 64 d}for f 64 div
setgray X Y moveto 0 q neg u 0 0 q u 0 r r r r fill/Y}for/X}for showpage
Received on Thursday, 16 August 2001 06:24:26 UTC