Re: Syntax vs Semantics vs XML Schema vs RDF Schema vs QNames vs URIs (was RE: Using urn:publicid: for namespaces) from David Allsopp on 2001-08-16 (www-rdf-interest@w3.org from August 2001)

From: David Allsopp <dallsopp@signal.dera.gov.uk>
Date: Thu, 16 Aug 2001 14:13:50 +0100
To: Patrick.Stickler@nokia.com
CC: www-rdf-interest@w3.org
Message-ID: <3B7BC70E.3EB5EF41@signal.dera.gov.uk>
Patrick.Stickler@nokia.com wrote:

> > Ok, I'm confused as to what you meant - you appeared to be saying that
> > it was difficult to refer to the anonymous resource because you would
> > have to use its 'surrounding' related nodes to identify it.  My point
> > was that even if the node does have an opaque URL, if the data are at a
> > remote site or agent you have no idea what that URL string is, and have
> > to form a query in order to find out.  This query would of course use
> > the related nodes, so the situation is no different.
> 
> But surely such dereferencing of resource URIs is separate from
> any inferences being made to a given knowledge base. An agent might
> query an RDF knowledge base (agent) to get the URI of a resource from
> which it might be able to retrieve more knowledge, which could
> be syndicated into the RDF knowledge base, but such retrieval and
> syndication of knowledge is outside the scope of basic query
> and inference processes operating on the knowledge base proper.
> 
> Right?

I'm afraid I don't understand your distinction. Querying a remote RDF
model and querying a local RDF model are pretty much the same to me in
principle; I work with agent-based systems that exchange RDF via
messages.

> > > > John --hasFather--> [] --age--> 84
> > > >
> > > > John --hasFather--> [] --age--> 84
> > > >
> > > > compared with
> > > >
> > > > John --hasFather--> randomgenid0123456789 --age--> 84
> > > >
> > > > John --hasFather--> randomgenid9876543210 --age--> 84
> > > >
> > > > where [] represents an anonymous node.
> > > >
> > > > The point is that we don't know the name of John's father, so
> > > > assigning him a random name makes our life harder, not easier,
> > > > since everybody necessarily assigns him a _different_ random name.
> > >
> > > But this is exactly my point. There is no such thing as an anonymous
> > > node! It always gets a randomly generated system identifier!
> >
> > So what? In principle, the system can keep track of which nodes are in
> > fact anonymous and distinguish them from the others.
> 
> Eh? Fair enough, in principle, but is that a requirement of the RDF spec?

No; some applications presumably don't need to distinguish them. If you
just want to access data from a single RDF document, then generating a
locally-unique name for each anonymous resource will work just fine. 
However, if that application exports the data to someone else, then
those IDs had better be either truly unique (probably impossible, but
we can get close) or be indicated as anonymous in some way (either by
being serialised as such using the current M&S, or by some other as-yet
undefined way). Otherwise they could cause problems.
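
A minimal sketch of the export problem described above, in Python. It assumes triples are plain tuples and that anonymous nodes are flagged with a "_:" prefix; that prefix, and the names `is_anonymous`, `rename_apart` and `merge`, are illustrative choices rather than anything defined by M&S. Two documents that each picked the same locally-unique name collide under a naive union, but stay distinct if the receiver can tell which names are anonymous and renames them apart.

```python
# Hypothetical convention: anonymous nodes are strings starting with "_:".
def is_anonymous(node):
    return isinstance(node, str) and node.startswith("_:")

def rename_apart(triples, prefix):
    """Give every anonymous node in `triples` a name scoped by `prefix`."""
    def rename(node):
        return f"_:{prefix}.{node[2:]}" if is_anonymous(node) else node
    return {(rename(s), p, rename(o)) for s, p, o in triples}

def merge(graph_a, graph_b):
    """Union two graphs without letting their anonymous nodes collide."""
    return rename_apart(graph_a, "a") | rename_apart(graph_b, "b")

# Two documents that each happened to pick the local name "gen1".
doc_a = {("John", "hasFather", "_:gen1"), ("_:gen1", "age", 84)}
doc_b = {("Mary", "hasMother", "_:gen1"), ("_:gen1", "age", 61)}

naive = doc_a | doc_b        # "_:gen1" now appears to have two ages
safe = merge(doc_a, doc_b)   # the two anonymous nodes stay distinct
```

The renaming only works because the anonymous names are distinguishable from ordinary ones; if they were exported as ordinary-looking identifiers, the receiver would have no way to know they are safe to rename.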

> Can you tell me of any RDF engines that not only differentiate between
> anonymous nodes but also disregard their identities during inference?
> (I'd love to know of them, so I can use them)

I'm not sure myself; perhaps others could comment...

> And regardless, if it is not a requirement of the standard, then that
> means that I can't have portable, generic SW agents that can rely on
> such interpretation and treatment of anonymous nodes by any arbitrary
> RDF engine which is compliant with the standard.

Agreed 8-(.  

> > > Would it not be far better to have a "variable" for an anonymous
> > > node which is based on the fusion of the subject and predicate
> > > identities. Thus rather than the current practice where
> > >
> > >  John --hasFather--> [] --age--> 84
> > >  John --hasFather--> [] --age--> 84
> > >
> > > results in
> > >
> > >  [John, hasFather, gen123]
> > >  [gen123, age, 84]
> > >  [John, hasFather, gen456]
> > >  [gen456, age, 84]
> >
> > This is what tends to happen, but we can in principle detect the
> > anonymous nodes and do more intelligent merging.
> 
> But at what point should such merging be done transparently and in a
> consistent fashion by all conformant applications?

I think that's a separate issue, regardless of our scheme. I expect it
also relies on using schema information (cardinalities etc), since we
are assuming quite without justification here that John can only have
one father! 8-)
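
A rough sketch of what schema-driven merging could look like, in Python. It assumes the schema information has already been reduced to a set of predicates known to allow at most one value per subject (`functional` below), and that anonymous nodes carry a "_:" prefix; both are assumptions made for illustration. It makes a single pass and does not attempt chained merges or consistency checks on the merged nodes' other properties.

```python
def is_anonymous(node):
    # Hypothetical convention: anonymous nodes are strings starting with "_:".
    return isinstance(node, str) and node.startswith("_:")

def merge_functional(triples, functional):
    """Unify anonymous objects that share a subject and a functional predicate."""
    canonical = {}   # (subject, predicate) -> first anonymous object seen
    replace = {}     # anonymous node -> node it is merged into
    # Sort by repr so the choice of surviving node is deterministic.
    for s, p, o in sorted(triples, key=repr):
        if p in functional and is_anonymous(o):
            replace[o] = canonical.setdefault((s, p), o)
    return {(replace.get(s, s), p, replace.get(o, o)) for s, p, o in triples}

triples = {
    ("John", "hasFather", "_:gen123"), ("_:gen123", "age", 84),
    ("John", "hasFather", "_:gen456"), ("_:gen456", "age", 84),
}

merged = merge_functional(triples, {"hasFather"})   # schema says: one father
unmerged = merge_functional(triples, set())         # no cardinality knowledge
```

Without the cardinality information the four triples are left alone, which is the safe default: nothing in the data itself says the two anonymous fathers are the same person.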


> > John --hasFather--
> >                   |
> >                   [] --age--> 84
> >                   |
> > Jim --hasBrother--
> >
> > What's the URI of the anonymous node here? If I add more triples
> > pointing to it, then what?
> >
> > [Actually there may be a wider problem here as I don't think that graph
> > can be serialized in XML RDF with an anonymous node 8-) So some explicit
> > identification scheme may be needed...]
> 
> But this is really the crux of my "beef" with anonymous nodes. Even if
> the creator of the above knowledge doesn't know the "official" or "primary"
> name of that resource, it needs to give it *some* shared name, or it has
> to have a way to define equivalence between two implicit but regular
> identities of that same resource. I.e.
> 
>    [John, hasFather, x]
>    [x, age, 84]
>    [Jim, hasBrother, x]
> or
>    [John, hasFather, rdf:anonymous:(John)(hasFather)]
>    [rdf:anonymous:(John)(hasFather), age, 84]
>    [Jim, hasBrother, rdf:anonymous:(Jim)(hasBrother)]
>    [rdf:anonymous:(Jim)(hasBrother), age, 84]
>    [rdf:anonymous:(John)(hasFather),
>     daml:equivalentTo,
>     rdf:anonymous:(Jim)(hasBrother)]


That's better, but still fails if Jim has two distinct brothers (two
hasBrother triples) since the two 'anonymous' nodes will be given the
same URI. 
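
The failure case can be sketched in a few lines of Python. The `implicit_name` function is only a hypothetical rendering of the rdf:anonymous:(subject)(predicate) form quoted above, and the second brother's age is made-up illustration data. Because the generated name depends only on the subject and predicate, two distinct anonymous brothers of Jim end up as one node carrying both ages.

```python
def implicit_name(subject, predicate):
    # Hypothetical rendering of the proposed rdf:anonymous:(subject)(predicate) form.
    return f"rdf:anonymous:({subject})({predicate})"

def name_anonymous(statements):
    """Expand (subject, predicate, properties-of-anonymous-object) into triples."""
    triples = set()
    for subject, predicate, properties in statements:
        node = implicit_name(subject, predicate)
        triples.add((subject, predicate, node))
        for p, o in properties.items():
            triples.add((node, p, o))
    return triples

# Two distinct, unnamed brothers of Jim (ages are illustrative only).
statements = [
    ("Jim", "hasBrother", {"age": 84}),
    ("Jim", "hasBrother", {"age": 79}),
]

triples = name_anonymous(statements)
brothers = {o for s, p, o in triples if p == "hasBrother"}
ages = {o for s, p, o in triples if p == "age"}
```

Here `brothers` contains a single name while `ages` contains both values, so the distinction between the two brothers has been lost.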

> The benefits of such a standardized, consistent representation for
> anonymous nodes are that (a) every application will use the same
> identity, so no system specific identifiers, (b) equivalences are
> explicit between different implicit identities of the same anonymous
> node, so inference can exploit them, (c) anonymous nodes have legal
> URIs for identity.

I agree with the benefits; I just remain to be convinced that such a
representation is possible in practice.

> > Generating a name such as "thingNextToFoo" is just a
> > convenience for this identification.
> > I do believe that 'anonymous' nodes are different to others in that the
> > name is _only_ a convenience, and could be changed at random without
> > affecting anything (in principle - provided the change is distributed
> > appropriately!).
> 
> Do you mean that the anonymous node doesn't correspond to a specific
> resource? Or just that the system-generated internal identity of that
> node could be changed? I of course fully agree with the latter.

Yes, the latter.

> > I guess the difference is that the name can be removed from any given
> > graph WITHOUT LOSS OF INFORMATION, (only loss of convenience). Removing
> > any other name changes the graph, by removing information.
> 
> Unless that name is referenced somewhere.

Indeed.

Regards,

David.

-- 
/d{def}def/u{dup}d[0 -185 u 0 300 u]concat/q 5e-3 d/m{mul}d/z{A u m B u
m}d/r{rlineto}d/X -2 q 1{d/Y -2 q 2{d/A 0 d/B 0 d 64 -1 1{/f exch d/B
A/A z sub X add d B 2 m m Y add d z add 4 gt{exit}if/f 64 d}for f 64 div
setgray X Y moveto 0 q neg u 0 0 q u 0 r r r r fill/Y}for/X}for showpage
Received on Thursday, 16 August 2001 09:14:01 UTC