Re: Analysis of Anon Resources (long)

Brian,

This was an interesting piece you submitted.  At this time, I'm going to 
nit-pick and comment on some of the points you raise.  I'm not sure if I'm 
ready to give a view on which approach I prefer.

--

At 08:21 AM 9/8/00 +0100, McBride, Brian wrote:
>It has been suggested recently that there are four 'models' to
>consider when we discuss RDF;
>
>   o the abstract model, sometimes called the data model, or
>     just the model.
>   o the graphical model
>   o the triple model
>   o the xml serialization

Hmmm... I thought the triple model was embedded in the abstract model.  Or 
are you referring to the serialization of triples as text?

>An RDF model is a directed graph.  It contains nodes
>connected by directed arcs, i.e. arcs that have a specific
>source and destination node.  The source node of an arc must
>come from a set I will here call R.  The destination node
>of an arc must come from either the set called R or the set
>called Literals.  Arcs are always labelled with a URI.

I was initially confused by this:  the use of "graph" to describe a 
mathematical abstraction, and "graphical" to describe a 
representation.  For the purposes of this discussion, maybe "abstract 
graph" or "abstract directed graph" would help to reinforce the difference?
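
As a minimal sketch of the abstract directed graph in the quoted paragraph 
(Python; the class names are mine and purely illustrative, nothing here 
comes from M&S).  It only encodes the constraints on arcs, and deliberately 
leaves open how members of R are identified:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """A member of the set R. How it is named is the open question."""
    name: str


@dataclass(frozen=True)
class Literal:
    """A member of the set Literals."""
    value: str


@dataclass(frozen=True)
class Arc:
    """A directed arc: source in R, label a URI, target in R or Literals."""
    source: Resource
    label: str
    target: Union[Resource, Literal]

    def __post_init__(self):
        # Enforce the constraints stated for the abstract graph.
        if not isinstance(self.source, Resource):
            raise TypeError("the source of an arc must come from R")
        if not isinstance(self.target, (Resource, Literal)):
            raise TypeError("the target must come from R or Literals")


# One arc of the abstract graph; no drawing or XML is involved.
arc = Arc(Resource("ex:me"), "ex:weight", Literal("12st"))
```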

>The issue at question here is whether all members of the set
>R have a URI.

Generally I like that as an issue statement.  Maybe "(and optional fragment 
ID)" should be added?

>The model and syntax specification is at best unclear and at
>worst inconsistent on this question.  Section 2.1 states
>"Resources are always named by URIs plus optional anchor
>ids".  Section 2.1.1 has a graphical representation of a
>model with no URI and there are frequent references to
>anonymous resources throughout the text.
>
>It is therefore futile to try to resolve this question by
>referring to individual portions of the specification.

Good point!

>How then, can this question be answered.  There seem to me
>to be the following options:
>
>   o we can consider the spec as a whole, identify which
>     parts we think were unclear and reinterpret them to
>     create a consistent interpretation.
>
>   o we could ask the original authors what they meant and
>     whether they still think that's right.

We should probably do that anyway.

>   o we could come to an independent resolution of what
>     would be best.
>


>Some possible solutions
>
>   o  All members of the set R must be given a URI by an
>      application or parser
>
>   o  Remove anonymous resources from the serialization
>      - they were a mistake
>
>   o  Invent a new class of URI, not URL's, not URN's
>      but a scoped resource name.
>
>   o  Some members of the set R do not have a URI.

And there's Dan B's approach:

   ... So while I believe the RDF model says that nodes have URIs, that
   doesn't automagically get us into a situation whereby each RDF
   processor/database always *knows* the URI for every node it has some
   representation of. Sometimes this information costs money, for example.

To proceed, I'd like to be clear whether RDF resources are _always_ web 
resources.  If so, this raises the supplementary question of the relationship 
between URIs and web resources.  If not, then how are fragment IDs 
justified?  Also, can a single RDF resource have multiple RDF identifiers?

>All Members of R are Given a URI
>================================

[...]

>Such generated URI's cannot reasonably be thought of as
>URL's - they are not locators.  They must therefore be
>URN's and there are some strict requirements on the
>behaviour of URN's i.e. they persist and the same URN
>must never be used to represent two different resources
>even over time.

I think there's an incorrect assumption in this:  that a URI, if not a URL, 
must be a URN.  I understand there can be URIs that are neither.  So the 
URN persistence requirement is not an issue.

[...]

>These difficulties can be surmounted if there is a way for
>a serialization to specify some key or base URI that will
>be used in the generation of anonymous URI's.  This could
>be accomplished without changing the current syntax by
>introducing a processing instruction.

Or just use xml:base?

>Another desirable feature (requirement?) is that the
>URI's generated should not change under some transformations
>of the serialization.  For example, if the ordering of the
>statements in a serialization were changed in a way that
>should not change the model being represented, then
>the URI's generated by the parser should not change.

I can imagine that this need not be a requirement, if RDF resources are allowed 
to have multiple RDF identifiers.  Different serializations would just 
result in different identifiers being used.  Some notion of model 
equivalence under substitution of equivalent identifiers might be introduced.

But it would be nice if trivial serialization differences don't cause model 
differences.
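
A rough sketch of what "model equivalence under substitution of equivalent 
identifiers" could mean, assuming (my assumption, not anything in M&S) that 
parser-generated identifiers can be recognized by a "gen:" prefix and that 
no literal happens to start with that prefix.  It is brute force, so only 
sensible for a handful of generated identifiers:

```python
from itertools import permutations


def is_generated(term):
    # Hypothetical convention: parser-generated identifiers start with "gen:".
    return term.startswith("gen:")


def generated_ids(triples):
    return sorted({t for triple in triples for t in triple if is_generated(t)})


def equivalent(a, b):
    """True if some one-to-one renaming of generated ids turns model a into b."""
    a, b = set(a), set(b)
    ids_a, ids_b = generated_ids(a), generated_ids(b)
    if len(ids_a) != len(ids_b):
        return False
    for perm in permutations(ids_b):
        mapping = dict(zip(ids_a, perm))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in a}
        if renamed == b:
            return True
    return False


first = {("ex:me", "ex:weight", "gen:1"), ("gen:1", "ex:value", "12st")}
second = {("ex:me", "ex:weight", "gen:7"), ("gen:7", "ex:value", "12st")}
other = {("ex:me", "ex:weight", "gen:7"), ("gen:7", "ex:value", "14st")}
```

Under this definition `first` and `second` are equivalent even though the 
two parses generated different identifiers, while `first` and `other` are 
not.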

[...]

>Create a New Class of URI with Different Rules
>==============================================
>
>i.e. bend the definition of URI as it's getting in the
>way.  This may be what DanB had in mind when he
>suggested "var:..." format URI's a few months back.
>
>var format URI's relax the URN constraint on persistence.
>Within some scope, the definition of which is outside
>the understanding of RDF processors, these URI's behave
>like URN's.  It is up to the user, or his operations
>folks, that they manage the use of RDF and RDF processors
>so that two uses of the same var to represent different
>resources never meet.

Hmmm...  if anonymity is to be allowed, should it be a property of the 
identifier or a property of the model?

Although Dan's scheme has some attraction, my intuition is that this would 
stretch URI space concepts in a way that could be very harmful.  Keeping 
the concept within the RDF model seems safer to me.

>Regard Anonymous Resources as a Mistake and Remove Them
>=======================================================
>This approach forces the generation of URI's back to
>the generator of the RDF.  This generator should have
>enough application knowledge to generate URI's that
>really are URI's.
>
>This approach will presumably break some of the RDF that
>is out there.  How big a problem this is, I don't know.
>If it would cause a problem for you, raise your hand now.

I think I'd like to avoid this if there's a better way.


>Some Members of R do not have a URI
>===================================

I think this is what our prototype implementations do:  null URI == no 
URI.  There are internal identifiers that are used to tie the model together.

I'm not saying (yet) that this is what I favour -- just commenting that it 
seems to be workable.
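
To illustrate the general shape (this is not the prototype code, just a 
sketch with names I have made up for the purpose), a store might hand out 
internal identifiers and record a URI only where one exists:

```python
from itertools import count


class Model:
    """Toy statement store: every node has an internal id, not always a URI."""

    def __init__(self):
        self._next_id = count(1)
        self._uri_of = {}      # internal id -> URI, or None if anonymous
        self._id_of_uri = {}   # URI -> internal id
        self.statements = set()

    def resource(self, uri=None):
        """Return the node for a URI, or a new anonymous node if uri is None."""
        if uri is not None and uri in self._id_of_uri:
            return self._id_of_uri[uri]
        node = next(self._next_id)
        self._uri_of[node] = uri
        if uri is not None:
            self._id_of_uri[uri] = node
        return node

    def uri(self, node):
        return self._uri_of[node]

    def add(self, subject, predicate, obj):
        # subject is an internal id; obj is an internal id or a literal string.
        self.statements.add((subject, predicate, obj))


model = Model()
me = model.resource("ex:me")
weight = model.resource()            # anonymous: null URI
model.add(me, "ex:weight", weight)
model.add(weight, "ex:value", "12st")
```

The internal ids tie the statements together, but they mean nothing outside 
this one `Model` instance.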

>This permits models to be constructed with nodes that
>do not have a globally unique identifier.  There
>are entities, which we might want to represent in an RDF
>model which do not have a 'natural' URI.  Me for example.
>There are entities which an application designer may prefer
>not to give a name to - e.g. compound values such as my
>weight.
>
>Applications are no longer forced to construct artificial
>URI's for entities which have no natural URI's.

This in turn implies that applications must have a (private) means to 
represent relationships involving anonymous resources...

>The XML serialization syntax and the graphical presentation
>have defined means for representing nodes with no URI.  M&S
>gives no formal description of a language for representing
>triples, but it does include examples where anonymous nodes
>are represented in a triple notation.  It is clear that it
>is possible to design a representation of triples which can
>distinguish between URI's and other names with a more limited
>scope.

More generally, I think the issue is that different serializations must 
define mechanisms for dealing with anonymous resources.

The problem I see is (a point you raise) that at least one of the current 
serializations (RDF M&S XML) has only limited capabilities for expressing 
such relationships:  arbitrary anonymous-resource relationships cannot be 
expressed.

This in turn leads us to question:  are all serialization formats 
equivalent?  In the face of anonymous resources, I think the answer is "no".

>A key point to note is that there are graphs with anonymous
>nodes that cannot be represented in the current XML syntax.
>To ensure equivalence between the abstract model and the
>syntax, the syntax must either be extended, or the use of
>abstract nodes in a model constrained.

... or we accept that the syntax is not necessarily complete w.r.t. the 
abstract model.

>An RDF processor can track the identity of an anonymous
>resource whilst it remains within the processing scope of
>that processor.  But if such a resource moves out of the
>scope of the processor and comes back in, the processor has
>no way to know it is the same resource.

Yes, in general.  I think that the nature of anonymous resources is that 
their meaning is defined within the scope of a given set of statements, so 
name equivalence may not be important.

>So for example, consider an implementation of an RDF model,
>i.e. a collection of statements, which contain references
>to anonymous resources.  If such a model were written as
>an XML serialization to a file in a way that preserved the
>anonymity and then read back in again, the processor would
>have no way to tell that the anonymous resources written out
>were the same as the anonymous resources being read back in.
>Take a simple example.  A model with resource representing me,
>a property linking me to an anonymous resource representing my
>weight.  Write this out to a file, read it back in and add it
>to the same model, I end up with two weight properties.  Not
>great.

Maybe that's just fine.  Whether it is or not depends upon how an 
application interprets them.

An application might allow:

    [person] --weight--> [ ] --value--> "12st"
    [      ]
    [      ] --weight--> [ ] --value--> "12st"

to mean the same as just:

    [person] --weight--> [ ] --value--> "12st"

It might even allow that:

    [person] --weight--> [ ] --value--> "12st"
    [      ]
    [      ] --weight--> [ ] --value--> "14st"

is reasonable, in that at some time the person had a weight of 12st, and at 
another time of 14st.

Or, using interpretation properties 
(http://www.w3.org/DesignIssues/InterpretationProperties), a statement like:

    [person] --weight--> [ ] --st--> "12"
    [      ]
    [      ] --weight--> [ ] --lb--> "168"

might mean the same as:

    [person] --weight--> [ ] --st--> "12"
                         [ ] --lb--> "168"

I think we may be starting to face the "open world" nature of RDF.  There 
is no universally defined framework of interpretation for RDF statements, 
so it is not possible to guarantee that every legitimate RDF statement 
makes sense, or that all shades of meaning are preserved through all 
possible syntactic manipulations (*).   There may be enclaves in the RDF 
universe where additional rules apply to achieve the level of consistency 
needed by some given application.

(*) shaky argument -- treat with scepticism!
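
As an illustration of one such enclave rule (a sketch only; the "anon:" 
prefix and both function names are my own inventions), here is the 
write-out/read-back case followed by an application-specific rule that 
treats anonymous nodes with identical outgoing arcs as the same node.  It 
handles only one level of anonymity, which is enough for the weight example:

```python
from itertools import count

_fresh = count(1)


def read_back(triples):
    """Simulate re-parsing: each anonymous node gets a new local identifier."""
    renaming = {}

    def rename(term):
        if term.startswith("anon:"):
            if term not in renaming:
                renaming[term] = f"anon:r{next(_fresh)}"
            return renaming[term]
        return term

    return {(rename(s), p, rename(o)) for s, p, o in triples}


def collapse_identical_anon(triples):
    """Application rule: anonymous nodes with the same outgoing arcs are one."""
    outgoing = {}
    for s, p, o in triples:
        if s.startswith("anon:"):
            outgoing.setdefault(s, set()).add((p, o))
    chosen = {}      # set of outgoing arcs -> representative node
    canonical = {}   # node -> representative node
    for node in sorted(outgoing):
        canonical[node] = chosen.setdefault(frozenset(outgoing[node]), node)
    return {(canonical.get(s, s), p, canonical.get(o, o)) for s, p, o in triples}


model = {("ex:person", "ex:weight", "anon:w"), ("anon:w", "ex:value", "12st")}
merged = model | read_back(model)          # two weight nodes
collapsed = collapse_identical_anon(merged)  # back to one, under this rule
```

An application that wants to keep 12st and 14st as weights at different 
times loses nothing here, since those two nodes have different arcs and are 
not collapsed.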

A possible aspect of DanB's "don't know" approach is that the "don't know" 
can be extended to certain interpretations involving anonymous 
resources.   This leads us to maybe not knowing whether:

    { [person], [weight], [] }
    { [], [value], "12st" }
    { [person], [weight], [] }
    { [], [value], "12st" }

means:
    [person] --weight--> [ ] --value--> "12st"
    [      ]
    [      ] --weight--> [ ] --value--> "12st"

or:
    [person] --weight--> [ ] --value--> "12st"

Similarly, possibly not knowing if:
    { [Harry], [friend], [] }
    { [], [value], "Tom" }
    { [Harry], [friend], [] }
    { [], [value], "Dick" }

means:
    [Harry] --friend--> [ ] --value--> "Tom"
    [     ]
    [     ] --friend--> [ ] --value--> "Dick"

or:
    [Harry] --friend--> [ ] --value--> "Tom"
                        [ ] --value--> "Dick"
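
The two readings can be made concrete with a small sketch (Python; the 
`read_triples` function, its policy names and the "anon:" labels are all 
hypothetical).  `None` stands for the `[]` in the triples above.  Under the 
"fresh" policy each `[]` in object position introduces a new node, and a 
`[]` in subject position refers to the most recent one; under "shared", 
every `[]` is the same node:

```python
def read_triples(triples, policy):
    """Expand [] placeholders (None) into node names under a given policy."""
    counter = 0
    current = None
    model = set()
    for s, p, o in triples:
        if s is None:
            if current is None:
                counter += 1
                current = f"anon:{counter}"
            s = current
        if o is None:
            if policy == "fresh" or current is None:
                counter += 1
                current = f"anon:{counter}"
            o = current
        model.add((s, p, o))
    return model


friends = [
    ("Harry", "friend", None),
    (None, "value", "Tom"),
    ("Harry", "friend", None),
    (None, "value", "Dick"),
]

two_friends = read_triples(friends, "fresh")    # first reading above
one_friend = read_triples(friends, "shared")    # second reading above
```

Nothing in the triples themselves says which policy is right, which is the 
"don't know" in question.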

#g

------------
Graham Klyne
(GK@ACM.ORG)

Received on Monday, 11 September 2000 14:50:46 UTC