Re: In RDF what is the best practice to represent data provenance (source)?

Richard Cyganiak wrote on Sun, 21 Jan 2007:

> Michael,
> 
> You say there is a distinction between "atomic resources" in a  
> domain, and relationships between them. Such a distinction is  
> artificial. The "atomic resources" in reality are quite literally not  
> atomic, and if you squint the right way, any relationship can be seen  
> as a resource of its own. The distinction is just an artifact of your  
> modelling.

I am not sure whether I correctly understand what you mean by
"artificial" here, so please correct me if I miss the point.

I have used the term "resource" here in the same sense as the RDF
Semantics spec, which calls everything that "exists" within the domain
under consideration a "resource": things, sets, relations,
relationships, etc. So, in this regard, any relationship can of course
be seen as a resource, simply by definition.

This does not mean, however, that there is no clear categorical
distinction between things, relationships, classes, etc. Such a
distinction is an inherent property of the respective domain by
which I interpret my RDF graph. Say I have the following RDF graph

   G := { nat:three rdf:type nat:PrimeNumbers . }

and use the natural numbers as the interpreting domain. If 'nat:three'
denotes the natural number "3", and 'nat:PrimeNumbers' denotes the set
of prime numbers, I have a meaningful interpretation of G (based on the
semantics of the URI 'rdf:type', which is defined by the RDF spec). But
if 'nat:three' denotes the set of primes, and 'nat:PrimeNumbers' denotes
"3", then the interpretation of this graph becomes /meaningless/!
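
The contrast between the two interpretations can be sketched in a few
lines of Python. This is only a toy model under stated assumptions: the
finite set PRIMES stands in for the set of all primes, and the helper
type_triple_holds is a hypothetical name that simplifies the 'rdf:type'
semantics to plain set membership.

```python
# Toy model of interpreting G = { nat:three rdf:type nat:PrimeNumbers . }
# over the natural numbers. All Python names here are illustrative assumptions.

PRIMES = frozenset({2, 3, 5, 7, 11, 13})  # finite stand-in for the set of primes

def type_triple_holds(interpretation, subject, obj):
    """Check 's rdf:type o': the object must denote a set that contains
    whatever the subject denotes."""
    member = interpretation[subject]
    cls = interpretation[obj]
    if not isinstance(cls, frozenset):
        return False  # the object does not denote a set-like resource
    return member in cls

# Intended interpretation: a number on the left, a set on the right.
intended = {"nat:three": 3, "nat:PrimeNumbers": PRIMES}
# Swapped interpretation: the roles of number and set are exchanged.
swapped = {"nat:three": PRIMES, "nat:PrimeNumbers": 3}

print(type_triple_holds(intended, "nat:three", "nat:PrimeNumbers"))  # True
print(type_triple_holds(swapped, "nat:three", "nat:PrimeNumbers"))   # False
```

The check fails for the swapped case because of what the domain
elements are, not because of anything the modeller added.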

This is so because the domain of natural numbers itself distinguishes
between thing-like resources and set-like resources. It is /not me/
who introduces such a categorical distinction when making assertions
about some domain. The best I can do is to capture and exploit those
distinctions with the means that my language (RDF here) provides. The
more means my language provides (the more expressive it is), the more
accurately I am able to model aspects of the given domain.

> So, I agree with ChrisR: If you feel the need to make statements  
> about relationships, then maybe the modelling is not adequate to your  
> use case, and the relationship ought to be turned into a resource of  
> its own. 

The relationship already /is/ a resource of its own. But I want my
language, RDF, to provide me with some means to talk about a
relationship not just as a /general/ resource, but more specifically as
a /relationship-like/ resource, so that I can refer to its further
structural aspects (its subject, predicate and object) whenever I want.

The proposal of ChrisR is probably the best one available in
the current situation: If I know that I need to make assertions about
some relationships, I create a dedicated class R, which is meant to
contain all those relationships that I am interested in. This works
as long as I remember that the semantics of such a construct is up to
me, or up to my application, which processes RDF containing such
constructs.
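
A minimal sketch of such a convention, with triples modelled as plain
Python tuples, might look as follows. The 'ex:' vocabulary (ex:R,
ex:subject, ex:predicate, ex:object, ex:source) and both function names
are hypothetical and exist only for this illustration; nothing in RDF
gives them any meaning.

```python
# Sketch of the "dedicated class R" convention, with triples as plain tuples.
# The ex: vocabulary is hypothetical; only rdf:type is real RDF vocabulary.

def describe_relationship(node, s, p, o):
    """Return triples that present the relationship (s, p, o) as resource `node`."""
    return {
        (node, "rdf:type", "ex:R"),
        (node, "ex:subject", s),
        (node, "ex:predicate", p),
        (node, "ex:object", o),
    }

def relationships_in_R(graph):
    """Recover (node, s, p, o) for every member of ex:R by applying the convention."""
    found = []
    for node, p, o in sorted(graph):
        if p == "rdf:type" and o == "ex:R":
            # Collect the properties of this relationship node.
            parts = {pred: obj for subj, pred, obj in graph if subj == node}
            found.append((node, parts["ex:subject"],
                          parts["ex:predicate"], parts["ex:object"]))
    return found

graph = describe_relationship("ex:rel1", "nat:three", "rdf:type", "nat:PrimeNumbers")
# An assertion about the relationship itself, e.g. where it came from.
graph.add(("ex:rel1", "ex:source", "ex:someDocument"))

print(relationships_in_R(graph))
```

Only relationships_in_R knows how to read the four marker triples. A
generic RDF tool sees ordinary triples about 'ex:rel1' and nothing more.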

Then it all comes down to the question of whether we always want to
create our specialized treatment of relationships on a case-by-case
basis, or whether we want to have a general, reusable way to talk about
such relationships (perhaps supported by reasoners and tools). I would
opt for the latter.

To illustrate the problem, let's swap the roles and assume for a moment
that the following strange situation holds: There is no explicit support
for /classes/ within RDF: no special vocabulary like 'rdf:type', no
special syntactic constructs, and, most importantly, no special
semantics. Let's forget about RDFS and OWL for now; we just consider
basic RDF here. Lacking all this would have the following immediate
consequences:

   * no RDF containers (rdf:Bag, etc.)
   * no RDF reification

What would remain is the ability to create all kinds of triple sets,
where the subject of each triple would always denote some resource,
the predicate would denote some relation or attribute, and the object
would denote some resource or data value. So, not much lost, from a pure
RDF point of view!

Now, one day, I, Michael, read a post by you, Richard, where you
complain that you do not just want to describe resources by their
attributes and by their relationships to other resources, but would
also like to make assertions about which classes a given resource
belongs to. "For example", you ask, "how can I express that the natural
number 3 is an instance of the set of all prime numbers?"

I think a little about this point, and then I answer: "Well, because
numbers and sets of numbers are both homogeneously regarded as resources
in RDF, you can just put the following triple in your RDF graph:

   nat:three :instanceOfNaturalNumberSet :PrimeNumbers .

There will be no problem finding a meaningful interpretation for this
statement (one which also makes it true). You, and everyone else who is
going to use this RDF graph, just have to always remember how the
property ':instanceOfNaturalNumberSet' is meant to be interpreted: that
its subject always has to denote some natural number, and that its
object always has to denote some set of numbers."

Now, how would you feel about such a reply? Is there a formal error in
my argument? I do not see any. Nevertheless, this proposal would
probably sound pretty peculiar, I suppose. Your answer would certainly
be: why does RDF not just directly support this categorical distinction
between classes and instances, so that everyone can reuse it, instead of
creating custom properties on a case-by-case basis? Your reasoning
would be:

   1) this is a very fundamental distinction

   2) there are tons of use cases where this distinction is needed

But I would, of course, answer that this would make RDF more complex,
would probably lead to unforeseeable problems, and so on... ;-)

A similar discussion could also be had about

   * n-ary relationships (currently no direct support,
     so we need a convention like the one given in the n-ary BP note)

   * Named Graphs

For representing the content of an RDF graph G, for example, why not
just take some dedicated class C_G (now we have classes back again in
RDF!), and interpret all its instances as the triples contained in G?
For this, we would of course have to assume in each case that triples
and graphs are part of the respective interpreting domain. The "Named
Graph" proposal just makes this assumption explicit by providing an
extension of the RDF semantics: The syntactic elements are now always
part of the interpreting domain (the name of a named graph just denotes
that named graph)! One could argue that this would not be such a big
win... But you would probably oppose such an argument, and so would I.
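
The idea that a graph name simply denotes its graph can be sketched as
a mapping from names to triple sets. This is a toy model, not the Named
Graphs semantics itself: the names 'ex:G', 'ex:source' and
'ex:someDocument' and the function sources_of are assumptions made up
for the illustration.

```python
# Toy model of named graphs: each name maps directly to the set of triples it names.
named_graphs = {
    "ex:G": frozenset({("nat:three", "rdf:type", "nat:PrimeNumbers")}),
}

# Because the name denotes the graph itself, statements about the graph
# can use that name as an ordinary subject.
metadata = {("ex:G", "ex:source", "ex:someDocument")}

def sources_of(triple):
    """Return the sources recorded for every named graph that contains `triple`."""
    result = set()
    for name, triples in named_graphs.items():
        if triple in triples:
            result |= {o for s, p, o in metadata if s == name and p == "ex:source"}
    return result

print(sources_of(("nat:three", "rdf:type", "nat:PrimeNumbers")))
```

Here provenance attaches to the graph as a whole, so no per-application
convention is needed to find out which triples a source vouches for.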

> Some related advice is found in [1].
> 
> [1] http://www.w3.org/TR/2004/WD-swbp-n-aryRelations-20040721/

(Current version is: http://www.w3.org/TR/swbp-n-aryRelations/)

You probably mean this "General Issue":

   Issue 1: If property instances can link only two individuals,
   how do we deal with cases where we need to describe the
   instances of relations, such as its certainty, strength, etc?

I will not go any deeper into this, because my point of view is
hopefully clear now: I would prefer to see direct support in RDF for
making assertions about relationships, rather than using conventions
about how to interpret specially formed RDF expressions.

So I would like to ask the community the following questions:

    * Do you think that we need direct support in RDF for
      making assertions about relationships, in the form of
      extra vocabulary, language constructs and semantics?
      Or do you think that for the Semantic Web it will suffice
      to define specifically interpreted relationship classes
      on a case-by-case basis?

    * Are there any serious general problems with the whole
      idea of direct language support for referencing relationships?
      (As a comparison: I do not easily see how one could directly
      support n-ary relationships within the RDF framework
      in any way.)

    * Do you see any serious problems in re-interpreting /reification/
      for this purpose?


Best regards,
Michael

Received on Monday, 22 January 2007 14:10:48 UTC