RE: statements about a graph (Named Graphs, reification)

[sorry, this has again become a very long mail]

Hi, Richard and Bijan!

>-----Original Message-----
>From: semantic-web-request@w3.org 
>[mailto:semantic-web-request@w3.org] On Behalf Of Bijan Parsia
>Sent: Tuesday, September 04, 2007 6:51 PM
>To: Richard Cyganiak
>Cc: Michael Schneider; K-fe bom; semantic-web@w3.org
>Subject: Re: statements about a graph (Named Graphs, reification)
>
>
>On 4 Sep 2007, at 17:30, Richard Cyganiak wrote:
>
>> Michael,
>>
>> On 4 Sep 2007, at 15:29, Michael Schneider wrote:
>>> Ok, then let's discuss more practical issues (leaving this subtle RDF
>>> semantics stuff to the academic world). Until now, we had the only
>>> usecase that someone wanted to annotate a complete RDF document,
>
>Sorry to be jumping in, but do you mean "in this thread"? 

Yes. I tried to be at least a little on-topic. ;-)

>Because other use cases are prevalent.
>
>>> which already exist somewhere having an URI. This is certainly the
>>> easiest case to handle in practice.
>>
>> Yes. I think it's also by far the most common case.
>
>I think almost certainly not. Consider EARL:
>	http://www.w3.org/TR/EARL10-Schema/
>
>Or annotation axioms in OWL 1.1.
>
>Or Swoop Change Sets (which do chunk out, so they are a little  
>different).
>
>>> But there will probably often be the more demanding situation, where
>>> I want to make assertions about some ad hoc set of RDF triples, which
>>> is not yet published as a special RDF document anywhere.
>>
>> To be honest, I'm not sure that this case occurs *that* much in practice.
>
>Quite often (or will). I want to record when an axiom in my owl  
>ontology has been last modified. Do I have extract that axiom and  
>publish it in a separate document?

I have been pondering a specific scenario for quite a while now, which I
have not yet seen discussed elsewhere, and I would like to know what you
think about it. I will try to present this scenario in the form of a little
story, because this will make things easier to understand.

Assume there is Alice, who owns a homepage, which is enriched with some
additional RDF. One of the statements within her homepage is

    me:alice foaf:knows he:bob .

by which Alice tries to tell the world that she knows some other person Bob.

Now there is Charly, who is an old friend of both Alice and Bob. He knows
that Alice has known Bob since 1998. Charly also owns an RDF'ed homepage, and
so he would like to make this knowledge explicit by stating something like

    "Alice knows Bob" dc:date 1998 .

Charly does not have access to Alice's homepage, so he cannot simply put this
statement into Alice's triple store, or even turn Alice's foaf:knows triple
into some n-tuple. But even if he could, he would not want to do this: it is
actually he who asserts this statement, so this information should really go
into his own triple store. What he wants to ensure in any case is that this
statement is "visible" on the Semantic Web. This means that if anyone (or any
Semantic Web crawler) stumbles over this statement, they should be able to
understand, with pretty high confidence, that this is really a statement
which annotates Alice's foaf:knows statement, rather than just being some
arbitrary RDF triple.

Last, there is Dave. Dave has recently found Alice's homepage with her
"foaf:knows" statement within. Dave does not know Alice personally, but he
is very interested in social relationships between arbitrary people. And
more, he is interested in what others have to say about such social
relationships. :) So he wonders if there are any additional statements about
Alice's foaf:knows statement anywhere on the Semantic Web. Dave has already
installed a copy of the Semantic Web Client Library [1], so he has at least
a good chance of getting access to larger portions of the Semantic Web
(let's suppose for a moment that we are already a few years in the future,
when there is satisfactory linking between existing data).
Now, what SPARQL query should he execute? He wants to find as many
assertions about Alice's foaf:knows statement as possible, but he also
wants to avoid too many false positives, of course.

So, this example demonstrates the scenario. On the one hand, there are
parties (the Alices) who create information on the SemWeb, encoded in
triple form. There are other parties (the Charlies) who want to create
annotations for these triples in separate stores, and who are interested in
having their stored annotations encoded in a searchable way. And then there
are yet other parties (the Daves) who would like to search for such triple
annotations.

Now, the above example is a little oversimplified, I admit. But it is not
hard for me to imagine professional mashup services ("Charly 2.0" :)), which
crawl the whole Semantic Web for triple data of a specific kind (e.g. social
relationships), and then enrich the data they find with additional
annotations. This will provide quite new views on the original data. For
these mashup services it will be of utmost importance that their triple
annotations are effectively searchable. And then, there will also be general
Semantic Web search services (the professional Daves). The value of these
search services for their users will increase considerably if they also take
the triple annotations of the various mashup services into account.

So, there are two questions here, which turn out to be closely related:

  * How should triple annotations be encoded on the public Semantic Web, so
that they can easily be detected, and identified as really being triple
annotations?

  * What should queries for triple annotations look like in the Semantic Web?

First, it is clear that if Charly uses some special custom method to encode
his triple annotations, there is no realistic prospect that his data will be
found. "Custom reification" methods can be completely reasonable for use
within specific applications, or for closed user groups. But for a searcher
like Dave, who wants to broadly query the whole Semantic Web for data created
by possibly lots of different, unknown, and unrelated parties, this is
certainly not an option. And even if Dave really does include specialized
encoding schemes in his query, these will only be the published schemes of
very important parties. So there is no hope then for Charly (and many other
normal users or "small players" in the Semantic Web) of having their data
found.

So what will happen in such a situation? If no standard encoding scheme
already exists, a few encoding schemes will probably emerge, rapidly
introduced by some first-to-market organisations (simply because these orgs
need such a scheme as fast as possible), and everyone else will then use
these few schemes. And after some years of usage, the W3C would step in and
make a standard based on those encoding schemes which have survived until
then.

But in the case of RDF, I think that people will rather adopt RDF
reification, for several reasons:

  * It's already there, ready for use, and it's part of the official RDF
standard.

  * It is just more triple data, so it can simply be put into the existing
triple stores. And every RDF aware software out there will be able to handle
this kind of data out of the box.

  * It seems reasonably easy to understand and use for non-expert people (I
have experienced this, when I tried to explain RDF reification to a complete
RDF novice).

  * There is existing tool support (like in TopBraid Composer [2]).

  * At least in the beginning, Charly will probably think: "Well, whoever
searches for triple annotations will certainly at least come up with the
idea of searching for rdf:Statements. I don't have any clue what else he
will search for, so I use RDF reification for my encoding. This will be the
safest path, if any." I would call this line of argument a "maximum
likelihood estimation". :)

  * And Dave will think: "Well, at least I should search for rdf:Statements,
because this is the first thing people will think of when they encode
their triple annotations." Again some maximum likelihood estimation.

And a corresponding SPARQL query is pretty simple:

      construct { $stmt $p $o }
      where { $stmt a rdf:Statement; rdf:subject me:alice;
                    rdf:predicate foaf:knows; rdf:object he:bob .
              $stmt $p $o . }

Well, not nice, but it works for Dave, and that is the important point.
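
To make concrete what such a query has to match, here is a minimal sketch in
plain Python (standard library only, not a real RDF store). Triples are
modeled as tuples of prefixed-name strings. The identifiers ch:stmt1,
ch:stmt2 and ex:eve are hypothetical and exist only for illustration, and
the choice to filter out the four reification triples themselves is an
assumption of this sketch, not something the query above does.

```python
# Sketch only: triples are (subject, predicate, object) tuples of strings.
# ch:stmt1, ch:stmt2 and ex:eve are hypothetical names for illustration.
CHARLY_STORE = [
    # Reification of "me:alice foaf:knows he:bob", plus Charly's annotation.
    ("ch:stmt1", "rdf:type", "rdf:Statement"),
    ("ch:stmt1", "rdf:subject", "me:alice"),
    ("ch:stmt1", "rdf:predicate", "foaf:knows"),
    ("ch:stmt1", "rdf:object", "he:bob"),
    ("ch:stmt1", "dc:date", "1998"),
    # Reification of a different triple; it must not match the lookup below.
    ("ch:stmt2", "rdf:type", "rdf:Statement"),
    ("ch:stmt2", "rdf:subject", "me:alice"),
    ("ch:stmt2", "rdf:predicate", "foaf:knows"),
    ("ch:stmt2", "rdf:object", "ex:eve"),
    ("ch:stmt2", "dc:date", "2005"),
]

# Properties that describe the reified triple itself rather than annotate it.
REIFICATION_PROPS = {"rdf:type", "rdf:subject", "rdf:predicate", "rdf:object"}


def find_annotations(store, s, p, o):
    """Return annotation triples attached to reifications of (s, p, o)."""
    triples = set(store)
    # Nodes typed rdf:Statement whose subject/predicate/object match.
    nodes = {
        subj
        for (subj, pred, obj) in triples
        if pred == "rdf:type"
        and obj == "rdf:Statement"
        and (subj, "rdf:subject", s) in triples
        and (subj, "rdf:predicate", p) in triples
        and (subj, "rdf:object", o) in triples
    }
    # Keep only the triples that say something *about* the statement node.
    return sorted(
        t for t in triples if t[0] in nodes and t[1] not in REIFICATION_PROPS
    )


annotations = find_annotations(CHARLY_STORE, "me:alice", "foaf:knows", "he:bob")
print(annotations)
```

The lookup only finds data that uses the rdf:Statement vocabulary; anything
encoded with a custom scheme stays invisible to it, which is exactly the
point of the argument above.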

And anticipating one of the most likely objections to my argument: I don't
believe that any of the "ordinary Semantic Web users" out there who are
actually interested in putting triple annotations into the SemWeb, or in
searching for them, will really be interested in debates about the
"non-existing" or "broken" semantics of RDF reification. I personally like
such debates, but in the end this is just ivory tower bosh. So I won't
bother these people with questions like: "Hey, don't you know that talking
about the insertion date of a triple into an RDF store is something
semantically completely different from talking about the date since which
Alice has known Bob?" These people do not need the academic world to give
them lessons in philosophy. :) What they really need from the academic
community is a pragmatic tool that serves their needs, so they can start
doing their most important job: filling the SemWeb with content! And RDF
reification actually provides such a tool, if it is regarded simply as a
common vocabulary that makes it technically possible to associate a URI with
some RDF triple. (Sorry, this paragraph has become a little flamy, but I
really couldn't resist. ;-))

The third candidate is NamedGraphs. But in order to estimate whether this
approach can be used for the above scenario, I need to know more about it.
This was the reason why I asked in my last mail "How do named graph data get
published into the Semantic Web?". If it is possible (with reasonable effort)
to search, for instance, for the URIs of all NamedGraphs of the form

     :g { me:alice foaf:knows he:bob }

then NamedGraphs work as well as Reification for this purpose, because I can
then, in a second step, query for all those triples in the SemWeb which have
the URI of a found NamedGraph as their subject. And NamedGraphs would bring
the big advantage that they can talk about more than a single triple (though
I have difficulty seeing how this helps me in my use case above. Perhaps
other people will be able to find an example where searching for annotated
"multi-triples" would really make sense).
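
The two-step lookup described above can be sketched as follows, again in
plain Python and under stated assumptions: named graphs are modeled as a
dictionary from graph name to a set of triples, and all graph names
(ch:g1 and so on) and annotation triples are hypothetical. The `exact` flag
is an addition of this sketch. It distinguishes graphs that consist of just
the one triple from graphs that merely contain it.

```python
# Sketch only: named graphs as a dict from graph name to a set of triples.
# All graph names and annotation triples below are hypothetical.
NAMED_GRAPHS = {
    "ch:g1": {("me:alice", "foaf:knows", "he:bob")},
    "ch:g2": {
        ("me:alice", "foaf:knows", "he:bob"),
        ("he:bob", "foaf:knows", "me:alice"),
    },
    "ch:g3": {("me:alice", "foaf:knows", "ex:eve")},
}

# Triples that use a graph name as their subject, i.e. statements about a graph.
GRAPH_ANNOTATIONS = [
    ("ch:g1", "dc:date", "1998"),
    ("ch:g2", "dc:creator", "ch:charly"),
    ("ch:g3", "dc:date", "2005"),
]


def graphs_for(named_graphs, triple, exact=True):
    """Step 1: names of graphs that consist of (or just contain) the triple."""
    if exact:
        return sorted(n for n, ts in named_graphs.items() if ts == {triple})
    return sorted(n for n, ts in named_graphs.items() if triple in ts)


def annotations_for(annotations, graph_names):
    """Step 2: triples whose subject is one of the graph names from step 1."""
    names = set(graph_names)
    return [t for t in annotations if t[0] in names]


target = ("me:alice", "foaf:knows", "he:bob")
exact_hits = annotations_for(GRAPH_ANNOTATIONS, graphs_for(NAMED_GRAPHS, target))
loose_hits = annotations_for(
    GRAPH_ANNOTATIONS, graphs_for(NAMED_GRAPHS, target, exact=False)
)
print(exact_hits)
print(loose_hits)
```

With the loose match, an annotation on ch:g2 is returned as well, although it
describes a two-triple graph as a whole. This is one way the false positives
Dave wants to avoid could creep in.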

But we must not conceal that NamedGraphs have a serious disadvantage in
comparison with Reification anyway: NamedGraphs are not a standard. And if
this approach does not get into RDF, or at least into common use, very soon,
it may lose its chance of becoming a player, at least in the above
scenario.

/This/ will of course only be a topic /if/ the above scenario is relevant at
all. My whole argument for RDF reification depends on the assumption that
the above scenario is a really relevant use case (of course with mashup and
search services instead of Charlies and Daves :)). If this is not the case,
then I won't argue for RDF reification any longer, because I then see no
real use for it anymore. (At least until another scenario comes to my
mind ;-)).

So what do you think?


Cheers,
Michael

[1] http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/semwebclient/
[2] http://www.topbraidcomposer.com/

--
Dipl.-Inform. Michael Schneider
FZI Forschungszentrum Informatik Karlsruhe
Abtl. Information Process Engineering (IPE)
Tel  : +49-721-9654-726
Fax  : +49-721-9654-727
Email: Michael.Schneider@fzi.de
Web  : http://www.fzi.de/ipe/eng/mitarbeiter.php?id=555

Received on Wednesday, 5 September 2007 09:47:18 UTC