Observations (named graphs, blank node closures) from Giovanni Tummarello on 2005-02-19 (public-rdf-dawg@w3.org from January to March 2005)

From: Giovanni Tummarello <giovanni@wup.it>
Date: Sat, 19 Feb 2005 17:38:29 +0100
To: public-rdf-dawg@w3.org
Message-ID: <42176B85.7040702@wup.it>
Hello all, I have just seen the announcement of the SPARQL working draft
and decided to take a look at what is likely to become one
of the most used pieces of the SW.

I have a few observations, hoping they can be of benefit:

a) Supporting something much better than named graphs.

No matter how starkly and arbitrarily HP likes to state in its papers that
"the semantic web is a collection of namable RDF graphs", the truth is
different: there is no consensus on this. I therefore can't understand
such wide support for this construct.

Another way of seeing it might just as well be that the semantic web is
made of the web of URIs and the statements that anyone makes about them.
In fact, RDF is defined to be monotonic, so in theory everything could
be merged.
Should everything be merged? Yes, as long as the "context" of the
statements is somehow preserved. Then, thanks to an application-defined
concept of context that answers application-specific requirements, one
could decide later what to consider.
Named graphs are *one* way to provide context, and clearly a non-standard
one. Others have proposed quadruples, others quintuples; why not
support them as well?

Context could be a fuzzy "certainty" value attached to a statement
(select just the statements that are more than 90% certain), or the date
the statement was first issued, or the name of the person who first made
the statement (e.g. "giovanni is an alien"), independently of which
graph the statement now belongs to.

It is obviously useless to list them all: they're contexts, they're
application-specific, they all might be useful and should be supported.
Now, should the SPARQL specification be flooded with specific syntactic
constructs to support them all? Obviously not.
But it might be a start to support the only official construct for
talking about triples, and that is reification, which is not even
mentioned in the current draft.

Wouldn't it be fairly simple to add an automatic binding to the
"statement" as a 4th node of each triple pattern, and to bind it to the
reification node(s)?

Example:

SELECT ?name ?mbox ?date WHERE
       (?g dc:publisher ?name ?triplecontext)
       (?g dc:date ?date )
       (?triplecontext  fuzzyont:certainty ?fuzzyval) and ?fuzzyval <0.8

with ?triplecontext binding to the reification node of that triple
(if any).
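
To make the proposed "4th node" binding concrete, here is a minimal Python
sketch of how an engine might resolve it against a plain set of triples.
It is only an illustration: the function names are hypothetical, the
graph is a toy in-memory set, and the URI chosen for fuzzyont:certainty
is an assumption, since the example above only gives the prefixed name.

```python
# Triples are plain (subject, predicate, object) tuples held in a set.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_SUBJECT = RDF + "subject"
RDF_PREDICATE = RDF + "predicate"
RDF_OBJECT = RDF + "object"
DC_PUBLISHER = "http://purl.org/dc/elements/1.1/publisher"
# Assumed URI: the email only gives the prefixed name fuzzyont:certainty.
CERTAINTY = "http://example.org/fuzzyont#certainty"


def reification_nodes(graph, triple):
    """Return the set of nodes that reify `triple` (empty if there are none)."""
    s, p, o = triple
    return {
        node
        for (node, pred, obj) in graph
        if pred == RDF_SUBJECT and obj == s
        and (node, RDF_PREDICATE, p) in graph
        and (node, RDF_OBJECT, o) in graph
    }


def publishers_below(graph, threshold):
    """Emulate (?g dc:publisher ?name ?ctx) (?ctx fuzzyont:certainty ?v), ?v < threshold."""
    rows = []
    for triple in graph:
        g, p, name = triple
        if p != DC_PUBLISHER:
            continue
        # The "4th node": every reification node of the matched triple.
        for ctx in reification_nodes(graph, triple):
            for (subj, pred, value) in graph:
                if subj == ctx and pred == CERTAINTY and value < threshold:
                    rows.append((g, name, value))
    return sorted(rows)


# Toy data: two publisher statements, each reified and given a certainty.
graph = {
    ("http://example.org/doc1", DC_PUBLISHER, "Alice"),
    ("_:r1", RDF_SUBJECT, "http://example.org/doc1"),
    ("_:r1", RDF_PREDICATE, DC_PUBLISHER),
    ("_:r1", RDF_OBJECT, "Alice"),
    ("_:r1", CERTAINTY, 0.5),
    ("http://example.org/doc2", DC_PUBLISHER, "Bob"),
    ("_:r2", RDF_SUBJECT, "http://example.org/doc2"),
    ("_:r2", RDF_PREDICATE, DC_PUBLISHER),
    ("_:r2", RDF_OBJECT, "Bob"),
    ("_:r2", CERTAINTY, 0.95),
}

print(publishers_below(graph, 0.8))
```

Only the statement about doc1 is returned, because its reification node
carries a certainty below the threshold. A triple with no reification
node simply produces no binding for the context variable.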
 
If you really like named graphs, then the GRAPH construct simply becomes:

SELECT ?name ?mbox ?date WHERE
       (?g dc:publisher ?name ?triplecontext)
       (?g dc:date ?date )
       (?triplecontext  namedGraphs:belongsto "http://example.com/mynamed.rdf")

Another way to express this syntactically could be with a
reificationnode(statement) function or a binding, say:

SELECT ?name ?mbox ?date WHERE
       ?A(?g dc:publisher ?name)
       (?g dc:date ?date )
       (?A  namedGraphs:belongsto "http://example.com/mynamed.rdf")

nice? :-)

or with a function reificationnode(s p o).

My impression is that this would cut a large number of pages from the
specification (all the constructs specifically devoted to named graphs)
AND allow the context models mentioned above.
(Side note: yes, so many triples, but it's just a constant factor, say K,
and it applies only when serializing (assuming a more efficient
serialization can't be devised, which is false); inside a DB the context
would obviously be encoded in an efficient way.)
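
As a rough illustration of the rewrite and of the side note on triple
count, the Python sketch below turns named-graph quads into plain triples
plus reification, then recovers the statements of one graph. The function
names and the URI for namedGraphs:belongsto are assumptions made for the
example; in this naive encoding, where every statement is reified on its
own, the factor K comes out as 6.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF + "type"
RDF_STATEMENT = RDF + "Statement"
RDF_SUBJECT = RDF + "subject"
RDF_PREDICATE = RDF + "predicate"
RDF_OBJECT = RDF + "object"
DC_PUBLISHER = "http://purl.org/dc/elements/1.1/publisher"
DC_DATE = "http://purl.org/dc/elements/1.1/date"
# Assumed URI: the email only gives the prefixed name namedGraphs:belongsto.
BELONGS_TO = "http://example.org/namedGraphs#belongsto"


def quads_to_reified(quads):
    """Rewrite (s, p, o, graph_name) quads as triples plus one reification each."""
    triples = set()
    counter = itertools.count(1)
    for s, p, o, graph_name in quads:
        node = f"_:stmt{next(counter)}"  # fresh blank node per statement
        triples |= {
            (s, p, o),
            (node, RDF_TYPE, RDF_STATEMENT),
            (node, RDF_SUBJECT, s),
            (node, RDF_PREDICATE, p),
            (node, RDF_OBJECT, o),
            (node, BELONGS_TO, graph_name),
        }
    return triples


def triples_in_graph(triples, graph_name):
    """Recover the statements whose reification node belongs to `graph_name`."""
    by_node = {}
    for s, p, o in triples:
        # Keeps one value per property, which is enough for this sketch.
        by_node.setdefault(s, {})[p] = o
    return {
        (d[RDF_SUBJECT], d[RDF_PREDICATE], d[RDF_OBJECT])
        for d in by_node.values()
        if d.get(BELONGS_TO) == graph_name
        and d.keys() >= {RDF_SUBJECT, RDF_PREDICATE, RDF_OBJECT}
    }


MY_GRAPH = "http://example.com/mynamed.rdf"
quads = [
    ("http://example.org/doc1", DC_PUBLISHER, "Alice", MY_GRAPH),
    ("http://example.org/doc1", DC_DATE, "2005-02-19", MY_GRAPH),
    ("http://example.org/doc2", DC_PUBLISHER, "Bob", "http://example.com/other.rdf"),
]
triples = quads_to_reified(quads)
print(len(quads), "quads became", len(triples), "triples")
print(sorted(triples_in_graph(triples, MY_GRAPH)))
```

The round trip shows that graph membership survives as ordinary triples;
the cost is the constant blow-up, which a store could avoid by encoding
the context internally.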

While probably a good start, supporting a useful context construct
probably requires more, which leads to the second point.

b) I see there is some support for something similar to the CBD (Concise
Bounded Description). This seems a very good idea; CBDs are bound to
become very useful. But please, I suggest supporting a very useful
subset of the CBD that we call MSG in [1], a Minimum Self-contained
Graph. Basically it is a blank node closure on a given starting
statement (not a node). A CBD is simply the union of all the MSGs
involving a starting URI (see also [3] for a complete discussion). MSGs
are important because of the decomposition properties they have (see the
paper for some theory) and because they represent the minimum
information contribution that can be passed from one peer to another in
a distributed system. In our case we use the MSG theory to support
context information without the need to reify each statement, and in
turn we use this context node (a reification of any arbitrary triple of
the MSG) to provide a digital signature INSIDE the RDF model, so that
the provenance of each statement can be tracked without the need for
named graphs.
Note that this has been said to be impossible in [2] "As discussed in 
[X], it is necessary to keep
the graph that has been signed distinct from the signature, and other 
metadata concerning the signing,
make about which information to trust.", where X is the Carroll 
serialization paper... which doesn't make that claim (that I know of).
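
The blank node closure described above can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not the
algorithm from [1] or [3]: blank nodes are modelled as strings starting
with "_:", "involving a URI" is read as "the URI is subject or object",
and the function names are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"


def is_blank(node):
    """Assumption of this sketch: blank nodes are strings that start with '_:'."""
    return isinstance(node, str) and node.startswith("_:")


def msg(graph, start):
    """Blank node closure of the statement `start` within `graph`."""
    closure = {start}
    pending = [start]
    visited_blanks = set()
    while pending:
        s, _, o = pending.pop()
        for node in (s, o):
            if not is_blank(node) or node in visited_blanks:
                continue
            visited_blanks.add(node)
            # Pull in every statement that mentions this blank node.
            for t in graph:
                if (t[0] == node or t[2] == node) and t not in closure:
                    closure.add(t)
                    pending.append(t)
    return frozenset(closure)


def cbd_as_union_of_msgs(graph, uri):
    """Union of the MSGs of all statements that involve `uri`."""
    result = set()
    for t in graph:
        if t[0] == uri or t[2] == uri:
            result |= msg(graph, t)
    return result


graph = {
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", "_:b1"),
    ("_:b1", FOAF + "name", "Bob"),
    ("_:b1", EX + "address", "_:b2"),
    ("_:b2", EX + "city", "Ancona"),
    (EX + "carol", FOAF + "name", "Carol"),
}

# A ground statement is its own MSG; one with blank nodes drags in the rest.
print(len(msg(graph, (EX + "alice", FOAF + "name", "Alice"))))
print(len(msg(graph, (EX + "alice", FOAF + "knows", "_:b1"))))

# Decomposition: the distinct MSGs split the graph into disjoint pieces.
pieces = {msg(graph, t) for t in graph}
print(len(pieces), sum(len(p) for p in pieces) == len(graph))
```

On this toy graph every statement falls into exactly one MSG, which is
the decomposition property the paragraph refers to.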

Anyway, MSG support could come as a way of selecting statements, which
would then require some operators to work with sets.
Checking a context in the model as highlighted by the paper would be
simple, given the ability to deal with statement sets (an IN operator?):
SELECT ?name ?mbox ?date WHERE
       ?A(?g dc:publisher ?name)
       (?g dc:date ?date )
       where (?x namedGraphs:belongsto "http://example.com/mynamed.rdf") IN msg(?A)
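
One possible reading of the IN test above, sketched in Python: compute
the blank node closure of the matched statement and ask whether any
statement in that set matches the given pattern. The helper names, the
use of None as a variable, and the URI for namedGraphs:belongsto are
assumptions of this sketch, not part of any specification.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
DC_PUBLISHER = "http://purl.org/dc/elements/1.1/publisher"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
# Assumed URI: the email only gives the prefixed name namedGraphs:belongsto.
BELONGS_TO = "http://example.org/namedGraphs#belongsto"
MY_GRAPH = "http://example.com/mynamed.rdf"


def is_blank(node):
    """Assumption of this sketch: blank nodes are strings that start with '_:'."""
    return isinstance(node, str) and node.startswith("_:")


def msg(graph, start):
    """Blank node closure of `start`, in compact form."""
    closure, frontier = {start}, [start]
    while frontier:
        s, _, o = frontier.pop()
        blanks = {n for n in (s, o) if is_blank(n)}
        for t in graph:
            if t not in closure and (t[0] in blanks or t[2] in blanks):
                closure.add(t)
                frontier.append(t)
    return closure


def pattern_in(pattern, statements):
    """True if some statement matches `pattern`; None plays the role of a variable."""
    return any(
        all(want is None or want == got for want, got in zip(pattern, stmt))
        for stmt in statements
    )


graph = {
    ("http://example.org/doc1", DC_CREATOR, "_:p"),
    ("_:p", FOAF_NAME, "Giovanni"),
    # One reification inside the MSG carries the context for all of it.
    ("_:r", RDF + "subject", "_:p"),
    ("_:r", RDF + "predicate", FOAF_NAME),
    ("_:r", RDF + "object", "Giovanni"),
    ("_:r", BELONGS_TO, MY_GRAPH),
    ("http://example.org/doc2", DC_PUBLISHER, "Bob"),
}

a = ("http://example.org/doc1", DC_CREATOR, "_:p")
b = ("http://example.org/doc2", DC_PUBLISHER, "Bob")
print(pattern_in((None, BELONGS_TO, MY_GRAPH), msg(graph, a)))
print(pattern_in((None, BELONGS_TO, MY_GRAPH), msg(graph, b)))
```

In this toy data the context node is reachable from the first statement
through blank nodes, so the test succeeds without reifying that statement
itself. The second statement is ground, so its closure is just itself
and the test fails; that is a property of this sketch and says nothing
about how the paper handles ground statements.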

To conclude, I get the impression it would be beneficial to clearly
define the support for named graphs in a SPARQL extension.
RDF has been given resource-centric APIs, statement-centric ones,
ontology-centric ones. It's all OK, it's all according to the consensus
and the recommendations.
But making what is basically a "file-centric" approach such a
fundamental part of the QL seems primitive, at least. Please, someone
convince me of the contrary :-)

Thanks for your attention. Please note that I am posting after reading a
few threads on the ML, but certainly not all; my apologies if I am
disregarding some major post, which I'd be happy to know about.
Giovanni

[1] http://giovanni.ea.unian.it/temp/WWW2005_signignRDF.pdf
[2] Jeremy Carroll, Christian Bizer, Patrick Hayes, Patrick Stickler: 
Named Graphs, Provenance and Trust 
<../../bizer/pub/Carroll_etall-WWW2005.pdf> at The Fourteenth 
International World Wide Web Conference (WWW2005), Chiba, Japan, May 2005.
[3] http://giovanni.ea.unian.it/temp/RDFGROWth_workshopISWC2004.pdf
Toward widely deployable Semantic Web P2P: tools, definitions and the 
RDFGrowth algorithm
Giovanni Tummarello, Christian Morbidoni, Joakim Petersson, Francesco 
Piazza, Mauro Mazzieri, Paolo Puliti
Received on Saturday, 19 February 2005 16:39:13 UTC