Re: Observations (named graphs, blank node closures) from Seaborne, Andy on 2005-02-21 (public-rdf-dawg@w3.org from January to March 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 21 Feb 2005 14:02:40 +0000
To: Giovanni Tummarello <giovanni@wup.it>
CC: public-rdf-dawg@w3.org
Message-ID: <4219EA00.3090905@hp.com>
Giovanni Tummarello wrote:
> Hello all, I have just seen the announcement for the working draft of 
> SPARQL and decided to take a look at what is naturally likely to be one 
> of the most used pieces of the SW.
> 
> I have a few observations, hoping they can benefit:
> 
> a) Supporting something much better than named graphs.
> 
> No matter how starkly and arbitrarily HP likes to state in its papers that 
> "the semantic web is a collection of namable RDF graphs", the truth is 
> different: there is no consensus on this. I therefore can't understand 
> such wide support for this construct.

The named graph paper you reference has 4 authors with 4 separate affiliations, 
2 commercial, 2 academic.

> 
> Another way of seeing it might as well be that the semantic web is made 
> by the web of URIs and statements that are made about them by anyone.  
> In fact, RDF is defined to be monotonic so that in theory all could 
> be merged.
> Should all be merged? Yes, as long as "context" is somehow preserved 
> about statements. So, thanks to an application-defined concept of 
> context that answers application-specific requirements, one could 
> later decide what to consider.
> Named graphs are *one* way to provide context, and clearly a non-standard 
> one. Others have proposed quadruples, others quintuples; why not 
> support them as well?

The RDF dataset approach is the one the WG decided on.  Several members of the 
WG have deployed systems that contributed ideas to the discussion, and the WG 
also looked at how other work outside the working group had addressed various 
aspects of this area.

The working draft does not discuss "trust" or "context" (those two words don't 
appear in the text).

We hope it will enable such solutions to be built.

> 
> Context could be a "certainty" fuzzy value attached to a statement 
> (select just the statements that are more than 90% certain), or the date 
> the statement was issued, or the name of the person who first made 
> the statement (e.g. giovanni is an alien), independently of which 
> graph the statement now belongs to.
> 
> It is obviously useless to list them all: they're contexts, they're 
> application specific, they all might be useful and should be supported.
> Now, should the SPARQL specification be flooded with specific syntactic 
> constructs to support them all? Obviously not.
> But it might be a start to support the only official construct to talk 
> about triples, and that is reification, which is not even mentioned in 
> the current draft.

There is nothing in the current draft about reification because it is naturally 
supported and needs no additional features in the query language.  If you have 
some concrete examples of where reification needs support, please could you 
provide them.  See also below.

I would also observe that reification is not universally seen as the way to 
handle this.  Jena has implemented specific helper support for reification and 
goes to some lengths to be correct RDF and have compact DB storage.  Yet the 
user feedback seems to indicate that, at systems scale, the unit of "context" is 
not the RDF statement - it becomes unwieldy.

> 
> Wouldn't it be fairly simple to add an automatic binding to the 
> "statement" as a 4th node for each triple, and to bind this to the 
> reification node/s?

In a later message you say that this is reification so I'm reading this as saying

(?g dc:publisher ?name ?triplecontext)

being:

( ?triplecontext  rdf:type      rdf:Statement)
( ?triplecontext  rdf:subject   ?g )
( ?triplecontext  rdf:predicate dc:publisher )
( ?triplecontext  rdf:object    ?name )

Is that correct?  In particular, there is no presumption that any dc:publisher 
triple is actually asserted.
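
To make that reading concrete, here is a rough sketch in Python. It is only an illustration: the names expand_quad and match, the toy matcher, and the example data are my own assumptions and are not part of the working draft. It expands the 4-slot pattern into the four reification patterns and shows a match succeeding even though the dc:publisher triple itself is not in the graph.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"


def expand_quad(s, p, o, ctx):
    """Expand a hypothetical (s p o ctx) pattern into four reification triple patterns."""
    return [
        (ctx, RDF + "type", RDF + "Statement"),
        (ctx, RDF + "subject", s),
        (ctx, RDF + "predicate", p),
        (ctx, RDF + "object", o),
    ]


def is_var(term):
    """Terms starting with '?' are treated as variables (an assumption of this sketch)."""
    return term.startswith("?")


def match(patterns, graph, binding=None):
    """Yield every variable binding that satisfies all triple patterns against the graph."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for pat, val in zip(first, triple):
            if is_var(pat):
                # Bind the variable, or check it agrees with an earlier binding.
                if candidate.setdefault(pat, val) != val:
                    ok = False
                    break
            elif pat != val:
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)


# Example data: only the reification quad is present, not the reified triple itself.
graph = {
    ("_:r1", RDF + "type", RDF + "Statement"),
    ("_:r1", RDF + "subject", "http://example.org/g1"),
    ("_:r1", RDF + "predicate", DC + "publisher"),
    ("_:r1", RDF + "object", "Alice"),
}

patterns = expand_quad("?g", DC + "publisher", "?name", "?triplecontext")
results = list(match(patterns, graph))
```

Here results holds a single binding with ?triplecontext bound to the reification node, while ("http://example.org/g1", dc:publisher, "Alice") is nowhere in the graph.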

> 
> Example 
> 
> SELECT ?name ?mbox ?date WHERE
>        (?g dc:publisher ?name ?triplecontext)
>        (?g dc:date ?date )
>        (?triplecontext  fuzzyont:certainty ?fuzzyval) and ?fuzzyval <0.8
> 
> with ?triplecontext binding to the reification node of the said triple 
> (if any). 
> If you really like named graphs then the GRAPH construct simply becomes:
> 
> SELECT ?name ?mbox ?date WHERE
>        (?g dc:publisher ?name ?triplecontext)
>        (?g dc:date ?date )
>        (?triplecontext  namedGraphs:belongsto 
> "http://example.com/mynamed.rdf")
> 
> Another way to express this syntactically could be with a 
> reificationnode(statement) function or a binding, say
> 
> SELECT ?name ?mbox ?date WHERE
>        ?A(?g dc:publisher ?name)
>        (?g dc:date ?date )
>        (?A  namedGraphs:belongsto "http://example.com/mynamed.rdf")
> 
> nice? :-)
> 
> or with a function reificationnode(s o p) ..

So, as I understand it, what you suggest is reification syntax support.

> 
> My impression is that this would cut a large number of pages in the 
> specification (all the constructs specifically devoted to named graphs) 
> AND allow the context models mentioned above.
> (side note: yes, so many triples, but it's just a factor, say K, and 
> this is just when serializing (assuming a more efficient serialization 
> can't be thought of, which is false); inside a DB the context would 
> obviously be coded in an efficient way)
> 
> While probably a good start, supporting a useful context construct 
> probably requires more, which leads to the second point.
> 
> b) I see there is some support for something similar to the CBD.

It provides a building block - it does not provide CBD or any similar scheme. As 
your discussion below shows there are several different schemes - and it seems 
to me that the choice is application dependent, not universal.

One approach to this is via the service description, where the service can state 
what algorithms it may apply to DESCRIBE results in which circumstances.
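
As an illustration of why the choice is application dependent, here is a small Python sketch of two possible description algorithms. Both are my own simplified assumptions, not anything defined in the working draft: describe_outgoing is a CBD-like closure that omits the reification part of the full CBD definition, and describe_symmetric also follows incoming arcs. The registry at the end stands in, hypothetically, for what a service description might advertise.

```python
def is_blank(term):
    """Blank nodes are written '_:label' in this sketch."""
    return term.startswith("_:")


def describe_outgoing(graph, start):
    """Triples with `start` as subject, recursing through blank-node objects."""
    result, todo, seen = set(), [start], set()
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if is_blank(o):
                    todo.append(o)
    return result


def describe_symmetric(graph, start):
    """Triples with `start` as subject or object, recursing through blank nodes."""
    result, todo, seen = set(), [start], set()
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in graph:
            if s == node or o == node:
                result.add((s, p, o))
                other = o if s == node else s
                if is_blank(other):
                    todo.append(other)
    return result


# Hypothetical registry: a service could state which of these it applies.
DESCRIBE_ALGORITHMS = {
    "outgoing": describe_outgoing,
    "symmetric": describe_symmetric,
}

graph = {
    ("ex:alice", "foaf:knows", "_:b1"),
    ("_:b1", "foaf:name", "Bob"),
    ("ex:doc", "dc:creator", "ex:alice"),
    ("ex:carol", "foaf:name", "Carol"),
}

outgoing = DESCRIBE_ALGORITHMS["outgoing"](graph, "ex:alice")
symmetric = DESCRIBE_ALGORITHMS["symmetric"](graph, "ex:alice")
```

The two calls return different descriptions of the same resource: only the second includes the dc:creator triple that points at ex:alice.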

>   This 
> seems a very good idea. CBDs are bound to become very useful. But please, 
> I suggest support for a very useful subset of the CBD that we call 
> MSG in [1], a Minimum Self Contained Graph. Basically it is a blank node 
> closure on a given starting statement (not a node). A CBD is simply the 
> union of all the MSGs involving a starting URI (see also [3] for a 
> complete discussion). MSGs are important because of the decomposition 
> properties they have (see the paper for some theory) and because they 
> represent the minimum information contribution that can be passed from 
> one peer to another in a distributed system.  In our case we use the MSG 
> theory to support context information without the need for reification 
> of each statement, and in turn we use this context node (a reification 
> of any arbitrary triple of the MSG) to provide a digital signature 
> INSIDE the RDF model so that the provenance of each statement can be 
> tracked without the need for named graphs.
> Note that this has been said to be impossible in [2]: "As discussed in 
> [X], it is necessary to keep
> the graph that has been signed distinct from the signature, and other 
> metadata concerning the signing,
> make about which information to trust.", where X is the Carroll 
> serialization paper, which doesn't make that claim (that I know of).
> 
> Anyway, MSG support could come as a way of selecting statements, which 
> would then require some operators to work with sets.
> Checking a context in the model as highlighted by the paper would be 
> simple, given the ability to deal with statement sets (an IN operator?)
> SELECT ?name ?mbox ?date WHERE
>        ?A(?g dc:publisher ?name)
>        (?g dc:date ?date )
>        where (?x namedGraphs:belongsto "http://example.com/mynamed.rdf") 
> IN msg(?A)
> 
> To conclude, I get the impression it would be beneficial to clearly 
> define the support for named graphs in a SPARQL extension.
> RDF has been given resource centric APIs, statement centric, ontology 
> centric. It's all OK, it's all according to the consensus and the 
> recommendations.
> But making what is basically a "file centric" approach such a 
> fundamental part of the QL seems primitive at least? Please someone 
> convince me of the contrary :-)
> 
> Thanks for the attention. Please note that I am posting after reading a 
> few threads in the ML, but certainly not all; my apologies if I am 
> disregarding some major post, which I'd be happy to know about.
> Giovanni
> 
> [1] http://giovanni.ea.unian.it/temp/WWW2005_signignRDF.pdf
> [2] Jeremy Carroll, Christian Bizer, Patrick Hayes, Patrick Stickler: 
> Named Graphs, Provenance and Trust 
> <../../bizer/pub/Carroll_etall-WWW2005.pdf> at The Fourteenth 
> International World Wide Web Conference (WWW2005), Chiba, Japan, May 2005.
> [3] http://giovanni.ea.unian.it/temp/RDFGROWth_workshopISWC2004.pdf
> Toward widely deployable Semantic Web P2P: tools, definitions and the 
> RDFGrowth algorithm
> Giovanni Tummarello, Christian Morbidoni, Joakim Petersson, Francesco 
> Piazza, Mauro Mazzieri, Paolo Puliti
> 
> 

	Andy
Received on Monday, 21 February 2005 14:05:13 UTC