- From: Giovanni Tummarello <giovanni@wup.it>
- Date: Sat, 19 Feb 2005 17:38:29 +0100
- To: public-rdf-dawg@w3.org
Hello all, I have just seen the announcement of the SPARQL working draft
and decided to take a look at what is likely to become one of the most
used pieces of the SW.
I have a few observations, which I hope will be of use:
a) Supporting something much better than named graphs.
No matter how starkly and arbitrarily HP likes to state in its papers
that "the semantic web is a collection of namable RDF graphs", the truth
is different: there is no consensus on this. I therefore can't
understand such wide support for this construct.
Another way of seeing it is that the semantic web is made up of the web
of URIs and of the statements that anyone makes about them.
In fact, RDF is defined to be monotonic, so that in theory everything
could be merged.
Should everything be merged? Yes, as long as the "context" of each
statement is somehow preserved. Then, thanks to an application-defined
concept of context that answers application-specific requirements, one
could decide later which statements to consider.
Named graphs are *one* way to provide context, and clearly a
non-standard one. Others have proposed quadruples, others quintuples;
why not support them as well?
Context could be a fuzzy "certainty" value attached to a statement
(select just the statements that are more than 90% certain), or the date
the statement was first issued, or the name of the person who first made
the statement (e.g. "giovanni is an alien"), independently of which
graph the statement now belongs to.
It is obviously useless to list them all: they are contexts, they are
application specific, they all might be useful and should be supported.
Now, should the SPARQL specification be flooded with specific syntactic
constructs to support them all? Obviously not.
But it might be a start to support the only official construct for
talking about triples, namely reification, which is not even mentioned
in the current draft.
Wouldn't it be fairly simple to add an automatic binding to the
"statement" as a 4th node for each triple pattern, and to bind it to the
reification node(s) of the matched triple?
Example
SELECT ?name ?mbox ?date WHERE
(?g dc:publisher ?name ?triplecontext)
(?g dc:date ?date )
(?triplecontext fuzzyont:certainty ?fuzzyval) and ?fuzzyval <0.8
with ?triplecontext binding to the reification node of the said triple
(if any).
If you really like named graphs then .. the GRAPH construct simply becomes:
SELECT ?name ?mbox ?date WHERE
(?g dc:publisher ?name ?triplecontext)
(?g dc:date ?date )
(?triplecontext namedGraphs:belongsto
"http://example.com/mynamed.rdf")
Another way to express this syntactically could be with a
reificationnode(statement) function or a binding, say
SELECT ?name ?mbox ?date WHERE
?A(?g dc:publisher ?name)
(?g dc:date ?date )
(?A namedGraphs:belongsto "http://example.com/mynamed.rdf")
nice? :-)
or with a function reificationnode(s o p) ..
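To make the idea concrete, here is a minimal Python sketch of such a
binding over an in-memory list of triples. It is only an illustration
under stated assumptions: the data, the helper names (reified,
reification_nodes, match_with_context, publishers_where) and the use of
prefixed names as plain strings are all made up for this example and are
not part of any SPARQL draft. Statements without a reification node are
simply skipped, since both queries constrain the context.

```python
# Toy data: prefixed names are plain strings, blank nodes start with "_:".
GRAPH_URI = "http://example.com/mynamed.rdf"


def reified(node, s, p, o):
    """The four standard reification triples describing (s, p, o)."""
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]


triples = (
    [
        ("ex:doc1", "dc:publisher", "Alice"),
        ("ex:doc1", "dc:date", "2005-02-19"),
        ("ex:doc2", "dc:publisher", "Bob"),
        ("ex:doc2", "dc:date", "2005-01-01"),
    ]
    + reified("_:r1", "ex:doc1", "dc:publisher", "Alice")
    + [("_:r1", "fuzzyont:certainty", 0.6),
       ("_:r1", "namedGraphs:belongsto", GRAPH_URI)]
    + reified("_:r2", "ex:doc2", "dc:publisher", "Bob")
    + [("_:r2", "fuzzyont:certainty", 0.95)]
)


def reification_nodes(triples, s, p, o):
    """Nodes that reify the statement (s, p, o); empty set if there are none."""
    def subjects(pred, obj):
        return {t[0] for t in triples if t[1] == pred and t[2] == obj}
    return (subjects("rdf:type", "rdf:Statement")
            & subjects("rdf:subject", s)
            & subjects("rdf:predicate", p)
            & subjects("rdf:object", o))


def match_with_context(triples, predicate):
    """Yield (subject, object, context) with context as the '4th node'."""
    for s, p, o in triples:
        if p == predicate:
            for ctx in reification_nodes(triples, s, p, o):
                yield s, o, ctx


def publishers_where(triples, ctx_pred, test):
    """dc:publisher statements whose context has a ctx_pred value passing test."""
    results = []
    for g, name, ctx in match_with_context(triples, "dc:publisher"):
        values = [o for (s, p, o) in triples if s == ctx and p == ctx_pred]
        if any(test(v) for v in values):
            results.append((g, name))
    return results


# First query: certainty below 0.8.
low = publishers_where(triples, "fuzzyont:certainty", lambda v: v < 0.8)
# Second query: graph membership is just another context property.
in_graph = publishers_where(triples, "namedGraphs:belongsto",
                            lambda v: v == GRAPH_URI)
print(low, in_graph)
```

Both filters go through the same code path; nothing in it is specific to
named graphs.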
My impression is that this would cut a large number of pages from the
specification (all the constructs specifically devoted to named graphs)
AND allow the context models mentioned above.
(side note: yes, that makes for many triples, but it is just a constant
factor, say K, and only when serializing (assuming a more efficient
serialization can't be devised, which is false); inside a DB the context
would obviously be encoded in an efficient way)
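As a rough worked example of the factor K: standard reification
describes a statement with four triples (rdf:type rdf:Statement,
rdf:subject, rdf:predicate, rdf:object), so a statement serialized
together with its reification and c context triples costs 1 + 4 + c
triples. The sketch below, with made-up data and a hypothetical reify
helper, counts this for c = 1 and contrasts it with a store that keeps
the context as a fourth column.

```python
def reify(statement, node, context=()):
    """Reification triples for `statement`, plus (property, value) context pairs."""
    s, p, o = statement
    triples = [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]
    triples += [(node, prop, value) for prop, value in context]
    return triples


# 1000 made-up statements, each carrying one context value (c = 1).
statements = [("ex:doc%d" % i, "dc:publisher", "name%d" % i)
              for i in range(1000)]

serialized = []
for i, st in enumerate(statements):
    serialized.append(st)
    serialized.extend(reify(st, "_:r%d" % i, [("fuzzyont:certainty", 0.9)]))

# K = 1 (the statement) + 4 (reification) + 1 (context) per statement.
k = len(serialized) / len(statements)

# Inside a store, the same information fits in one row per statement.
rows = [(s, p, o, 0.9) for (s, p, o) in statements]

print(k, len(rows))
```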
While probably a good start, supporting a useful context construct
probably requires more, which leads to the second point.
b) I see there is some support for something similar to the CBD. This
seems a very good idea. CBDs are bound to become very useful. But I
would suggest also supporting a very useful subset of the CBD that we
call an MSG in [1], a Minimum Self Contained Graph. Basically it is a
blank node closure on a given starting statement (not a node). A CBD is
simply the union of all the MSGs involving a starting URI (see also [3]
for a complete discussion). MSGs are important because of the
decomposition properties they have (see the paper for some theory) and
because they represent the minimum information contribution that can be
passed from one peer to another in a distributed system. In our case we
use the MSG theory to support context information without the need to
reify each statement, and in turn we use this context node (a
reification of any arbitrary triple of the MSG) to provide a digital
signature INSIDE the RDF model, so that the provenance of each statement
can be tracked without the need for named graphs.
Note that this has been said to be impossible in [2]: "As discussed in
[X], it is necessary to keep the graph that has been signed distinct
from the signature, and other metadata concerning the signing, make
about which information to trust.", where X is the Carroll
serialization paper... which doesn't make that claim (that I know of).
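The blank node closure behind an MSG can be sketched in a few lines of
Python. This is a minimal illustration that assumes a graph is a set of
(s, p, o) tuples and that blank nodes are strings starting with "_:";
the function names and the data are invented here, and [1] remains the
reference for the actual definition. union_of_msgs follows the
description of the CBD given above.

```python
from collections import deque


def is_blank(node):
    """Assumption for this sketch: blank nodes are strings like '_:b1'."""
    return isinstance(node, str) and node.startswith("_:")


def msg(graph, start):
    """The start triple plus every triple reachable from it via blank nodes."""
    result = {start}
    seen = set()
    queue = deque(n for n in (start[0], start[2]) if is_blank(n))
    while queue:
        b = queue.popleft()
        if b in seen:
            continue
        seen.add(b)
        for t in graph:
            if t[0] == b or t[2] == b:
                result.add(t)
                queue.extend(n for n in (t[0], t[2])
                             if is_blank(n) and n not in seen)
    return frozenset(result)


def union_of_msgs(graph, uri):
    """Union of the MSGs of all statements that involve `uri`."""
    out = set()
    for t in graph:
        if uri in (t[0], t[2]):
            out |= msg(graph, t)
    return out


t_name = ("ex:giovanni", "foaf:name", "Giovanni")
t_knows = ("ex:giovanni", "foaf:knows", "_:b1")
graph = {
    t_name,
    t_knows,
    ("_:b1", "foaf:name", "Christian"),
    ("_:b1", "foaf:mbox", "mailto:c@example.org"),
    ("ex:doc", "dc:creator", "ex:giovanni"),
}

# Starting from any triple of an MSG yields the same MSG, so the distinct
# MSGs split the graph into non-overlapping pieces.
parts = {msg(graph, t) for t in graph}
print(len(msg(graph, t_name)), len(msg(graph, t_knows)), len(parts))
```

In this toy graph the foaf:knows statement drags in the two statements
about _:b1, while statements made only of URIs and literals form MSGs of
their own.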
Anyway, MSG support could come as a way of selecting statements, which
would then require some operators to work with sets.
Checking a context in the model as highlighted by the paper would be
simple, given the ability to deal with sets of statements (an IN
operator?)
SELECT ?name ?mbox ?date WHERE
?A(?g dc:publisher ?name)
(?g dc:date ?date )
where (?x namedGraphs:belongsto "http://example.com/mynamed.rdf")
IN msg(?A)
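One possible reading of this check, sketched in Python: a statement has
a given context if some triple of its MSG is reified by a node that
carries the context property. This is an assumption about the intended
semantics, not something defined in the draft or taken from [1]; the
helpers (msg, reifiers, context_in_msg) and the data are hypothetical,
and the rdf:type rdf:Statement triple is left out for brevity.

```python
def is_blank(node):
    """Assumption for this sketch: blank nodes are strings like '_:p'."""
    return isinstance(node, str) and node.startswith("_:")


def msg(graph, start):
    """Blank node closure of `start` within `graph` (a set of triples)."""
    result, frontier = {start}, [start]
    while frontier:
        s, _, o = frontier.pop()
        for b in (s, o):
            if is_blank(b):
                for t in graph:
                    if b in (t[0], t[2]) and t not in result:
                        result.add(t)
                        frontier.append(t)
    return result


def reifiers(graph, triple):
    """Nodes whose rdf:subject/predicate/object describe `triple`."""
    s, p, o = triple

    def subj(pred, obj):
        return {t[0] for t in graph if t[1] == pred and t[2] == obj}

    return subj("rdf:subject", s) & subj("rdf:predicate", p) & subj("rdf:object", o)


def context_in_msg(graph, statement, ctx_pred, ctx_value):
    """True if any triple of msg(statement) is reified by a node with the context."""
    return any(
        (x, ctx_pred, ctx_value) in graph
        for t in msg(graph, statement)
        for x in reifiers(graph, t)
    )


G = "http://example.com/mynamed.rdf"
a = ("ex:g", "dc:publisher", "_:p")
b = ("ex:h", "dc:publisher", "Bob")
graph = {
    a,
    ("_:p", "foaf:name", "Alice"),
    ("_:p", "foaf:mbox", "mailto:alice@example.org"),
    # A single reification, placed on an arbitrary triple of the MSG.
    ("_:r", "rdf:subject", "_:p"),
    ("_:r", "rdf:predicate", "foaf:name"),
    ("_:r", "rdf:object", "Alice"),
    ("_:r", "namedGraphs:belongsto", G),
    b,
}

print(context_in_msg(graph, a, "namedGraphs:belongsto", G),
      context_in_msg(graph, b, "namedGraphs:belongsto", G))
```

The context is attached once and is found from any statement of the same
MSG, which is the point of not reifying each statement separately.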
To conclude, I get the impression it would be beneficial to clearly
define the support for named graphs in a SPARQL extension.
RDF has been given resource-centric, statement-centric and
ontology-centric APIs. That is all fine; it is all in line with the
consensus and the recommendations.
But making what is basically a "file-centric" approach such a
fundamental part of the QL seems primitive, to say the least. Please,
someone convince me of the contrary :-)
Thanks for your attention. Please note that I am posting after reading
a few threads on the ML, but certainly not all of them; I apologize if I
am disregarding some major post, which I'd be happy to hear about.
Giovanni
[1] http://giovanni.ea.unian.it/temp/WWW2005_signignRDF.pdf
[2] Jeremy Carroll, Christian Bizer, Patrick Hayes, Patrick Stickler:
Named Graphs, Provenance and Trust. The Fourteenth International World
Wide Web Conference (WWW2005), Chiba, Japan, May 2005.
[3] http://giovanni.ea.unian.it/temp/RDFGROWth_workshopISWC2004.pdf
Toward widely deployable Semantic Web P2P: tools, definitions and the
RDFGrowth algorithm
Giovanni Tummarello, Christian Morbidoni, Joakim Petersson, Francesco
Piazza, Mauro Mazzieri, Paolo Puliti
Received on Saturday, 19 February 2005 16:39:13 UTC