- From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
- Date: Mon, 20 Aug 2012 16:02:47 +0200
- To: RDF WG <public-rdf-wg@w3.org>
Dear all,

==Post scriptum:==
Sorry for the long email. *In summary:* I describe 3 different families of dataset semantics, I argue that there are important use cases for each of them, and I'd like all of them to be standardised, along with a mechanism to describe which semantics is assumed when exchanging datasets. There are more arguments on this at the end if you want to skip the discussion of the semantics.
====End of PS=====

I come back to the topic of formal semantics for RDF datasets. I can see that there are two issues that are almost orthogonal:

1. how the semantics of the triples inside the named graphs works;
2. how the graph "names" relate to the graphs inside the (name, graph) pairs.

To discuss this, I'll use the following example (do not bother with the meaning of the classes and properties; I just try to make an example that looks a little realistic):

    # == EXAMPLE STARTS HERE ==
    :year1960 dc:date "1960"^^xsd:gYear;
              :endorsed true .
    :year2000 dc:date "2000"^^xsd:gYear;
              :endorsed true .
    :year2012 dc:date "2012"^^xsd:gYear;
              :endorsed true .
    :myth :endorsed false .

    :year1960 {
        ex:MarilynMonroe a ex:LivingPerson .
        ex:LivingPerson owl:disjointWith ex:DeadPerson .
    }
    :year2000 {
        ex:MarilynMonroe a ex:DeadPerson .
        ex:DeadPerson owl:disjointWith ex:LivingPerson .
    }
    :year2012 {
        ex:MarilynMonroe a ex:DeceasedPerson .
        ex:DeceasedPerson owl:equivalentClass ex:DeadPerson .
    }
    :myth {
        ex:MarilynMonroe ex:livesIn ex:desertIsland .
        ex:livesIn rdfs:domain ex:LivingPerson .
    }
    # == EXAMPLE ENDS HERE ==

With respect to item 1 above, there are essentially 3 cases:

a) The dataset is simply an RDF graph whose triples have been partitioned. An interpretation of the dataset is an interpretation of the graph made of all the triples found in all the named graphs and in the default graph.
Depending on what is decided about item 2 above, there can be additional semantic constraints on what the graph IRIs denote, but there could also be no constraint at all, so items 1 and 2 are essentially orthogonal in this case. Applications use the partitioning mechanism as they wish, e.g., for optimisation, for documentation... If such is the semantics of datasets, then the example is inconsistent (taken together, the graphs say that ex:MarilynMonroe is both an ex:LivingPerson and an ex:DeadPerson, two disjoint classes), so it entails all possible datasets.

b) The dataset is interpreted in the same way as an RDF graph, where the default graph must be true and the <name, graph> pairs are interpreted as assertions that relate the name to the graph itself. The actual relationship is yet to be determined, but what matters here is the syntax of the graph: it matters that the term ex:DeceasedPerson is used, not that the person denoted by ex:MarilynMonroe is dead. It is essentially the "quoting" semantics. The entailments depend on the relationship between the graph IRI and the graph, but a typical case is when the graph IRI denotes the graph, in which case the example does not entail:

    :year2012 { ex:MarilynMonroe a ex:DeadPerson . }

nor does it entail:

    :myth { ex:MarilynMonroe a ex:LivingPerson . }

In this case, no conclusions are ever drawn from any assertion put inside a named graph.

c) Each named graph describes a world according to the graph IRI. In the example, in the world according to :myth, ex:MarilynMonroe is living somewhere. What matters is the truth of the assertions rather than the fact that the term "deceased" or "dead" was used. So one can draw conclusions such as:
- *in :year1960*, ex:MarilynMonroe is not an ex:DeadPerson;
- *in :year2012*, ex:MarilynMonroe is an ex:DeadPerson; etc.

In this case, the possible relationships between the graph IRI and the graph are more limited than in the other cases. For instance, if the IRI must be interpreted as the graph itself, then it prevents a lot of inferences.

I can see use cases for each of these semantics.
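Before turning to the use cases, a toy sketch may help contrast the three semantics on the example. This is only an illustration under strong assumptions, not anything the WG has defined: triples are plain string tuples, only the four terms the example relies on (rdf:type, owl:disjointWith, owl:equivalentClass, rdfs:domain) are given a meaning through a hand-written forward-chaining loop, so it is in no way a conformant RDF or OWL reasoner. All function names are hypothetical, b assumes the graph IRI denotes the graph itself, and for c each world is assumed to be the named graph on its own, without the default graph.

```python
# Toy contrast of dataset semantics a, b and c on the Marilyn Monroe example.
# ASSUMPTION: only rdf:type, owl:disjointWith, owl:equivalentClass and
# rdfs:domain are interpreted; this is NOT a conformant RDF/OWL reasoner.
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]
Dataset = Tuple[Graph, Dict[str, Graph]]  # (default graph, {graph IRI: graph})

TYPE, DISJOINT = "rdf:type", "owl:disjointWith"
EQUIV, DOMAIN = "owl:equivalentClass", "rdfs:domain"


def closure(graph: Graph) -> Graph:
    """Apply the equivalentClass and domain rules until nothing new is derived."""
    g = set(graph)
    while True:
        new = set()
        for (s, p, o) in g:
            if p == EQUIV:  # members of s are members of o, and vice versa
                for (x, q, k) in g:
                    if q == TYPE and k == s:
                        new.add((x, TYPE, o))
                    elif q == TYPE and k == o:
                        new.add((x, TYPE, s))
            elif p == DOMAIN:  # every subject using property s is an o
                for (x, q, _) in g:
                    if q == s:
                        new.add((x, TYPE, o))
        if new <= g:
            return g
        g |= new


def consistent(graph: Graph) -> bool:
    """False if some resource ends up typed with two disjoint classes."""
    g = closure(graph)
    types: Dict[str, Set[str]] = {}
    for (x, p, k) in g:
        if p == TYPE:
            types.setdefault(x, set()).add(k)
    disjoint = [(c, d) for (c, p, d) in g if p == DISJOINT]
    return not any(c in ks and d in ks for ks in types.values() for (c, d) in disjoint)


def consistent_a(ds: Dataset) -> bool:
    """Semantics a: interpret the union of the default and all named graphs."""
    default, named = ds
    return consistent(default.union(*named.values()))


def entails_b(ds: Dataset, name: str, t: Triple) -> bool:
    """Semantics b ("quoting"): only the literal triples of the graph count."""
    return t in ds[1].get(name, set())


def entails_c(ds: Dataset, name: str, t: Triple) -> bool:
    """Semantics c: reason inside the world described by graph `name` alone."""
    return t in closure(ds[1].get(name, set()))


def excludes_c(ds: Dataset, name: str, t: Triple) -> bool:
    """Semantics c: t is false in that world if adding it yields inconsistency."""
    return not consistent(ds[1].get(name, set()) | {t})


MM = "ex:MarilynMonroe"
YEARS = ("1960", "2000", "2012")
DEFAULT: Graph = {(f":year{y}", "dc:date", f'"{y}"^^xsd:gYear') for y in YEARS}
DEFAULT |= {(f":year{y}", ":endorsed", "true") for y in YEARS}
DEFAULT.add((":myth", ":endorsed", "false"))
EXAMPLE: Dataset = (DEFAULT, {
    ":year1960": {(MM, TYPE, "ex:LivingPerson"), ("ex:LivingPerson", DISJOINT, "ex:DeadPerson")},
    ":year2000": {(MM, TYPE, "ex:DeadPerson"), ("ex:DeadPerson", DISJOINT, "ex:LivingPerson")},
    ":year2012": {(MM, TYPE, "ex:DeceasedPerson"), ("ex:DeceasedPerson", EQUIV, "ex:DeadPerson")},
    ":myth": {(MM, "ex:livesIn", "ex:desertIsland"), ("ex:livesIn", DOMAIN, "ex:LivingPerson")},
})

print(consistent_a(EXAMPLE))                                          # False
print(entails_b(EXAMPLE, ":year2012", (MM, TYPE, "ex:DeadPerson")))   # False
print(entails_c(EXAMPLE, ":year2012", (MM, TYPE, "ex:DeadPerson")))   # True
print(entails_c(EXAMPLE, ":myth", (MM, TYPE, "ex:LivingPerson")))     # True
print(excludes_c(EXAMPLE, ":year1960", (MM, TYPE, "ex:DeadPerson")))  # True
```

In this toy, a reasons over the union, so the clash between :year1960 and :year2000 makes the whole dataset inconsistent; b only looks up the literal triples, so ex:DeceasedPerson never becomes ex:DeadPerson; c reasons inside each graph separately, which yields the conclusions listed under c without the global inconsistency of a.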
a- If one is managing data that are verified facts, then one wants all of the triples to be true. Yet one may still have reasons to split the data into different parts, allowing users to query them separately with the SPARQL GRAPH keyword.
b- For a Semweb search engine exchanging the dump of its crawl, it makes sense to have an accurate "quote" of what has been crawled.
c- For situations involving the temporal evolution of facts, the integration of variously trusted sources, tracking the provenance of inferred knowledge, etc.

I find it odd that semantics b is retained as the only valid one in the "RDF graph identification" proposal. It sweeps away several Priority A use cases, as well as some of the Priority B ones.

Also, the condition $\forall i: I(u_i) = G_i$ is problematic. At first, it seems natural to say that the graph IRI RDF-denotes the graph. But:

http://www.w3.org/2011/rdf-wg/meeting/2011-04-14#resolution_1
"RESOLVED: Named Graphs in SPARQL associate IRIs and graphs *but* they do not necessarily "name" graphs in the strict model-theoretic sense. A SPARQL Dataset does not establish graphs as referents of IRIs (relevant to ISSUE-30)".

I know this resolution is about SPARQL datasets, and it does not necessarily apply to whatever structure we come up with in RDF, but one of the Priority A use cases is to be able to dump a SPARQL store. Given this resolution, there is apparently a clash between that use case requirement and the semantic condition.

My proposal is to define several recommended semantics and to allow the concrete syntax to declare, in a document, which semantics is assumed when exchanging a dataset. I find this idea appealing because it is in line with the fact that information carried by HTTP is accompanied by a self-description of how it should be understood: we have MIME types, we have <!DOCTYPE> declarations, etc. Since RDF is not a purely syntactic data structure, it makes sense that it carries with it a reference to the semantics it uses.
Such practices of referencing the MIME type, charset, doctype, schema, etc. have been a key enabler of interoperability on the Web. Why not extend the pattern to the formal semantics? BTW, SPARQL services have a way to tell which inference regime they support, and SPARQL queries have a way to ask for a particular regime. I claim that my proposal is simply in agreement with notions already accepted in the SPARQL world.

Best,
--
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Monday, 20 August 2012 14:03:15 UTC