Review of the Ent. Reg. Document by Markus from Birte Glimm on 2011-11-21 (public-rdf-dawg@w3.org from October to December 2011)

From: Birte Glimm <birte.glimm@uni-ulm.de>
Date: Mon, 21 Nov 2011 22:14:42 +0100
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <CABt65OeoWz_5cEQ=rRdguTJsoYRoM=2gY=ZkunJgxAAXjJhOsw@mail.gmail.com>
Hi all,

I attach the review from Markus below. I'll try and address most
comments tomorrow.

Birte


The Entailment Regimes draft is in very good shape. The explanations
are helpful and detailed, layout and notation are used consistently,
and the structure is clear and systematic. Overall, I think this is a very
clear and helpful document, also due to the many informative
explanations. I have found only a few technical issues that may
require some further clarification or minor correction. There are also
a small number of (very) minor editorial issues that I am listing at
the end.

*** Inconsistency handling in RDFS

I was puzzled by Section 4.1.1 where it is suggested that tools could
avoid inconsistency checks (or do them lazily). The example is the one
with the 10002 triples and the join in the query. Even if a tool did
not report an inconsistency error, the inconsistency would still
affect the semantics since all statements would now be entailed
(restricted to the finitely many statements allowed by the
conditions). So the join optimisation in the example seems to lead to
incomplete results. I suppose incomplete answers are acceptable for
conformant SPARQL implementations, but incomplete BGP matching could
also lead to unsound query results for the overall query. So it does
not seem to be advisable for a SPARQL implementation of the RDFS
entailment regime to not check for inconsistencies, even if it is not
required to report an error (which still has some advantages, as the
later example with the HTTP-based query services illustrates).
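
To illustrate the concern, here is a minimal sketch in Python. It is a
toy model and not taken from the draft: the function names, the tiny
graph and the "consistent" flag are all assumptions, and no RDFS rules
are applied. It only shows that a tool which skips the check (and so
treats the graph as consistent) returns fewer BGP matches than the
semantics prescribes when every triple over the finite vocabulary is
entailed.

```python
from itertools import product


def entailed_triples(graph, vocabulary, consistent):
    """Toy entailment: a consistent graph entails only its own triples
    (no RDFS rules are applied); an inconsistent graph entails every
    triple that can be built from the finite vocabulary."""
    if consistent:
        return set(graph)
    return set(product(vocabulary, repeat=3))


def match(pattern, triples):
    """Return one binding per triple matching a single triple pattern.
    Terms starting with '?' are treated as variables."""
    results = []
    for triple in triples:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                # A repeated variable must be bound to the same term.
                if binding.get(p, t) != t:
                    break
                binding[p] = t
            elif p != t:
                break
        else:
            results.append(binding)
    return results


# Hypothetical data: one triple, three vocabulary terms.
graph = {("ex:a", "ex:b", "ex:c")}
vocabulary = ["ex:a", "ex:b", "ex:c"]
pattern = ("?x", "ex:b", "?y")

# A tool that never checks consistency behaves like consistent=True.
lazy = match(pattern, entailed_triples(graph, vocabulary, consistent=True))
# The semantics for an inconsistent graph corresponds to consistent=False.
full = match(pattern, entailed_triples(graph, vocabulary, consistent=False))

print(len(lazy), len(full))  # prints: 1 9
```

If such an incomplete BGP result is later used under a negation-like
operator in the overall query, the missing matches could turn into
unsound answers, which is the point made above.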

*** Primitive datatypes

Section 5 explains what a datatype consists of and how literal values
are assigned to a canonical form. However, the specification of the
entailment regime (and all later regimes) then speaks of "primitive
datatypes from which [some other datatype] is derived". The general
datatype mechanism of RDF does not have a notion of "derived"
datatypes and it is not clear what this means here. The goal of this
formulation was to avoid the same value being reported as, e.g.,
"1"^^xsd:byte and "1"^^xsd:int. Since RDF graph equivalence does not
take datatypes into account, condition 4 of entailment regimes forces
a form of normalisation. The difficulty is that canonical forms are
defined for a value when considered as a value of a particular datatype,
but there is no mechanism to obtain a canonical datatype for a value
(which may belong to many, possibly incomparable, datatypes). I suggest
solving this by introducing the idea of such a canonical datatype
upfront, at the place where the canonical values are now discussed. It
can then be explained that the canonical types in XML Schema are the
primitive types.

With canonical datatypes for values and canonical lexical forms for
values+datatypes, one can then define a canonical literal for each
value.
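
As a rough sketch of what I mean (Python, not from the draft): the
table of primitive types below is a partial, assumed mapping for a few
XML Schema numeric types, and the lexical normalisation is a
simplification chosen for this example rather than the exact canonical
mapping of XML Schema. The function name is hypothetical.

```python
from decimal import Decimal

# Assumed, partial map from a datatype to the primitive XML Schema
# type it is derived from (used here as the "canonical datatype").
PRIMITIVE_OF = {
    "xsd:byte": "xsd:decimal",
    "xsd:int": "xsd:decimal",
    "xsd:integer": "xsd:decimal",
    "xsd:decimal": "xsd:decimal",
}


def canonical_literal(lexical, datatype):
    """Return (canonical lexical form, canonical datatype) for a literal.

    Simplification: integer values are written without a decimal point,
    other values without trailing zeros.
    """
    primitive = PRIMITIVE_OF[datatype]
    value = Decimal(lexical)
    if value == value.to_integral_value():
        canonical = str(int(value))
    else:
        canonical = format(value.normalize(), "f")
    return canonical, primitive


# "1"^^xsd:byte and "01"^^xsd:int denote the same value and are
# mapped to the same canonical literal.
print(canonical_literal("1", "xsd:byte"))   # ('1', 'xsd:decimal')
print(canonical_literal("01", "xsd:int"))   # ('1', 'xsd:decimal')
```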

*** OWL RL URI

Section 6.4 specifies a URI to be used in the service description by
tools that support OWL RL. But Section 7.5.3 states that this URI
"can" be used by "Endpoints that use the OWL 2 Direct Semantics
entailment regime and that support the OWL 2 RL profile". It should be
clarified if the use of this URI says anything about the semantics or
not. If yes, then every profile URI would be needed in both semantics.
If no, then the sentence in 7.5.3 should be changed to avoid this
impression (and maybe another remark could be added how these URIs
relate to the entailment regime). In general, I wondered what exactly
these URIs mean, especially whether they indicate the maximal supported
fragment ("we support nothing that is not RL") or the minimal
supported one ("we support at least all of RL").

*** Declarations for Direct Semantics

From Sections 7.1 and 7.2, I did not fully understand where
declarations can be given in a query. After reading the Appendix, I
believe that they can be in the ontology that the active graph
represents and in individual BGPs but not in imported ontologies and
not in outer graph patterns. In particular, a query with a UNION needs
to have declarations in each part, they cannot be given at a higher
level of the query. Also, variables can have different declarations in
different BGPs. This might be worth a remark.
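
A small sketch of my reading (Python; this reflects my understanding
after the Appendix, not wording from the draft). BGPs are modelled as
lists of triples, the set of declaration types is a partial assumption,
and the helper name is hypothetical. Declarations are collected per
BGP, so a declaration in one UNION branch is not visible in the other.

```python
# Assumed, partial set of types that act as declarations.
DECLARATION_TYPES = {
    "owl:Class",
    "owl:ObjectProperty",
    "owl:DatatypeProperty",
    "owl:NamedIndividual",
}


def declarations(bgp):
    """Map each variable declared inside this BGP to its declared type."""
    return {
        s: o
        for (s, p, o) in bgp
        if s.startswith("?") and p == "rdf:type" and o in DECLARATION_TYPES
    }


# Two branches of a hypothetical UNION query.
left = [
    ("?p", "rdf:type", "owl:ObjectProperty"),
    ("ex:a", "?p", "?y"),
]
right_without = [
    ("ex:b", "?p", "?y"),  # ?p is used, but not declared in this BGP
]
right_with = [
    ("?p", "rdf:type", "owl:DatatypeProperty"),
    ("ex:b", "?p", "?y"),
]

print(declarations(left))           # {'?p': 'owl:ObjectProperty'}
print(declarations(right_without))  # {}
print(declarations(right_with))     # {'?p': 'owl:DatatypeProperty'}
```

The last two lines show both effects: the declaration from the left
branch does not carry over, and the same variable can be declared
differently in another BGP.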

*** Variables in Literal Positions

The remarks of Section 7.3.2 seem to apply to OWL RDF-Based Semantics
just as well. This seems to be worth a remark (maybe the section
should even be moved to the RDF-Based Semantics section since this is
earlier in the document).

*** Finite Answers in RIF

It was not completely clear to me at first where the finiteness of
results is coming from in Section 8.1. The proof in Appendix C claims
that all regimes require that bindings are only taken from the
vocabulary defined for this regime, but this is not really done (or
needed) for RIF. There should probably be a corresponding remark in
Appendix C.


Minor editorial issues:

* Introduction: ", or what kinds of errors can arise" -> ", and what
kinds of errors can arise"
* "aded" -> "added"
* Sentences should not start with abbreviations; the spelled-out forms
should be used instead (cases that occur in the document are "E.g."
and "I.e.").
* The use of "a" to abbreviate "rdf:type" in the document is probably
not helpful for a reader. The definitions and normative discussions
require frequent use of "rdf:type" and there are many other rdf(s) and
owl terms that have no such abbreviation. Writing one of them as "a"
in some cases (but not in all) only introduces a source of confusion.
A short remark about "a" being syntactic sugar might be useful, but I
would eliminate it from the examples.
* "imaginary IRI"; probably not the right word; maybe "exemplary"
would be better; or maybe just fix the meaning to this concrete IRI
right away
* Use hyphens for prefixes consistently: "re-naming" vs. "recaptured"
(there might be other uses of "re-"; I guess "sub-" is another
candidate to check)
* Figure 1: the colour coding (RDF special terms vs. RDFS special
terms) does not work in monochrome printout
* "Semantic Web" is not capitalised consistently
* "Similarly, for OWL 2 DL entailment" should say "Direct Semantics"
rather than "DL"
* "Further explanation are"
* "The term rdfV refers to ..." There and in all similar places, the
word "term" is used in adjacent sentences to refer to RDF terms and to
the denomination of a meta-level concept (a set of terms). A possible
source of confusion.
* "to a large extend" -> "extent"
* Figure 2, caption should say "RDF graphs" (plural)
* Notation of variables. As far as I know, SPARQL uses the syntactic
forms "?x" and "$x" to denote the variable "x", i.e., the variable is
"x" and not "?x". This is correctly implemented in most result tables
but not in Section 8.4.2.2 and (all of) Section 9, where "?" appears
in table headers. Moreover, Section 3.2 applies some solution mappings
mu to "?x" rather than to "x" (but it is also correct in some places
in that section). I did not notice it elsewhere but it might be worth
checking.
* The Editorial Note in Section 3.2 states that "ex:a ex:b ?x would
have no solutions at all". This does not seem to be the right example
since the related triple in the example graph does not use the rdf:_n
entities that the note talks about.
* "Some triples that are well-formed for OWL 2 DL, are" -> remove ","
* Section 7.1.3, example 1 offers an interpretation of the query in
terms of declarations. But since declarations cannot be queried in
this entailment regime in any case, this might be a misleading example
(offering a possible interpretation for the query that no Direct
Semantics query can ever have, even if ambiguous).
* "Higher Order Queries" vs. "First-Order Semantics". Probably have a
hyphen in both cases.
* Section 7.5, introduction, speaks about EL and QL only but the
section covers all three profiles.
* Section 8: "more on this in 7.4" and "see 7.4" should both use "8.4".
* Last sentence before Section 8.2: uses hyphens as dashes around
"i.e. ..." Probably use commas or &ndash;
* Same sentence, the "(1) - (3)" has spaces (other similar constructs
do not have this); could also be &ndash; but this is really minor.
* most uses of "cf." should probably be "see" or "see also" (when
strictly adhering to common style guides)
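
On the item about variable notation above, a tiny Python sketch of the
convention as I understand it (the helper name is hypothetical): "?x"
and "$x" are two syntactic forms of the same variable "x", and it is
the bare name that belongs in table headers and in the domain of a
solution mapping.

```python
def variable_name(token):
    """Strip the '?' or '$' marker from a SPARQL variable token."""
    if len(token) > 1 and token[0] in "?$":
        return token[1:]
    raise ValueError(f"not a variable token: {token!r}")


# Both syntactic forms denote the same variable.
print(variable_name("?x") == variable_name("$x"))  # True

# A solution mapping keyed by variable name rather than by token.
mu = {variable_name("?x"): "ex:c"}
print(mu)  # {'x': 'ex:c'}
```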



-- 
Jun. Prof. Dr. Birte Glimm            Tel.:    +49 731 50 24125
Inst. of Artificial Intelligence         Secr:  +49 731 50 24258
University of Ulm                         Fax:   +49 731 50 24188
D-89069 Ulm                               birte.glimm@uni-ulm.de
Germany
Received on Monday, 21 November 2011 21:15:23 UTC