OWL 2 SS&FSS spec. - overly overloaded grammar/object terminology

Regarding the OWL 2 Structural Specification and Functional-Style Syntax
specification at http://www.w3.org/TR/2009/CR-owl2-syntax-20090611/:



Summary:
1) The descriptions mapping syntactic elements to structural objects
    are ambiguous, using the same term to refer to different things.
    They should be more precise.
2) Some of that ambiguity is caused by the poor names of some
    non-terminals.  Those non-terminals should be renamed.


Section 8.1.1 says (with subscripts transcribed using parentheses):

   An intersection class expression ObjectIntersectionOf( CE(1)
   ... CE(n) ) contains all individuals that are instances of
   all class expressions CE(i) for 1 ≤ i ≤ n.

That wording conflates the syntactic elements and the objects that
those syntactic elements denote in multiple ways:

- In the first part of the sentence, CE(i) refers to class
   expressions (strings matching the ClassExpression production), but
   after "instances of," CE(i) instead refers to the classes
   described or denoted by those class expressions.

- Similarly, in the very first part of the sentence, "class
   expression" refers to class expressions, but after "instances of,"
   it refers to classes.

- Even more confusingly, the first occurrence of "class expression"
   is used in _both_ ways:  As the subject of the sentence it refers
   to a class expression (the intersection class expression), but
   the verb ("contains") treats it as instead referring to the class
   described by the class expression.  (A class contains
   individuals; a class expression (a string) only contains other
   syntactic objects (e.g., the nested class expressions CE(i)).)

The specification should be considerably more precise than that.


Shouldn't section 8.1.1 say something like this (added words
highlighted)?:

   An intersection class expression ObjectIntersectionOf( CE(1)
   ... CE(n) ) _specifies_the_class_that contains all
   individuals that are instances of all _classes_specified_by_the_
   class expressions CE(i) for 1 ≤ i ≤ n.

Yes, that might sound a little wordy, but the additional--and more
_precise_--words make it much easier to understand what is being
specified.
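
To make the distinction concrete, here is a minimal Python sketch (not
anything taken from the specification) that keeps the syntactic objects
and the classes they specify in separate types.  The names ClassName,
specified_class, and interpretation are hypothetical, chosen only for
illustration, and a class is modeled (as an assumption) as a plain set
of individuals:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassName:
    """Syntax: an identifier that denotes a class (hypothetical type)."""
    iri: str


@dataclass(frozen=True)
class ObjectIntersectionOf:
    """Syntax: an intersection class expression; it contains operands."""
    operands: tuple


def specified_class(expression, interpretation):
    """Return the class (a frozenset of individuals) that a class
    expression specifies under the given interpretation."""
    if isinstance(expression, ClassName):
        # An identifier specifies whatever class it denotes.
        return frozenset(interpretation[expression.iri])
    if isinstance(expression, ObjectIntersectionOf):
        # First find the classes specified by the nested expressions,
        # then intersect those classes (not the expressions).
        classes = [specified_class(ce, interpretation)
                   for ce in expression.operands]
        return frozenset.intersection(*classes)
    raise TypeError(f"not a class expression: {expression!r}")


# Illustrative data only.
interpretation = {
    "ex:Parent": {"alice", "bob"},
    "ex:Woman": {"alice", "carol"},
}
expression = ObjectIntersectionOf(
    (ClassName("ex:Parent"), ClassName("ex:Woman")))
members = specified_class(expression, interpretation)
```

Here the expression object contains only its two operands, while the
class it specifies contains the individual alice.  The sketch needs two
different types for those two things, which is exactly the distinction
the wording should make.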


Note that some other cases are even less clear.

Consider section 8.1.4, which says (again, transcribed using
parentheses):

   An enumeration of individuals ObjectOneOf( a(1) ... a(n) )
   contains exactly the individuals a(i) with 1 ≤ i ≤ n.

Notice how, to get the intended meaning, the reader has to
interpret the first occurrence of the word "individuals" as
referring to the strings matching the Individual production, but
then interpret the second occurrence as referring to the
individuals identified by those strings.

Otherwise, one can easily read that as saying, tautologically,
that an enumeration containing individuals a(1) ... a(n) contains
the individuals a(1) ... a(n).
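
The two readings really do differ.  As a rough Python sketch (the names
ObjectOneOf.individual_names, enumerated_class, and denotation are
hypothetical, and the data is made up), consider two strings that
denote the same individual, which OWL permits because it makes no
unique name assumption:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectOneOf:
    """Syntax: holds strings matching the Individual production."""
    individual_names: tuple


def enumerated_class(expression, denotation):
    """Return the class (a frozenset) containing exactly the individuals
    denoted by the names listed in the enumeration."""
    return frozenset(denotation[name]
                     for name in expression.individual_names)


# Illustrative data only: two names, one denoted individual.
denotation = {"ex:clark": "person-1", "ex:superman": "person-1"}
expression = ObjectOneOf(("ex:clark", "ex:superman"))
members = enumerated_class(expression, denotation)
```

The enumeration (the syntax) lists two names, but the class it
specifies has a single member.  A sentence that uses "individuals" for
both cannot say that clearly.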


The specification should not re-use terminology ambiguously like
that.   It should use different phrases to refer to the syntactic
elements vs. to the things denoted by the syntactic elements.
This is especially true because one of the main things this OWL
specification is trying to specify is the mapping between the
syntax and its meaning (at the structural level).


This problem exists partly because some of the productions
(non-terminals) have names that don't really reflect what they
are.

For reference, the non-terminal name "ClassExpression" seems to be
a good name:
- An expression is a syntactic construct.  A string matching the
   non-terminal ClassExpression is indeed an expression.
- Using the words in the non-terminal names as a phrase in English
   ("class expression") naturally refers to strings matching that
   ClassExpression production.
- The phrase "class expression," as used to refer to a string
   matching the ClassExpression non-terminal, is clearly different
   from the phrase "class," as used to refer to the thing (the class)
   denoted or described by a class expression.

However, the non-terminal names "Class" and "Individual" are poor
names--they easily lead to confusion.

Consider "Class":
- A class is not a syntactic construct.  A string matching the
   non-terminal Class is _not_ a class; it is an identifier (a form
   of IRI) that _denotes_ a class.
- Using the word in the non-terminal name as a phrase in English
   ("class") does _not_ naturally refer (only) to _strings_ matching
   that Class production--it also refers to the _classes_denoted_by_
   those strings.
- The phrase "class," as used to refer to a string matching the
   Class non-terminal, is _not_ clearly distinguished from the
   phrase "class" as used to refer to the thing (the class) denoted
   by a ... um ... "class" in the non-terminal sense.

Something like "ClassIRI" or "ClassIdentifier" would be a much
better name for the non-terminal currently named "Class."

Recall the Class production:

   Class := IRI

and consider it renamed to ClassIRI or ClassIdentifier.

In particular, note how it makes a lot more sense to say:

    A class IRI is an IRI that denotes a class.

or:

    A class identifier is an IRI that denotes a class.

rather than the nonsensical:

    A class is an IRI that denotes a class.


(Careful readers might note that those example statements expose
the fact that the non-terminal name "IRI" has a similar problem:

An IRI is a string and therefore can be a piece of syntax, so
having a non-terminal named "IRI" isn't necessarily a problem.

However, strings matching the OWL non-terminal "IRI" are _not_
IRIs!  (They are angle-bracket-bracketed IRIs or abbreviations
that specify IRIs.)  Stop the madness!)
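
A rough Python sketch of that last point, assuming a simplified reading
in which a full IRI is written in angle brackets and an abbreviated IRI
is expanded by concatenating a declared prefix IRI with the local part.
The function name iri_from_token and the prefixes mapping (keyed here,
as a simplification, by prefix name without the colon) are hypothetical:

```python
def iri_from_token(token, prefixes):
    """Return the IRI specified by a string matching the IRI
    non-terminal (simplified; no validation of the IRI itself)."""
    if token.startswith("<") and token.endswith(">"):
        # Full form: the IRI is what sits between the angle brackets.
        return token[1:-1]
    prefix, colon, local_part = token.partition(":")
    if not colon:
        raise ValueError(f"not a full or abbreviated IRI: {token!r}")
    # Abbreviated form: prefix IRI followed by the local part.
    return prefixes[prefix] + local_part


prefixes = {"owl": "http://www.w3.org/2002/07/owl#"}
full = iri_from_token("<http://example.org/A>", prefixes)
abbreviated = iri_from_token("owl:Thing", prefixes)
```

In both cases the input string is not itself the IRI; a function is
needed to get from one to the other.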

The name "Individual" is mostly parallel to "Class," except that
it does not include only IRIs (so "IndividualIRI" is not a
candidate new name).


So ...

1.  The wording specifying the meaning of the syntax (the
     correspondence between the syntactic elements and the objects
     represented by them) should be made more precise, perhaps
     using the pattern shown above.

2.  The names of non-terminals should be reviewed and those
     that lead to the ambiguity described above should be renamed
     appropriately.

     (An alternative _might_ be to just reword the textual
     references to the non-terminals to remove the ambiguity, but
     that would likely make things excessively verbose (e.g., "string
     matching the non-terminal X").)


Daniel
-- 
(Plain text sometimes corrupted to HTML "courtesy" of Microsoft Exchange.)

Received on Wednesday, 9 September 2009 18:27:23 UTC