Re: comments on current version BLD document: symbols, datatypes, semantics

<snip/>

(I am writing this e-mail in the train (the noise cancellation of my
microphone works surprisingly well :), so I could not have look at the
latest updates)

>> Some editorial comments and suggestions for improving readability:
>> - Section "symbols and symbol spaces":
>> 1- I'm not entirely comfortable with the structure.  This section
>> already mentions connectives and quantifiers (in the first paragraph),
>> but does not define formulas (conditions). Then, the section
>> "presentation syntax" defines formulas.  I would propose to make the
>> following changes: remove the references to the connectives and
>> quantifiers in the symbols section, move the definition of "well-formed
>> condition formulas to the next section", and rename the "presentation
>> syntax" section it something like "Conditions", because this is in fact
>> the sections in which conditions are defined (using the presentation
>> syntax).
> 
> Hi Jos,
> 
> Thanks for the comments.
> 
> I implemented some of your suggestions, but not all. It is not correct to
> say that the presentation syntax is defined in the section called
> "Presentation Syntax". I changed its name to "EBNF for ..."
> 
> The presentation syntax is defined earlier. The section you are referring
> to only gives an EBNF.  In fact, one cannot even define the syntax properly
> using the EBNF. 

OK

> 
> Part of the problem was that the section names could be chosen better.
> I tried to fix them in the current version.
> 
>> 2- regarding the structure of this particular section: constants are
>> introduced in the first paragraph, and (a couple of pages) later
>> re-defined as pairs of literals and symbol spaces.
> 
> Constants are *not* "redefined" anywhere.
> In the beginning they are just referred to as elements of Const, but there
> is no redefinition.

They are not strictly redefined, but the definition is refined later.

> 
>> I think this
>> duplication (off definition) is unnecessary and confusing.  I would
>> propose to do the following: move the definition of constant symbols as
>> pairs (including symbol spaces) to the front, before the definition of
>> terms, and create a new subsection "Constant symbols" (or similar) for
>> these definitions. 
> 
> I do not like that idea because the definition of the symbol spaces and
> constants is quite long and the reader does not get to read the juicy stuff
> until they struggle through 1 page of definitions. 

The reader can always skip the section about symbol spaces and constants.

> 
> Instead, the current organization refers to constants as abstract entities
> without elaborating until it is necessary. This takes the reader straights
> to the important definitions.

I think it might actually be harder to grasp if they are On such an
abstract level for such a long time.
Anyway, that's just my feeling.  Let's see whether there are any other
comments regarding readability of the definition of constants.

> 
>> I would also propose to create a subsection for the
>> signatures and well-formed terms and formulas.
>> I think that the document would become a little easier to read.
> 
> done.
> 
>> 3- for readability, I think that, when introducing variables, their
>> (presentation) syntax should also be given
> 
> Where exactly do you see an unreadable piece of text due to the fact that
> the presentation syntax for variables is not given early enough?

I just had a feeling that it might be harder to grasp the definitions
because of the abstract level.

> But anyway, I added an appropriate piece of text.

OK

> 
>> 4- "symbol spaces", first paragraph: "long" -> "string" (since long is
>> not such a common datatype; at least not as common as string)
>> 5- "symbol spaces", fourth paragraph: it is not entirely clear what is
>> meant with "RIF mandates the following symbol spaces".  I would propose
>> to rephrase it to something like "RIF-compliant implementations must
>> support the following symbol spaces".  Furthermore, I would propose to
>> move this sentence (about supporting symbol spaces) to after the
>> introduction of the individual symbol spaces; I believe this will
>> improve readability.
> 
> ok
> 
>> 6- section "symbol spaces" second paragraph, first sentence: symbol
>> spaces are actually not subsets of the constant symbols in RIF.  First
>> of all, a symbol space is not a set; second, constant symbols are pairs
>> (literal, symbol space IRI). I would propose to rephrase this sentence
>> to something like "every constant symbol in the RIF as an associate
>> symbol space"
> 
> They are named subsets. (I added "named" to their abstract definition).

they are not subsets, because symbol spaces are pairs of lexical spaces
and identifiers

> 
>> 7- section "symbol spaces": for readability, I would propose to move
>> forward the definition of the syntax of RIF constants, to before the
>> definition of symbol spaces; this would give the reader a better idea of
>> the purpose of the symbol spaces.
> 
> ok
> 
>> 8- section "symbol spaces": I think "data types" (including value space
>> and lexical-to-value mapping) should be defined in this section, and the
>> distinction with symbol spaces which do not correspond to datatypes
>> should be made clear.  I believe that many people who read this document
>> will be aware of a notion of datatypes, and from the text which is
>> presented people might easily assume that symbol spaces and datatypes
>> aren't the same thing.
> 
> The notion of a data type is a combo of syntax and semantics. I decided to
> not introduce data types until we get to talk about the semantics.
> I find this cleaner.

okay, let's see whether other people have an opinion about which is cleaner.

> 
> 
>> - section "model theory for condition language of RIF BLD":
>> 9- I find it rather confusing to talk about value spaces and
>> lexical-to-value mappings of symbol spaces, where these actually only
>> apply to datatypes. I would propose to only talk about the effect of
>> datatypes.
> 
> A data type is a special case of a symbol space. I fail to see what you
> want here. The effect of the symbol spaces is not limited to data types
> alone, as is explained in that section. (You proposed to remove this
> explanation, but I clarified it instead.)

I will read your clarification, and will probably have some more
comments :-)

> 
> 
>> Comments about the language definition:
>>
>> - Section "symbols and symbol spaces":
>>
>> 1- section "signatures": there is currently only an informal description
>> of what partial order between signatures means.  There should be a
>> formal definition.
> 
> A partial order is a partial order. What exactly do you want to define here.
> There are no missing definitions in that section as far as I can see.

It was initially not clear what the implication of this partial order
is.  You give an informal explanation, but no formal definition.
However, I see now that it follows from the definitions of coherence and
well formed terms.  So, the text is fine as it is, I think

> 
>> 2- section "signatures and a condition language of RIF^BLD^": the
>> definition of equality atoms is not entirely clear: the symbol = is not
>> a constant symbol in RIF, according to the syntax definition in section
>> "presentation syntax" (it does not have the symbol space).  Furthermore,
>> as it is correctly mentioned that equality is not a built-in predicates,
>> I feel there is an impedance mismatch between this predicate symbol and
>> all other kinds of predicate symbols.  Finally, equality is currently
>> not mentioned when atomic formulas are initially defined.  Therefore, I
>> would propose to define equality atoms a=b directly when first defining
>> atomic formulas.
> 
> A good point! You were looking at a version before I moved = further down,
> but the point about its symbol space is well-taken. It should be either
> rif:local (my pref) or rif:iri.
> 
> I experimented with requiring all symbols to have explicit symbol spaces in
> the examples, but I think they become unsightly due to that. Perhaps we
> should not require ^^rif:local explicitly. Then we could write a=b as
> before.  If people think that we should insist on explicit symbol space
> names (as it is done now, to make the syntax look more abstract and devoid
> of syntactic sugar) then I am fine with writing a =^^rif:local b or even
> equal^^rif:local(a,b).

This does not yet address my concerns about the impedance mismatch
between the equality symbol on the one hand, and predicate symbols on
the other: Predicate symbols are interpreted as sets of tuples, whereas
equality is interpreted as identity.

> 
>> 3- section "symbol spaces": why is the lexical space of the symbol space
>> non-empty?  Why not simply define it as a set?  I think it does not hurt
>> to allow empty lexical spaces.
> 
> A symbol space defines a set of constants. If the lexical space is empty,
> the set of constants is empty. Does not make sense to contemplate such sets
> of symbols in the syntax, IMO.

Well, I can easily think of cases where one would want an empty symbol
space. I can, for example, easily imagine rule sets which do not use any
local identifiers, so the lexical space of the rif:local symbol space
will be empty.
furthermore, the definitions just become more complicated by requiring
the sets to be nonempty.


> 
>> 4- section "symbol spaces": there should be a restriction on the kinds
>> of datatypes which may be used.  Namely, the lexical-to-value mapping
>> and the value space need to be well-defined;
> 
> what do you mean by that?

Say, you are using a datatype with an ill-defined value space, e.g.
xsd:duration.
If you have a symbol "xx"^^xsd:duration, the value
IC("xx"^^xsd:duration) is not well defined, so the interpretation is not
well defined.

> 
>> otherwise, the RIF
>> semantics is not well-defined. The text in the paragraph "symbols with
>> ill-formed lexical parts" will then need to be updated accordingly.
>> 5- "symbols with undefined symbol spaces": the description in this
>> paragraph is incorrect, since, if the symbol space corresponds to a
>> known datatype, and implementation will interpret the symbol according
>> to this datatype.  I propose to remove this paragraph.
> 
> You were the one who proposed this paragraph in the first place. 

That was when the datatype support in the RIF was not yet extensible.
Now it is, so the paragraph is out of date.

> Second, I
> do not really understand what you are saying.  The paragraph you are
> referring to talks about "undefined" symbol spaces, while in the above you
> are talking about the known spaces.

yes, the terminology is a bit confusing.  RIF "defines" a number of
symbols spaces, but people can use additional symbol spaces which
correspond to /known/ datatypes.
So, we have a number of symbol spaces defined by the RIF; of these
symbol spaces, those which corresponds to datatypes, correspond to
/known/ datatypes (by definition). There are a number of other symbol
spaces which corresponds to datatypes.  Some of these will correspond to
known datatypes, and some will corresponds to unknown data types.
I interpreted an "undefined" symbol space as a symbol space which is not
"defined" by RIF.

By the way, it is exactly this confusion which led me to propose to use
datatype maps.  Interpretations will be defined with respect to a
datatype maps, which includes all the /known/ datatypes.

> 
>> 6- I see the following problems with the mapping of constant symbols:
>>    a- it is hard to grasp from the definition how a single constant
>> symbol is interpreted; it is necessary to carefully read and try to
>> understand all the (too lengthy) text. The definitions can be much
>> crisper, as I showed in earlier proposals for this definition.
> 
> I do not think that the text is either lengthy or that it is not crisp enough.
> I believe that this is similar to your earlier proposals. If you have a
> specific text to propose, please do so.

OK, I will send a proposal in a separate e-mail.

<snip/>

> 
>>    d- no distinction is being made between the identifier of a symbol
>> space and the symbol space itself, whereas they are different things
> 
> Where did you find that?

For example, you talk about things like "the symbol space rif:text".
However, rif:text is the identifier of the symbol space, and not the
symbol space itself.

> 
>>    e- according to the second bullet in "the effect of the symbol
>> spaces", the mapping IC(lit^^symsp) should be defined for every constant
>> symbol of the form lit^^symsp with lit in the lexical space of the
>> symbol space identified by symsp.  This would mean that every such
>> symbol is in the vocabulary of every RIF language. I think this is
>> highly undesirable (and should have been mentioned in the syntax
>> section). The mapping should only be defined for symbols in the vocabulary.
> 
> We are defining a logic, including some specific sets of symbols that
> belong to the language of that logic.

Yes, but it is not necessary.  Why not let the user decide exactly which
of the symbols to use?

> 
> Your claim that this is "highly undesirable" requires at least some explanation.

OK, because it is syntax my claim is probably a bit too strong.  The
thing is that it is simply unnecessary to include all these constants in
the vocabulary of every language, so why would we do it?

An additional drawback is that if we were to consider a dialect which
has a semantics based on Herbrand universes, all these symbols are
included in the universe. This would have certain undesirable
implications, e.g. ( considering a dialect with negation):

q(a)

s(x) :- naf q(x)

entails

s("1"^^xsd:integer)
s("1"^^xsd:string)
s("2"^^xsd:integer)
s("2"^^xsd:string)
s("3"^^xsd:integer)
s("3"^^xsd:string)
.....


> I fail to see what exactly is missing in the definition of the syntax.

from the definition of the syntax it is not clear that the vocabulary
includes all these symbols. I think that if this is the case, and the
reader should be notified in the syntax section.

> 
>>    f- the sentence immediately following the bullets is confusing.  I
>> would propose to remove it.
> 
> I do not agree that it should be removed. Clarified it instead.

okay I'll have a look

> 
>>    g- the value space is required to be a subset of the domain.  This
>> means that every interpretation includes all value spaces of all data
>> types.  This is unnecessary.
> 
> So what? It makes the definition simple and uniform.

It makes every domain infinite. For most kinds of rules (especially
those without equality in the head) this is not really a problem.
However, as soon as we have full use of equality, or deal with
extensions in the direction of FOL, then one often wants to talk about
finite models.


It also makes rule sets which only contain rules such as Forall ?x,?y
(?x=?y)  inconsistent. I claim that this is undesirable.



On a side note, some extensions (and possibly a combination with OWL DL)
will want to separate the (concrete) interpretation domain for datatypes
from the individual (abstract) interpretation domain in order to be able
to do effective reasoning.
I believe we currently do not have such a mechanism (to separate the
two), do we?


Best, Jos

-- 
                         debruijn@inf.unibz.it

Jos de Bruijn,        http://www.debruijn.net/
----------------------------------------------
In heaven all the interesting people are
missing.
  - Friedrich Nietzsche

Received on Wednesday, 10 October 2007 17:33:20 UTC