[Fwd: Re: Questions about BLD conditions and rule syntax and XML translation] from Hassan Aït-Kaci on 2008-04-10 (public-rif-wg@w3.org from April 2008)

Forwarded message 1

From: Hassan Aït-Kaci <hak@ilog.com>
Date: Wed, 09 Apr 2008 17:42:03 -0700
Subject: Re: Questions about BLD conditions and rule syntax and XML translation
To: "Boley, Harold" <Harold.Boley@nrc-cnrc.gc.ca>
CC: public-rif-wg-request@w3.org
Message-ID: <47FD625B.7070106@ilog.com>

Hi Harold,

I am copying my answer to your latest mail to the RIF/WG as per Chris
Welty's request during our last weekly meeting.

To all: This thread started between Harold and myself regarding the
tool that I am trying to finish to ease generating XML syntax from
presentation syntax. That would ease experimenting with examples
and keeping everything consistent as the XML vocabulary is evolving.

Boley, Harold wrote:
> Please send me this context-sensitive solution when ready.

The grammar and the parser are ready. The unambiguous grammar is
given (in Jacc format) in the two attached files BLR.grm, and BLC.grm.
Jacc generates a parser for it without problem. It works - I tested
it on all the BLD document examples.

In the grammar files, the rules that are commented out correspond to
those of your EBNFs of the BLD documents, and those that replace them
fix the ambiguities for an LALR(1) parser.
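
To illustrate the effect of the replacement rules (a minimal sketch in
Python, not the Jacc implementation): BLC.grm merges positional and
attributed arguments under a single SubTerm nonterminal, so the grammar
accepts any mix of the two. If mixing is to be rejected, as in the
original EBNF, a check once the closing parenthesis has been read could
decide which kind of UniTerm was parsed. The tuple encoding and the name
classify_body are assumptions made for this example only.

```python
from typing import List, Tuple, Union

# Hypothetical encoding of parsed subterms (not Jacc's data structures):
#   ("term", value)        a positional argument
#   ("slot", name, value)  an attributed argument (name -> value)
SubTerm = Union[Tuple[str, str], Tuple[str, str, str]]


def classify_body(subterms: List[SubTerm]) -> str:
    """Return "empty", "positional", or "attributed" for a UniTerm body.

    Raises ValueError when positional and attributed subterms are mixed,
    which the relaxed grammar accepts but the original EBNF does not.
    """
    kinds = {s[0] for s in subterms}
    unknown = kinds - {"term", "slot"}
    if unknown:
        raise ValueError(f"unknown subterm kind(s): {sorted(unknown)}")
    if not kinds:
        return "empty"
    if kinds == {"term"}:
        return "positional"
    if kinds == {"slot"}:
        return "attributed"
    raise ValueError("mixed positional and attributed subterms")
```

The decision is taken after the whole argument list is known, which is
why the parser itself no longer has to guess between the two
UnitermBody alternatives.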

I am presently adapting Jacc's XML serializer to support the high-level
context-sensitive rule specification of XML syntax from the presentation
syntax. It will take me a couple of days before I can get this to work
and feed all the rules from the BLD manuscript as they are, so to speak,
to produce the Pres. Synt. (PS) XML serializer.
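
As a rough picture of what "context-sensitive" means here (a sketch in
Python under stated assumptions, not the Jacc serializer): the same
operator(...) term becomes an Atom element when it stands as a formula
and an Expr element when it stands as an argument, so the serializer
passes the occurrence context down the tree. The element names Atom,
Expr, op and arg come from the schema fragment quoted further down in
this thread; the Const/Var element shapes, the tuple encoding of the
tree, and the function name to_xml are assumptions of this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical AST encoding (not Jacc's):
#   ("const", text, type)     a constant such as "p"^^rif:local
#   ("var", name)             a variable
#   ("uniterm", op, [args])   an operator applied to positional arguments


def to_xml(node, as_formula):
    """Serialize a node; as_formula says whether it occurs as a formula."""
    kind = node[0]
    if kind == "const":
        elem = ET.Element("Const", {"type": node[2]})
        elem.text = node[1]
        return elem
    if kind == "var":
        elem = ET.Element("Var")
        elem.text = node[1]
        return elem
    if kind == "uniterm":
        # Same presentation syntax, different tag depending on context.
        elem = ET.Element("Atom" if as_formula else "Expr")
        op = ET.SubElement(elem, "op")
        op.append(to_xml(node[1], as_formula=False))
        for arg in node[2]:
            wrapper = ET.SubElement(elem, "arg")
            # Anything nested inside a uniterm is in term position.
            wrapper.append(to_xml(arg, as_formula=False))
        return elem
    raise ValueError(f"unknown node kind: {kind!r}")


# p(f(?x)) in formula position: outer term is an Atom, inner one an Expr.
example = ("uniterm", ("const", "p", "rif:local"),
           [("uniterm", ("const", "f", "rif:local"), [("var", "x")])])
root = to_xml(example, as_formula=True)
```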

Note that the attached grammars are for the canonical (non-abridged)
PS. It would not be difficult to make it work for the abridged version
(just adapting the current tokenizer and adding a couple of grammar
rules).
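
For the tokenizer side, the abridgement table quoted further down in
this thread suggests a simple expansion from abridged constants to the
full "..."^^type form. The following is a minimal Python sketch of that
expansion, not the actual tokenizer. The function name expand_constant
and the identifier pattern are assumptions. Two choices are mine: the
rif:local lexical part is quoted (following Const : STRING LEXSPACE
SymSpace in BLC.grm, whereas the table writes it unquoted), and every
digit-only token maps to xsd:integer, since the table gives no rule for
choosing xsd:long.

```python
import re


def expand_constant(token: str) -> str:
    """Expand one abridged constant token to the full presentation form."""
    # <foo:bar>  ->  "foo:bar"^^rif:iri
    if len(token) > 2 and token.startswith("<") and token.endswith(">"):
        return f'"{token[1:-1]}"^^rif:iri'
    # "a b c"  ->  "a b c"^^xsd:string  (the quotes are kept as they are)
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return f"{token}^^xsd:string"
    # 10  ->  "10"^^xsd:integer  (xsd:long is not distinguished here)
    if re.fullmatch(r"\d+", token):
        return f'"{token}"^^xsd:integer'
    # 3.14  ->  "3.14"^^xsd:decimal
    if re.fullmatch(r"\d+\.\d+", token):
        return f'"{token}"^^xsd:decimal'
    # purchase  ->  "purchase"^^rif:local  (locality by default)
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", token):
        return f'"{token}"^^rif:local'
    raise ValueError(f"not an abridged constant: {token!r}")
```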

Incidentally, I am not pushing to make the abridged PS (APS) a real
language (with infix logical connectives, if-then-else notation,
etc...) as was proposed by some during our last meeting. I agree with
MK and HB that the PS is just for that - presentation - of the
normative XML (AST-like) syntax. The real work would be for each of us
to map their own favorite rule and condition languages to the XML
format of BLD, or variants thereof that we may define to suit their
needs. Still, the APS is to make our (i.e., this WG's) life easier
while composing the whole setup (as well as for readers to follow the
examples after they get the gist of the issues). The APS would ease
writing test cases as well.

> We did BLD and FLD updates including:
> 
> Moved the FLD specialisation section to the end of BLD.
> 
> Renamed the <Group> use of the <formula> role with the new
> <sentence> role.
> 
> Cheers,
> Harold

Best,

-hak

> 
> -----Original Message-----
> From: Hassan Aït-Kaci [mailto:hak@ilog.com] 
> Sent: April 8, 2008 11:35 AM
> To: Boley, Harold
> Subject: Re: Questions about BLD conditions and rule syntax and XML translation
> 
> OK - I'm back from my daily swim ... :-)
> 
> Thanks for the comments/answers. Another thing that is on my
> slate to do is to extend Jacc's XML annotation notation to
> accommodate context-sensitive transforms of the kind you use
> in your translation rules - which are BTW rewrite rules over
> regular-expression trees (since you use the '...'). While
> swimming, I think I figured a relatively easy way to automate
> the process fully from the grammar. It's a matter of a couple
> of days' work - I hope. I'll keep you posted.
> 
> Cheers,
> 
> -hak
> 
> 
> Boley, Harold wrote:
>> Hi Hassan,
>>
>> Great that you now have such a parser.
>>
>>> One quick question: your grammar does not account for possibly
>>> giving a name to a "Group" or even an individual rule (as it is
>>> common in PR systems).
>> The name of a Group is the object of its "METAFRAME" (see below).
>>
>>> I know the "IriMeta" frame could be used for that, but such is
>>> not the case in most PR systems.
>> "IRIMETA", which abridges "IRIPLUSMETADATA", was meant to stress
>> that both aspects are combined into a frame, but I totally agree
>> that "METAFRAME" is better (note the ALLUPPERCASE for 'invisible'
>> non-terminals).
>>
>> OK, "UNICODE" isn't good. "RIFID", which many would misread as
>> "RFID", still connotes with some ID label attached to a thing,
>> not like that thing itself being the unicode sequence. I'm sure
>> you'll find the optimal word.
>>
>> Let's forget about Antlr's lookahead/backtrack since Jacc does
>> all we need.
>>
>> I'm sure your XML serializer for the presentation syntax will
>> again find bugs in my handwritten XML.
>>
>> Because you're at work that early, there seems to be hardly a
>> timezone difference between us, so please consider calling me
>> instead of typing for our future exchange.
>>
>> Best,
>> Harold
>>
>>
>> -----Original Message-----
>> From: Hassan Aït-Kaci [mailto:hak@ilog.com] 
>> Sent: April 8, 2008 8:43 AM
>> To: Boley, Harold
>> Subject: Re: Questions about BLD conditions and rule syntax and XML translation
>>
>> Boley, Harold wrote:
>>> Hi Hassan,
>>>
>>> Yes, we shall need to finalize our Phase 1 deliverables very soon.
>>> Please help us to converge fast.
>> I will do my best, Harold. I am mostly trying to understand.  Your
>> comments do help. Thanks for taking precious time to reply.
>>
>>> "IDENTIFIER" would be better than "UNICODESTRING" if there wouldn't
>>> already be IDs on other levels (such as for basic XML IDs). What
>>> about just "UNICODE"?
>> Hmmm... A Unicode is that of a character. It is really a Unicode
>> Identifier (but then again so are the contents of strings and
>> variables). What about "RIFID" - for RIF Identifier...?
>>
>> I also propose just "Meta" or "MetaFrame" rather than "IriMeta".
>>
>>> I recently mentioned changes, pointing you to the translation tables:
>>> http://www.w3.org/2005/rules/wiki/BLD#Translation_Between_the_RIF-BLD_Presentation_and_XML_Syntaxes
>>> We use "Group" because FLD has formulas more general than rules and
>>> because it's just a grouping construct. I will notify you as early
>>> as possible about future changes.
>> OK - I reverted to "Group".
>>
>> One quick question: your grammar does not account for possibly
>> giving a name to a "Group" or even an individual rule (as it is
>> common in PR systems). Why not allow for an optional such name?
>> I know the "IriMeta" frame could be used for that, but such is
>> not the case in most PR systems.
>>
>>>> But I must also find a way to get rid of that one too. I'll
>>>> keep trying by tweaking the grammar.
>> I managed to get it - at the price of a slight relaxation of the
>> grammar. I now can generate a deterministic BU parser (no guessing).
>>
>>> Can your tool do some (production-specific) wider look-ahead, as we
>>> used in ANTLR (for an earlier project) to find each slot's "->" infix
>>> in such terms?
>> Can Antlr perform an arbitrary amount of lookahead? (Because this
>> is what is needed for distinguishing between a positional or
>> attributed subterm...) At any rate, Jacc can accommodate backtrack
>> moves so I could do arbitrary look ahead. The rule, however, in
>> designing grammars is always to keep ambiguities to a minimum.
>>
>>> Let me come back to our earlier exchange, slightly extended:
>>>
>>> Full presentation syntax       Abridged presentation syntax  Remark
>>>
>>> "foo:bar"^^rif:iri             <foo:bar>                     IETF's angular bracket notation
>>> purchase^^rif:local            purchase                      locality by default
>>>
>>> "a b c"^^xsd:string            "a b c"                       Full: quotes are part of ^ syntax
>>>
>>> "10"^^xsd:integer              10                            as in programming languages
>>> "1000000000"^^xsd:long         1000000000                    as in programming languages
>>> "3.14"^^xsd:decimal            3.14                          as in programming languages
>>>
>>>  . . .                          . . .
>> OK - makes sense. I will implement these today. I may also propose
>> other abridgements if any may arise.
>>
>>> Other XML communities should already have looked into this.
>>> We could see how they write all of the "xsd:" datatypes here
>>> (see also http://www.w3.org/2005/rules/wiki/DTB):
>>>
>>>     * rif:iri
>>>     * rif:local
>>>     * rif:text
>>>
>>>     * xsd:string
>>>
>>>     * xsd:integer
>>>     * xsd:long
>>>     * xsd:decimal
>>>
>>>     * xsd:time
>>>     * xsd:date
>>>     * xsd:dateTime
>>>
>>>     * rdf:XMLLiteral
>>>
>>> Best,
>>> Harold 
>> Thanks again. I will send you a first version of my XML serializer
>> for the presentation syntax probably later today (I still have to
>> enter half of the XML translation rules - but it is straightforward
>> from your document), and document it a little.
>>
>> Cheers,
>>
>> -hak
>>
>>> -----Original Message-----
>>> From: Hassan Aït-Kaci [mailto:hak@ilog.com] 
>>> Sent: April 7, 2008 12:48 PM
>>> To: Boley, Harold
>>> Subject: Re: Questions about BLD conditions and rule syntax and XML translation
>>>
>>> Hi Harold,
>>>
>>> Thanks for taking the time to help me catch up with the fine
>>> details of your BLD language design. Of course, I understand
>>> that the current presentation syntax grammar is a moving target
>>> and that things are likely to change as it will evolve. I also
>>> understand that you wish to first concentrate your efforts on
>>> the XML syntax. At one point, however, we shall need to finalize
>>> it and freeze it.
>>>
>>> Thanks also for reposting the presentation syntax of the ruleset
>>> example in the BLD document. I noticed that you have now changed
>>> "RuleSet" to "Group" and "LITERAL" to "UNICODESTRING". While I
>>> agree that "LITERAL" is not appropriate, I would propose to use
>>> simply "IDENTIFIER" rather than "UNICODESTRING". I also would
>>> prefer "RuleSet" to "Group" because it describes exactly what
>>> it is: viz., a set of rules. (A "group" is a vaguer and more
>>> ambiguous term.)
>>>
>>> At any rate, when you do make such changes in the grammar and
>>> syntax (including lexical), please let me know as I now have
>>> to update several files to keep in sync with the BLD document
>>> as it evolves. Jacc makes it easy to adapt to changes, but I
>>> need to know when they happen to keep my experimental system
>>> consistent with your document. I would greatly appreciate if
>>> you would so notify me (and explain why the changes). Thanks.
>>>
>>> Regarding the ambiguities in the grammar, I could transform the
>>> grammar to an equivalent one where only one remains: the second
>>> one, between the two rules:
>>>
>>> 	UnitermBody -->  Terms*
>>> 	UnitermBody -->  TermAttributes*
>>>
>>> But I must also find a way to get rid of that one too. I'll
>>> keep trying by tweaking the grammar.
>>>
>>> Finally, the syntax used in the test cases (see attached mail)
>>> is freer than the one you describe (in particular for numbers
>>> and unquoted and non-qualified identifiers). It looks like
>>> a lighter and more readable presentation syntax. So, whenever
>>> you find some time, I would like to give a shot with you (and
>>> others in the WG?) at defining a half-decent such syntax in
>>> order to make the presentation to your XML syntax less verbose
>>> and more easily readable (and writable!) by humans.
>>>
>>> Thoughts ?
>>>
>>> Bye for now,
>>>
>>> -hak
>>>
>>>
>>> Boley, Harold wrote:
>>>> Hi Hassan,
>>>>
>>>> Thanks for these experiments.
>>>>
>>>> Currently, the EBNF syntax is not considered a concrete syntax:
>>>>
>>>> http://www.w3.org/2005/rules/wiki/BLD#EBNF_Grammar_for_the_Presentation_Syntax_of_RIF-BLD
>>>>
>>>> However, as we discussed, I could imagine it becoming an input shorthand in some future wd.
>>>>
>>>> http://www.w3.org/2005/rules/wiki/BLD#Translation_of_RIF-BLD_Condition_Language
>>>>
>>>> Since the presentation syntax of RIF-BLD is context-sensitive, the translation must differentiate between terms that occur in the position of individuals (e.g., Expr) from terms that occur as atomic formulas (e.g., Atom).
>>>>
>>>> Terms of the form operator(...) denote expressions iff occurring
>>>> inside Atom, Equal, Member, Subclass, or Frame or inside expressions.
>>>> A top-down parser can hand this occurrence-context information down the formula tree.
>>>>
>>>> I focus on the XML syntax for now.
>>>> W3C's XSV is fine with the corresponding
>>>>
>>>> http://www.w3.org/2005/rules/wiki/BLD#Condition_Language
>>>> (to be updated on Monday)
>>>>
>>>>  <xs:element name="Atom">
>>>>    <xs:complexType>
>>>>      <xs:sequence>
>>>>        <xs:group ref="UNITERM"/>
>>>>      </xs:sequence>
>>>>    </xs:complexType>
>>>>  </xs:element>
>>>>
>>>>  <xs:element name="Expr">
>>>>    <xs:complexType>
>>>>      <xs:sequence>
>>>>        <xs:group ref="UNITERM"/>
>>>>      </xs:sequence>
>>>>    </xs:complexType>
>>>>  </xs:element>  
>>>>    
>>>>  <xs:group name="UNITERM">
>>>>    <xs:sequence>
>>>>      <xs:element ref="op"/>
>>>>      <xs:choice>
>>>>        <xs:element ref="arg" minOccurs="0" maxOccurs="unbounded"/>
>>>>        <xs:element ref="slot" minOccurs="0" maxOccurs="unbounded"/>
>>>>      </xs:choice>
>>>>    </xs:sequence>
>>>>  </xs:group>
>>>>
>>>> Again, please call me any time.
>>>>
>>>> Best,
>>>> Harold
>>>>
>>>> Phone: +1-506-444-0385
>>>> Skype: boleyh
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Hassan Aït-Kaci [mailto:hak@ilog.com] 
>>>> Sent: April 5, 2008 4:19 PM
>>>> To: Boley, Harold
>>>> Subject: Re: Questions about BLD conditions and rule syntax and XML translation
>>>>
>>>> Hi Harold,
>>>>
>>>> I entered the BLD condition language grammar from the BLD document into
>>>> Jacc and sure enough it complained of exactly the same thing I did with
>>>> you regarding a couple of inherent ambiguities in it. The first one comes
>>>> from these two conflicting rules:
>>>>
>>>> 	Atom --> Uniterm
>>>> 	Expr --> Uniterm
>>>>
>>>> which create a Reduce/Reduce conflict in a bottom-up LALR(1) parser.
>>>> Another Reduce/Reduce ambiguity comes from a conflict between these
>>>> two rules:
>>>>
>>>> 	UnitermBody -->  Terms*
>>>> 	UnitermBody -->  TermAttributes*
>>>>
>>>> This is what Jacc is saying on the Grammar (see files BLD.grm, BLR.grm,
>>>> and BLC.grm):
>>>>
>>>> 4J4ZN71 (hak) 125> make gen
>>>> *** Tidying up directory...
>>>> *** Generating the BLD parser...
>>>> *** Reading grammar in file BLD.grm ...
>>>> *** Including file BLD_doc.grm ...
>>>> *** Including file Keywords.grm ...
>>>> *** Including file ParserCode.grm ...
>>>> *** Defining XML namespace prefix: rif = "http://www.w3.org/2008/rif"
>>>> *** Setting XML root's namespace to rif
>>>> *** Setting XML root to rif:document
>>>> *** Including file BLR.grm ...
>>>> *** Including file BLC.grm ...
>>>> *** Starting grammar analysis ...
>>>> *** Grammar analysis completed in 63 ms.
>>>> *** Building parsing tables ...
>>>> *** WARNING: unresolved conflicts: 2 reduce/reduce
>>>> *** Writing parser file Parser.java
>>>> *** Parser generation completed in 62 ms.
>>>> *** Total processing time: 187 ms.
>>>>
>>>> This is annoying because this BLD grammar is really nothing more than
>>>> a simple AST syntax rather than that of a real language, and yet it is
>>>> inherently ambiguous!... As shown by the analysis performed by Jacc on
>>>> it, the grammar gives no way of knowing whether a functional term or a
>>>> relational predicate is being parsed. As well, there is no way in it
>>>> for determining whether a positional or attributed UniTerm has been
>>>> recognized upon closing a ')'.
>>>>
>>>> Since nothing is yet cast in concrete concerning the BLD presentation
>>>> syntax, perhaps we should be better advised to come up with such a
>>>> presentation grammar that would be both easy to read and non-ambiguous?
>>>>
>>>> What do you think?
>>>>
>>>> -hak
>>
> 
> 


-- 
Hassan Aït-Kaci  *  ILOG, Inc. - Product Division R&D
http://koala.ilog.fr/wiki/bin/view/Main/HassanAitKaci

// FILE. . . . . /home/hak/ilt/src/ilog/rif/BLR.grm
// EDIT BY . . . Hassan Ait-Kaci
// ON MACHINE. . 4j4zn71
// STARTED ON. . Wed Apr 02 14:07:24 2008

// Last modified on Wed Apr 09 17:12:39 2008 by hak

/**
 * Basic Logic Dialect (BLD) rule language's root.
 */
RifDocument
  : Group
  { showXml(); }  // show the XML tree
  ;

////////////////////////////////////////////////////////////////////////
// The BLD Rule Language:
////////////////////////////////////////////////////////////////////////

Group
  : GROUP Meta_opt OPENPAR RuleSet_opt CLOSEPAR
  ;

Meta
  : Frame
  ;

Rule
  : Clause
  | FORALL Vars_opt OPENPAR Clause CLOSEPAR
  ;

Clause
  : Atomic
  | Implies
  ;

Implies
  : Atomic IF Formula
  ;

RuleSet_opt
  : /* empty */
  | RuleSet
  ;

RuleSet
  : RuleOrGroup
  | RuleSet RuleOrGroup
  ;

RuleOrGroup
  : Rule
  | Group
  ;

////////////////////////////////////////////////////////////////////////

Meta_opt
  : /* empty */
  | Meta
  ;

Vars_opt
  : /* empty */
  | Vars
  ;

////////////////////////////////////////////////////////////////////////

// FILE. . . . . /home/hak/ilt/src/ilog/rif/BLC.grm
// EDIT BY . . . Hassan Ait-Kaci
// ON MACHINE. . 4j4zn71
// STARTED ON. . Wed Apr 02 14:08:56 2008

// Last modified on Tue Apr 08 05:01:07 2008 by hak

////////////////////////////////////////////////////////////////////////
// The BLD Condition Language:
////////////////////////////////////////////////////////////////////////

Formula
  : Atomic
  | AND OPENPAR Formulas_opt CLOSEPAR
  | OR OPENPAR Formulas_opt CLOSEPAR
  | EXISTS Vars OPENPAR Formula CLOSEPAR
  | EXTERNAL OPENPAR UniTerm CLOSEPAR   //  | EXTERNAL OPENPAR Atom CLOSEPAR
  ;

Atomic
  : UniTerm // Atom
  | Equal
  | Member
  | Subclass
  | Frame
  ;

// Atom
//   : UniTerm
//   ;

UniTerm
  : Const OPENPAR UniTermBody CLOSEPAR
  ;

Equal
  : Term EQUAL Term
  ;

Member
  : Term MEMBER Term
  [ L:"Member" C:(1 3) ]
  ;

Subclass
  : Term SUBCLASS Term
  [ L:"Subclass" C:(1 3) ]
  ;

Frame
  : Term OPENBRA FrameAttributes_opt CLOSEBRA
  ;

Term
  : Const
  | Var
  | UniTerm  //  | Expr
  | EXTERNAL OPENPAR UniTerm CLOSEPAR // EXTERNAL OPENPAR Expr CLOSEPAR
  ;

// Expr
//   : UniTerm
//   ;

Const
  : STRING LEXSPACE SymSpace
  ;

Var
  : VARIABLE
  ;

////////////////////////////////////////////////////////////////////////

// UniTermBody
//   : Terms_opt
//   | TermAttributes_opt
//   ;

// Terms_opt
//   : /* empty */
//   | Terms_opt Term
//   ;

// TermAttributes_opt
//   : /* empty */
//   | TermAttributes
//   ;

// TermAttributes
//   : TermAttribute
//   | TermAttributes TermAttribute
//   ;

UniTermBody
  : SubTerms_opt
  ;

SubTerms_opt
  : /* empty */
  | SubTerms
  ;

SubTerms
  : SubTerm
  | SubTerms SubTerm
  ;

SubTerm
  : Term
  | TermAttribute
  ;

////////////////////////////////////////////////////////////////////////

TermAttribute
  : Const ARROW Term
  ;

FrameAttributes_opt  
  : /* empty */
  | FrameAttributes
  ;

FrameAttributes
  : FrameAttribute
  | FrameAttributes FrameAttribute
  ;

FrameAttribute
  : Term ARROW Term
  ;

Formulas_opt
  : /* empty */
  | Formulas
  ;

Formulas
  : Formula
  | Formulas Formula
  ;

Vars
  : Var
  | Vars Var
  ;

SymSpace
  : IDENTIFIER COLON IDENTIFIER
  ;  

////////////////////////////////////////////////////////////////////////