Re: XML schema with recursive SimpleType from C. M. Sperberg-McQueen on 2009-05-30 (xmlschema-dev@w3.org from May 2009)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Sat, 30 May 2009 13:10:39 -0600
To: Sucheta Phatak <sphatak@cs.iupui.edu>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, <xmlschema-dev@w3.org>
Message-Id: <9BA9AA48-31F6-4603-A41E-9DC818ACF9B0@blackmesatech.com>
On 29 May 2009, at 06:08 , Sucheta Phatak wrote:

> Hello All,
>
> I am trying to create a XML Schema which contains an element called  
> as “Pre-Condition” which defines a logical expression.
> For this I need to define a type “LogicalExpressionType”. This type  
> will look something like,
>
> <xs:LogicalExpressionType>
>                 <union memberTypes=”InfixExpressionType  
> PrefixExpressionType”/>
> </ xs:LogicalExpressionType>
>
> <xs: InfixExpressionType>
>                 A sequence of “LogicalExpressionType” “+|-|/|….”  
> “LogicalExpressionType”
> <xs: InfixExpressionType>
>
> Is there a way of defining such type, so as to ensure a valid pre- 
> condition value is specified in an XML. This is recursive  
> simpleType, is it allowed?
> Is there a better way of doing it?

If I have understood you correctly, you are wanting to define
a simple type for expressions in some logical notation.  So
an instance of your vocabulary might contain

   <PreCondition>x and y iff z or not b</PreCondition>

-- but with your preferred logical notation, not the one used
above.  (If that's not what you are trying to do, you may
ignore the rest of this mail.)

In the unlikely event that the logical notation you have in
mind has a syntax that can be defined by a regular grammar,
then in principle it would be possible to write a regular
expression which accepts all syntactically legal logical
expressions and no expressions with syntactic errors.  This
is unlikely because all of the logical notations in common
use require context-free grammars.  Your description of the
expression type as recursive suggests that the language you
want to define is in fact context-free.  Does it allow
parentheses?

You have, I think, two options.

(1) You can decide to define not a simple type, whose values
will have a surface syntax for the logical expressions, but
a complex type, such that the element structure will
provide the equivalent of an abstract syntax tree for your
language.  In a design of this sort, the example given
above might read

   <precondition>
     <iff>
       <and> <var>x</var> <var>y</var> </and>
       <or> <var>z</var> <not><var>b</var></not> </or>
     </iff>
   </precondition>

Those who write code to process preconditions will bless you
if you provide a convenient syntax for them -- they don't
need to write a parser for logical expressions but can
concentrate on evaluating them -- and those who write
documents will probably want special-purpose editors for
editing preconditions.

(2) You can define a simple type that approximates the syntax
of your logical notation but does not capture it fully.  By
setting a maximum nesting level for logical expressions you can
define a regular language which is a strict subset of the
notation you really care about, so any valid literal is a
syntactically legal logical expression.  Or by relaxing the
context-free constraints you can define a regular language
which is a strict superset of your notation, so that any
syntactically legal logical expression will be valid against
your simple type.

Or you can define both, and then create a union datatype which
includes both as members, so the literal in the instance will
be recognized as a member of the subset language, if possible
(in which case you know it's syntactically valid), and otherwise
will be tested for membership in the superset language.  If
your validator exposes the relevant part of the PSVI, you will
know which member datatype was matched, and thus know whether
the literal is guaranteed to be syntactically legal in your
logical notation or not.

Since any application that uses the preconditions will also
need to check them for semantic correctness and plausibility
as well as syntactic correctness, you will presumably have an
opportunity later to check the input with a full and correct
grammar of the logical notation.  The subset and superset
languages for which you define simple types are essentially
aids to help you catch some errors earlier in the process.

Creating regular languages which are subsets or supersets of
a given context-free language is a relatively straightforward
task for people with an interest in formal languages (though
I suspect it could quickly become tedious); in principle,
it's clearly automatable, but I don't know any off the shelf
tools to do it.

I hope this helps.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Saturday, 30 May 2009 19:11:16 UTC