Re: [pointless:Partiles:Structure]: i cannot understand this phrase... from C. M. Sperberg-McQueen on 2001-07-24 (www-xml-schema-comments@w3.org from July to September 2001)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Mon, 23 Jul 2001 18:45:38 -0600
To: "Martin Gudgin" <marting@develop.com>
Cc: "choi jongwon" <jwchoi@digiweb21.com>, "Henry S. Thompson" <ht@cogsci.ed.ac.uk>, <www-xml-schema-comments@w3.org>
Message-Id: <4.3.2.7.1.20010723170616.02b8ed08@espanola.com>
At 2001-07-18 18:38, Martin Gudgin wrote:
 >> Because <choice/> is unsatisfiable.  It's like AND and OR considered
 >> as n-ary functions:
 >>
 >>   AND() is true ; including <sequence/> as a required part of a
 >>                   content model does not change what it validates
 >>
 >>   OR() is false ; including <choice/> as a required part of a content
 >>                   model ensures that it never validates anything
 >>
 >> So for <choice/> to be pointless, it must be optional.
 >
 >I don't get this at all... I don't understand what you mean by n-ary
 >functions in this context. It seems very strange that <choice/> is
 >unsatisfiable. Can you elaborate please?

Since Henry is away, I'll see if I can do the job.

The operators AND and OR can be treated as binary operators, which is
the most common case, or as n-ary operators, which is a standard
feature of Lisp, though I have not encountered it elsewhere.  Treated
as an n-ary operator, AND takes an arbitrary number of arguments, and
returns TRUE if and only if all of the arguments are TRUE -- or
rather, it returns FALSE if and only if at least one of the arguments
is false, and otherwise it returns TRUE.  So, if a, b, and c are all
TRUE and x, y, and z are all FALSE, we get

   (and a b c) ==> TRUE
   (and a b c x y z) ==> FALSE

and so on.  Notice that these results are (owing to the associative
properties of AND) the same as for AND as a binary operator:

   (a AND (b AND (c AND (x AND (y AND z)))))

Similarly, OR returns TRUE if and only if at least one of the
arguments is TRUE.

   (or a b c) ==> TRUE
   (or a b c x y z) ==> FALSE

In some (all?) Lisp dialects, these are implemented as 'short-circuit'
expressions: AND evaluates its arguments either until they are all
used up, or until it hits the first false one.  When it hits a false
expression, it returns immediately.  OR does the same thing, except
that it returns early if it hits a true argument.

Consider this description: "AND evaluates arguments one at a time.  If
it hits a false argument, it returns FALSE.  If it hits the end of the
arguments (N.B. this does not happen if we have returned FALSE after
hitting a false argument), it returns TRUE."  And similarly for OR.

Now: what happens if the Lisp AND operator is called without
arguments?  It hits the end of its arguments without having found any
false ones, so it returns TRUE.  Similarly, if OR is called without
arguments, it hits the end of its arguments without having found any
true ones, so it returns FALSE.

So Lisp programmers, at least, are used to

   (and) ==> TRUE         (or rather, ==> t)
   (or)  ==> FALSE        (==> nil)

This is not just a quirk of lazy evaluation.

In formal systems, it is handy to have special values which can be
used to make any operator into the identity operator, so that for any
value v, the expression

   v op magic-value = v

For addition, the magic value is 0, for multiplication it's 1.

   v + 0 = v
   v * 1 = v

For AND, it's TRUE, and for OR, the magic value is FALSE.

   v AND TRUE = v
   v OR FALSE = v

When operators / functions take arbitrary numbers of arguments, these
values may be, and often are, associated with the operator or function
in question: when called without any arguments, the + operator in Lisp
returns 0, while the * operator returns 1, and 'and' and 'or' return,
as mentioned above, t and nil respectively.

Formal definitions of regular languages will make clear that for
concatenation, the magic value is the empty string, while for
disjunction, the magic value is the empty set.  N.B. these are
not the same thing:  if we write '' for the empty string (often
formal treatments use epsilon or lambda), and {} for the empty
set, then '' denotes the language with one member (namely, the
empty string), while {} denotes the language with no members.

In translating from vanilla regular expressions to content models, it
is useful to associate an empty sequence with the magic value for
concatenation, and to say that it is satisfied by any zero-length
sequence of children, and to associate the empty choice with the magic
value for alternation / disjunction, and to say that it is not
satisfied by any sequence of children at all.  So <sequence/>,
regarded as the definition of a language, denotes the language
consisting of the empty string of (child) elements, while <choice/>
denotes the language which has no members (the empty language).

Writing

   <xsd:element name="dink">
    <xsd:complexType>
     <xsd:sequence/>
    </xsd:complexType>
   </xsd:element>

makes sure that 'dink' elements have no children (warning:
culture-specific humor may be present here; as a demographic label for
childless couples both of whom have a job, 'dink' is short for "double
income, no kids", so it's a constitutive feature of dinks that they
have no children).  By contrast, writing

   <xsd:element name="elephant-child">
    <xsd:complexType>
     <xsd:sequence/>
    </xsd:complexType>
   </xsd:element>

ensures that it is not possible for the content model of the
elephant-child element ever to be satisfied.  (The element is
named for the Elephant's Child in Rudyard Kipling's Just So
Stories, which had a curiosity which could not be satisfied.)

Such a content model is likely to have limited practical use (but see
http://lists.w3.org/Archives/Public/w3c-sgml-wg/1996Sep/0049.html for
one application of the idea), but in formal contexts it is very handy
to have ways of writing down expressions with these characteristics,
which is why the Formal Description document long ago found it
desirable to introduce special symbols for 'empty choice' and 'empty
sequence'.

I hope this helps.

-Michael Sperberg-McQueen
Received on Monday, 23 July 2001 20:38:19 UTC