Re: proposed extensions to content MathML from Andreas Strotmann on 2005-05-10 (www-math@w3.org from May 2005)

From: Andreas Strotmann <Strotmann@rrz.uni-koeln.de>
Date: Tue, 10 May 2005 16:55:13 +0200
To: RobertM@dessci.com
CC: www-math@w3.org, siegrist@math.uah.edu
Message-ID: <4280CB51.6000705@rrz.uni-koeln.de>
RobertM@dessci.com wrote:
> Hello All.
> 
> Kyle Siegrist, who created the Virtual Laboratories in Probability and
> Statistics web site <http://www.math.uah.edu/stat/>, recently
> suggested the following extensions to me, as good candidates for a
> MathML 3 update.  Prof. Siegrist writes:
> 
>  "Here is what I would love to see added to Content MathML:
> 
>    1. Binomial coefficient
> 
>    2. Permutation coefficient:  n(n -1)...(n - k + 1), usually
>    rendered P(n, k) or nPk or (n)k.
> 
>    3. A probability operator with an optional "given" construction
>    (for conditional probability).  Typical rendering would be
>       P(A, B, ...) (without conditioning) or  P(A, B, ... | C, D, ...)
>    (with conditioning).
> 
>    4. An expected value operator with an optional "given" construction
>    (for conditional expected value).  Typical rendering would be E(A,
>    B, ...) (without conditioning) or  E(A, B, ... | C, D, ...) (with
>    conditioning).
...
>   If I had these extensions, I think that I could do just about
>   everything that I wanted without going over to Presentation MathML.
> 
>   Items 3 and 4 (with the "given" construction) are really important in
>   probability, statistics, and stochastic processes; conditional
>   probability and expected value are central notions.  Ordinary
>   probability and expected value can be done with the usual function
>   ("apply") construction, but there is no way to do the conditioning
>   without adding Presentation MathML as a kludge.
...
> Anyone want to second these proposals?  Or take issue with them?

I agree that the "given" construction appears to be central to
statistics, a topic that is, indeed, sometimes covered as an optional
subject in German high school classes.

Like Prof. Siegrist, I do not immediately see how to implement that
concept with MathML-Content qualifiers as they currently stand, but I
would not give up without giving it deeper thought. Here are some ideas
on how one might go about that task.

First of all, I would look at how existing symbolic math systems (if
any) handle symbolic statistics in general, and this construct in
particular. That might already give one an idea of how to do this.

Second, I would try to understand what the "given" construct actually 
means in this context.  Not being a statistician myself, but having had 
to teach the "given" concept to undergraduates once, my impression was 
that it is quite a complex concept indeed.  Here is what I understand it 
to mean:

  - the concept of a statistical variable is actually fundamental, and 
quite different from a "normal" variable.  It is my understanding that 
such a variable ranges over the class of probability distributions (i.e. 
a statistical variable has a distribution as a value).

  - P(X,Y,Z) actually is a compound concept, consisting of (X,Y,Z), the 
joint probability distribution of the three statistical variables X,Y, 
and Z, and the probability measure function, P, which is actually 
applied to the joint distribution, not the list of arguments.

  - what the "given" construct then does is assign a different "value" 
to the statistical variables that are "given" within the scope of the 
surrounding parentheses, i.e. when constructing the joint probability 
distribution it represents. In other words, a "given" Z gets assigned, 
locally, the probability distribution meaning "known to be true" instead 
of its original one outside the scope of the parentheses around the 
"given" construct.

  - it is possible to have Z "given" as false or true.
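
As a rough illustration of the reading above, here is a minimal Python
sketch of what "given" does to a joint distribution: it keeps only the
outcomes in which the given variable has the stated value and
renormalizes. The table of probabilities and the function name
condition() are made up for this example; they do not come from any
existing system.

```python
from fractions import Fraction

# Hypothetical joint distribution of three binary variables (X, Y, Z).
# Keys are outcomes (x, y, z); the probabilities are invented and sum to 1.
joint = {
    (0, 0, 0): Fraction(1, 8),
    (0, 0, 1): Fraction(1, 8),
    (0, 1, 0): Fraction(1, 16),
    (0, 1, 1): Fraction(3, 16),
    (1, 0, 0): Fraction(1, 8),
    (1, 0, 1): Fraction(1, 16),
    (1, 1, 0): Fraction(1, 16),
    (1, 1, 1): Fraction(1, 4),
}


def condition(dist, index, value):
    """Restrict dist to outcomes whose component `index` equals `value`,
    then renormalize so the remaining probabilities sum to 1."""
    total = sum(p for outcome, p in dist.items() if outcome[index] == value)
    if total == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return {
        outcome: p / total
        for outcome, p in dist.items()
        if outcome[index] == value
    }


# "Z given as true": Z is the component at index 2, true is encoded as 1.
given_z_true = condition(joint, 2, 1)
# "Z given as false" works the same way with the value 0.
given_z_false = condition(joint, 2, 0)
```

Inside the conditioned table, Z behaves as if it were known for certain,
which is the local reassignment described in the list above.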

If this is a correct analysis, then we could come up with a reasonable 
suggestion for representing it in MathML-Content:

<apply> <probability/>
   <apply>
      <jointdistribution/>
      <bvar><ci>Z</ci></bvar>
      <condition>
        <apply> <given/> <ci>Z</ci> <true/> </apply>
      </condition>
      <ci>X</ci>
      <ci>Y</ci>
      <ci>Z</ci>
   </apply>
</apply>

and render it as P(X,Y|Z).

This representation makes sure that the local reassignment of value to 
the statistical variable (aka binding of the variable) is honored in the 
representation, which is usually an important consideration when 
creating MathML Content.
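
To make the rendering rule concrete, here is a small Python sketch that
reads the markup suggested above and produces the P(X,Y|Z) form. The
elements probability, jointdistribution and given are only the names
proposed in this message, not existing MathML elements, and the function
render() is a hypothetical helper. The sketch treats every bound
variable as "given" and does not distinguish "given as true" from
"given as false".

```python
import xml.etree.ElementTree as ET

# The markup proposed in this message. The element names are the
# hypothetical ones suggested here, not part of any MathML version.
PROPOSED = """
<apply> <probability/>
   <apply>
      <jointdistribution/>
      <bvar><ci>Z</ci></bvar>
      <condition>
        <apply> <given/> <ci>Z</ci> <true/> </apply>
      </condition>
      <ci>X</ci>
      <ci>Y</ci>
      <ci>Z</ci>
   </apply>
</apply>
"""


def render(markup):
    """Render the proposed construct as P(free variables|given variables)."""
    root = ET.fromstring(markup)
    inner = root.find("apply")  # the jointdistribution application
    # Variables bound by bvar are the ones that are "given".
    given = [bvar.findtext("ci").strip() for bvar in inner.findall("bvar")]
    # Direct ci children are the arguments of the joint distribution.
    arguments = [ci.text.strip() for ci in inner.findall("ci")]
    free = [name for name in arguments if name not in given]
    text = "P(" + ",".join(free)
    if given:
        text += "|" + ",".join(given)
    return text + ")"


rendered = render(PROPOSED)
```

With the markup above, render() separates the bound variable Z from the
free arguments X and Y, so a renderer needs nothing beyond the bvar
element to decide what goes after the bar.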

There are probably still problems with this particular suggestion, but
it might help to understand the problems behind your assertion that
"given" can't be done in MathML.

Hope this helps just a little bit,

  -- Andreas

> 
> --Robert
> 
> ------------------------------------------------------------------
> Dr. Robert Miner                                RobertM@dessci.com
> W3C Math Interest Group co-chair                      651-223-2883
> Design Science, Inc.   "How Science Communicates"   www.dessci.com
> ------------------------------------------------------------------
> 
> 
>
Received on Tuesday, 10 May 2005 14:55:34 UTC