RE: MathML Presentation to content transformation from Bernhard Keil on 2003-05-05 (www-math@w3.org from May 2003)

From: Bernhard Keil <Bernhard.Keil@soft4science.com>
Date: Mon, 5 May 2003 22:17:23 +0200
To: <NeilS@dessci.com>
Cc: "Robert Miner" <RobertM@dessci.com>, <www-math@w3.org>
Message-ID: <IIEOKCCPKDALHBHODAENEEEACGAA.Bernhard.Keil@soft4science.com>

>While I agree that this is needed for 100% certainty
 No, this is needed in many trivial cases.

>you wouldn't do that if you were reading a paper containing it.
  You have to know a lot about physics to understand a physics article.
  I don't know of any artificial intelligence software that is able to do that job.

A human reader has no problem understanding the following:

<p>The variance as defined in chapter 1.2.1
 <math xmlns="http://www.w3.org/1998/Math/MathML">
   <msup>
      <msub>
         <mi>&sigma;</mi>
         <mi>x</mi>
      </msub>
      <mn>2</mn>
   </msup>
 </math>
 will be written as ....</p>


For software it is not easy; it is nearly impossible.
Even if you had a 99% rate of matching the right meaning,
that would be far too poor for use in a batch process without user
interaction.
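
To put a rough number on this, here is a small sketch. It assumes (for
illustration only) that every formula is converted independently and with
the same success rate; the function names are made up for this example.

```python
def batch_all_correct(per_formula_rate: float, n_formulas: int) -> float:
    """Probability that every formula in a batch is converted correctly,
    assuming independent conversions with the same success rate."""
    return per_formula_rate ** n_formulas


def expected_errors(per_formula_rate: float, n_formulas: int) -> float:
    """Expected number of wrongly converted formulas in the batch."""
    return (1.0 - per_formula_rate) * n_formulas


if __name__ == "__main__":
    # Print the figures for a few batch sizes at a 99% per-formula rate.
    for n in (10, 100, 1000):
        print(n, round(batch_all_correct(0.99, n), 4),
              round(expected_errors(0.99, n), 1))
```

Under these assumptions a batch of 100 formulas comes out entirely correct
only about 37% of the time, and nothing tells you which formulas are wrong.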

Whether a text is understandable to a human reader and
whether it is machine-readable are completely different
questions.

>If the software doing the conversion has information about the context
  In practice, a software tool like Mathematica will not be able to handle the
  text fragment "A human reader has no problem understanding the following: ..."

>(eg, by user control or metadata extraction),
  My little joke about "..phoning the author.." was meant as a synonym for "user control" or user interaction.

  A good place to store metadata is content markup.
  As I understand it, the difference between presentation and content markup is
  that presentation markup does not carry metadata.

  Of course you can use some project-specific external metadata, defining that
  the symbol "sigma" has this or that meaning.
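
A minimal sketch of what such external metadata could look like, assuming a
hypothetical per-project table (SYMBOL_MEANINGS) and a hypothetical helper
(to_content) that only recognises the one pattern from the example above.
It is an illustration, not a general converter; the literal character σ is
used instead of &sigma; so that the snippet parses without a DTD.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical project-specific metadata: the meaning of a base symbol when
# it appears as <msup><msub>symbol arg</msub><mn>2</mn></msup>.
SYMBOL_MEANINGS = {"σ": "variance"}

PRESENTATION = (
    f'<math xmlns="{MML}"><msup><msub><mi>σ</mi><mi>x</mi></msub>'
    "<mn>2</mn></msup></math>"
)


def to_content(presentation_xml, meanings):
    """Return content markup for the symbol_arg^2 pattern, or None."""
    math = ET.fromstring(presentation_xml)
    msup = math.find(f"{{{MML}}}msup")
    if msup is None or len(msup) != 2:
        return None
    msub, exponent = msup
    if msub.tag != f"{{{MML}}}msub" or len(msub) != 2:
        return None
    if exponent.tag != f"{{{MML}}}mn" or (exponent.text or "").strip() != "2":
        return None
    symbol, argument = msub
    meaning = meanings.get((symbol.text or "").strip())
    if meaning is None:
        return None  # no metadata for this symbol: a human has to decide
    arg = (argument.text or "").strip()
    return (f'<math xmlns="{MML}"><apply><{meaning}/>'
            f"<ci>{arg}</ci></apply></math>")


if __name__ == "__main__":
    print(to_content(PRESENTATION, SYMBOL_MEANINGS))
    print(to_content(PRESENTATION, {}))  # without metadata: None
```

Without an entry in the table the helper returns None rather than guessing,
which is the point where user interaction would be needed.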


Bernhard Keil
mailto:Bernhard.Keil@soft4science.com



-----Original Message-----
From: www-math-request@w3.org [mailto:www-math-request@w3.org] On Behalf Of Neil Soiffer
Sent: Monday, May 05, 2003 9:57 PM
To: Bernhard Keil
Cc: Robert Miner; www-math@w3.org
Subject: Re: MathML Presentation to content transformation



> As Robert has stated, it is not possible to convert presentation markup to content markup in general.
> In simple cases a heuristic approach can lead to the right result, but this
> is far from a general solution that can be used without human interaction.
> 
> 
> A presentation markup like this:
> 
> <math xmlns="http://www.w3.org/1998/Math/MathML">
>   <msup>
>      <msub>
>         <mi>&sigma;</mi>
>         <mi>x</mi>
>      </msub>
>      <mn>2</mn>
>   </msup>
> </math>
> 
> can only be converted to the following content markup:
> 
> <math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
> <apply>
>   <variance/>
>   <ci> X </ci>
> </apply>
> </math>
> 
> by phoning the author and asking him whether this is what he meant.

While I agree that this is needed for 100% certainty, you wouldn't
do that if you were reading a paper containing it.  If the software doing
the conversion has information about the context (e.g., by user control or
metadata extraction), it can make these transformations with very high
reliability.  There is an important point about notation:  it is meant
to imply underlying functionality.  If it is ambiguous within its context,
it is probably confusing to readers and will eventually die a timely death.


Neil Soiffer                     email: neils@dessci.com            
Senior Scientist                 phone: 562-433-0685            
Design Science, Inc.             http://www.dessci.com
"How Science Communicates"

Received on Monday, 5 May 2003 16:17:32 UTC