Re: Mathematical selection from Richard Kaye on 2006-03-30 (www-math@w3.org from March 2006)

From: Richard Kaye <R.W.Kaye@bham.ac.uk>
Date: Thu, 30 Mar 2006 21:13:52 +0100
To: www-math@w3.org
Message-Id: <200603302113.53001.R.W.Kaye@bham.ac.uk>
On Thursday 30 March 2006 15:14, Bruce Miller wrote:
> Richard Kaye wrote:
> > On Thursday 30 March 2006 12:35, Paul Libbrecht wrote:
> >>W Naylor wrote:
> >>>I thought to try out the ORCCA tex -> MathML translator on your
> >>>example: I input the document:
> >>>\documentclass[11pt]{article}
> >>>\begin{document} $$3*a+b$$ \end{document}
> >>>and get out the MathML:
> >>><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"
> >>>overflow="scroll">
> >>><mn>3</mn><mo>*</mo><mi>a</mi><mo>+</mo><mi>b</mi></math>
> >>>now this is machine generated, (though I suspect that many authors would
> >>>be lazy and wouldn't put an mrow around the 3*a, if they were creating
> >>>this by hand)
> >>
> >>Well, that's an example where solution-1
> >>(presentation-tree-based-selection) is doing the same as text
> >>selection... it clearly is wrong but if the author is aware of it, he
> >
> > Actually, it is only clearly wrong with standard semantics where
> > + = addition and * = multiplication on some standard field such
> > as the real numbers, and using standard conventions on precedence
> > (and perhaps in a context where you are using standard classical
> > logic to discuss real numbers).
>
> I'd argue it's wrong in any case, or at least of dubious meaning;
> What does a construct like "a op1 b op2 c" mean?  It's just that
> the "right" form is not a priori clear, without knowing the author's
> intended notations.

Well, if you want to discuss the meanings of these rows,
we need to have some input with semantic content, as in OpenMath,
and not p-MathML. p-MathML says it "means" a row of symbols to
be rendered in some way the renderer sees fit.  I like p-MathML 
precisely because it *doesn't* make any attempts to attach any 
other "meaning".  In other words when my research leads me to new 
mathematics no one else has thought of, I know I can always render 
it in p-MathML. I could also express it semantically in OpenMath if 
I am willing to write CDs.  I couldn't express it in c-MathML, 
because that is only concerned with the meanings in some kinds 
of maths that people worked out years and years ago.

> And if you're opening the can'o'worms
> of non-standard notations, why assume that * and + are infix
> operators at all? Maybe "3*" is a prefix operator acting on "a"?

There are no such assumptions.  This is a string of symbols with
a default suggestion to the renderer that * and + are "infix" (but it
is not really clear what "infix" means other than by examples of
standard practice and it is up to the renderer how this 
should affect its rendering). There are also some default 
suggestions as to the spacing on either side of each symbol.

> By default TeX assumes they are operators, but, like MathML,
> there's no precedence associated with them.

Actually, TeX doesn't assume anything like this either.  It has a
classification of symbols (e.g. \mathbin, \mathop, \mathrel and
\mathord, I think) that tells it via a slightly complicated algorithm
how much space to put between two symbols.  I looked at this very
hard a long time ago when I realised that for the sort of maths I do
this algorithm didn't work.  Fairly reasonably, I wanted some operators
like \wedge (meaning "and") to have more space round them than 
relations like < which should have more space than arithmetic 
operators like +.  I also wanted < to have less space when 
used in contexts like "forall x<y". There are simply not enough 
levels in TeX.  In the end I gave up and now I adjust things by hand when 
it doesn't look right. (BTW It would be interesting to hear other people's
ideas on how to achieve this in p-MathML... I have my own ideas but 
they're not very nice.)
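
To illustrate the point about levels, here is a minimal Python sketch of a
simplified subset of TeX's inter-atom spacing table, restricted to pairs
that involve an ordinary atom. The names ATOM_CLASS, SPACING and
spaces_between are made up for this illustration, and the real algorithm
has more classes and style-dependent rules. The class assignments are the
plain TeX defaults.

```python
# Space levels: thin, medium, thick correspond to TeX's
# \thinmuskip, \medmuskip and \thickmuskip. There are no others.
NONE, THIN, MEDIUM, THICK = 0, 1, 2, 3

# Plain TeX default classes for a few symbols (illustrative subset).
ATOM_CLASS = {
    "x": "ord", "y": "ord", r"\forall": "ord",
    "+": "bin", r"\wedge": "bin",
    "<": "rel",
}

# Subset of the inter-atom spacing table (text and display styles only).
SPACING = {
    ("ord", "ord"): NONE,
    ("ord", "bin"): MEDIUM, ("bin", "ord"): MEDIUM,
    ("ord", "rel"): THICK,  ("rel", "ord"): THICK,
}

def spaces_between(tokens):
    """Return the space level inserted between each adjacent pair of tokens."""
    classes = [ATOM_CLASS[t] for t in tokens]
    return [SPACING[pair] for pair in zip(classes, classes[1:])]

print(spaces_between(["x", r"\wedge", "y"]))        # same levels as x + y
print(spaces_between(["x", "+", "y"]))
print(spaces_between([r"\forall", "x", "<", "y"]))  # < is thick regardless of context
```

Since \wedge and + are both \mathbin they get the same medium space, and
< gets the thick space whether or not it follows a quantifier, which is
why the adjustment ends up being done by hand.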

> Unless the author's markup has explicit structure,
> whatever agent is translating to MathML will need to parse.
> Ideally that agent would allow for non-standard notations,
> but the standard makes a good default.

Agreed. And if you are using non-standard notations you'll
probably be writing a lot of lspace="..." rspace="..." and
form="..." attributes too.  (And a lot of hoping and praying
that the renderer does the right thing...)
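
As a sketch of what that hand-written markup amounts to, the following
Python (standard library xml.etree.ElementTree) builds <mo> elements with
explicit form, lspace and rspace. The helper names explicit_op and leaf
and the em values are assumptions chosen for illustration; whether a given
renderer honours them is a separate matter.

```python
import xml.etree.ElementTree as ET

def explicit_op(symbol, form="infix", lspace=None, rspace=None):
    """Build an <mo> whose form and spacing are stated rather than defaulted."""
    mo = ET.Element("mo", {"form": form})
    if lspace is not None:
        mo.set("lspace", lspace)
    if rspace is not None:
        mo.set("rspace", rspace)
    mo.text = symbol
    return mo

def leaf(tag, text):
    """Build a token element such as <mi>x</mi>."""
    el = ET.Element(tag)
    el.text = text
    return el

# x < y AND p, with a tight "<" and a widely spaced logical "and" (U+2227).
row = ET.Element("mrow")
row.append(leaf("mi", "x"))
row.append(explicit_op("<", lspace="0.1em", rspace="0.1em"))
row.append(leaf("mi", "y"))
row.append(explicit_op("\u2227", lspace="0.6em", rspace="0.6em"))
row.append(leaf("mi", "p"))

print(ET.tostring(row, encoding="unicode"))
```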

> To get back to Paul's original question; 

Oh yes.  Sorry for the rant :)

> Have you thought of
> taking a hybrid approach?  Ie. expand the selection based
> on presentation-tree considerations, and then _if_ there are
> parallel markup linkages, adjust the selection as needed.
> That would seem to do as much fixup as you can, given whatever
> markup you're given.
>
> Depending on what the selection is _for_, however, a pure
> single content subtree might not be what's desired.
> It might be reasonable to select multiple subtrees provided
> they are adjacent siblings. Assuming the above example were
> properly nested (using standard precedence :> ), "*a"
> (two subtrees) might be a useful selection that would fit
> the criterion. OTOH, you wouldn't be able to select "*a+",
> which is a good thing.

With standard notations and meanings and *=multiply,
"*a" might mean the postfix operator of multiplication by a.
"*a+" might mean the infix operator of multiplying the left-hand
argument by a and then adding the result to the right-hand
argument.  Who knows? It's just possible someone might
actually want this.

I would simply allow the selection of anything that could possibly
go in an <mrow>...</mrow> as defined in the DTD or schema
(and return the code *with* the implied <mrow>...</mrow> for 
safety).  Then you'll need two options for pasting it: either
to paste the whole mrow as a single item into an object, or to
paste the content of the mrow as a list of several objects into 
an object. Both are needed. I can't think of anything simpler.
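
A minimal Python sketch of this proposal, using the standard library's
xml.etree.ElementTree. The function names select_run, paste_as_item and
paste_as_list are hypothetical, the MathML namespace is omitted for
brevity, and no check against the DTD or schema is attempted: the sketch
simply treats any contiguous run of sibling elements as selectable.

```python
import copy
import xml.etree.ElementTree as ET

def select_run(parent, start, end):
    """Copy the children parent[start:end] into a new, explicit <mrow>."""
    mrow = ET.Element("mrow")
    for child in list(parent)[start:end]:
        mrow.append(copy.deepcopy(child))
    return mrow

def paste_as_item(target, index, mrow):
    """Paste the whole mrow into target as a single child."""
    target.insert(index, copy.deepcopy(mrow))

def paste_as_list(target, index, mrow):
    """Paste the contents of the mrow into target as separate children."""
    for offset, child in enumerate(mrow):
        target.insert(index + offset, copy.deepcopy(child))

source = ET.fromstring(
    "<mrow><mn>3</mn><mo>*</mo><mi>a</mi><mo>+</mo><mi>b</mi></mrow>"
)
selection = select_run(source, 1, 3)   # the "*a" selection

as_item = ET.fromstring("<mrow><mi>x</mi></mrow>")
paste_as_item(as_item, 1, selection)

as_list = ET.fromstring("<mrow><mi>x</mi></mrow>")
paste_as_list(as_list, 1, selection)

print(ET.tostring(selection, encoding="unicode"))
print(ET.tostring(as_item, encoding="unicode"))
print(ET.tostring(as_list, encoding="unicode"))
```

Selecting with select_run(source, 1, 4) would give the "*a+" selection in
the same way; nothing in the sketch forbids it.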

Best wishes to all.

Richard
Received on Thursday, 30 March 2006 20:16:48 UTC