- From: <juanrgonzaleza@canonicalscience.com>
- Date: Wed, 15 Mar 2006 03:56:31 -0800 (PST)
- To: <www-math@w3.org>
I think I have detected some confusion about a topic I consider of the greatest importance. For that reason, I want to point out that two obvious (related but independent) points of the CanonMath research are being mixed up by others:

1) the need to go beyond the usual available tools (TeX, MathML, ASCIIMath, etc.) simply because they fail to provide solutions to some real-life problems.

2) the final implementation of the notation/syntax chosen. For example,

<CanonMath>a <fraction/> b</CanonMath>

vs

<CanonMath>a &fraction; b</CanonMath>

vs

<CanonMath>a \fraction b</CanonMath>

vs

<CanonMath>a <fract/> b</CanonMath>

vs

<CanonMath>a <f/> b</CanonMath>

vs ...

Therefore, it makes no sense for authors who reject the possibility of using mixed markup and empty tags to suggest that I use a subset of TeX, ASCIIMath, Mathematica, or the like. Point 1) above prevents me from using TeX or ASCIIMath. The main disagreement here was about the meaning of mixed vs pure content and textual vs XML markup, which is related to 2) but not to 1).

Of course, perhaps I am completely wrong here, but I simply do not know any other way to really improve the horrible input of MathML markup without using infix notation and mixed markup in the XML way.

((( ABOUT INFIX NOTATION )))

I find interesting the strong rejection that infix mixed notation such as

<CanonMath>a<fraction/>b</CanonMath>

is meeting, when in HTML all of us write

<p>this is some <b>important</b> text</p>

*instead* of the redundant

<p><span>this is some</span><b>important</b><span>text</span></p>

The former is usually called "document oriented" markup, whereas the latter is "data oriented". MathML markup is designed to be "data oriented", and that is the basis of its extreme verbosity. I am attempting to design a "document oriented" mathematical markup.
The tree structure of the fragment

<p>this is some <b>important</b> text</p>

is topologically the same as that of the version with an empty tag

<p>this is some <b/> text</p>

(yes, an empty b makes no sense, but that is not the point. If you still feel disturbed, then another example I could propose is the use of the HTML <br/> tag inside paragraphs!), which looks like a fraction in infix notation

<CanonMath>a<fraction/>b</CanonMath>

((( ABOUT MIXED CONTENT )))

In everyday practice, textual markup is used in XML data. In fact, markup such as

<author>
<surname>Carlisle</surname>
<firstname>David</firstname>
</author>

is just as valid, structured, and good for an XSLT programmer as

<author>Carlisle, David</author>

whereas the latter is better from a user's point of view. In real-life implementations of XML technology, the first name and surname are extracted from the <author> content via XPath rules. Therefore, the surname-firstname markup is both redundant and hostile to human authoring. What is more, the latter markup is as CSS-compatible as the first example above when one uses CSS within an XSLT.

Some authors prefer an alternative markup like

<author>Carlisle<separator/>David</author>

which is also valid. In XSLT manuals one finds many examples of that kind.

Apparently some people think that one is forced to work with pure data markup instead of with easy mixed content that is later transformed to MathML via an XSLT, a script, or similar.

I do not see a great difference between this proposal

<CanonMath>a<fraction/>b</CanonMath>

and standard (X)HTML

<p>This a line of text<br/>and this another</p>

If one claims that the first should be tagged as

<CanonMath><mi>a</mi><fraction/><mi>b</mi></CanonMath>

then one should also recommend something like

<p><ti>This a line of text</ti><br/><ti>and this another</ti></p>

for the future XHTML 2.0 specification, where <ti> means "textual identifier", a hypothetical counterpart of the MathML <mi> tag. No?
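As a minimal sketch of the two claims above (an illustration only, not part of the CanonMath proposal), the following Python uses the standard `xml.etree.ElementTree` parser to show that the infix fraction and the (X)HTML paragraph with <br/> have the same text/empty-element/text shape, and that a surname and first name can be recovered from plain "Surname, Firstname" content. The helper names `describe`, `shape`, and `split_author` are hypothetical, and the comma convention for <author> is an assumption taken from the example.

```python
import xml.etree.ElementTree as ET


def describe(xml_text):
    """Return the root's direct content as a flat list of (kind, value) pairs."""
    root = ET.fromstring(xml_text)
    nodes = []
    # Text before the first child element lives in root.text.
    if root.text and root.text.strip():
        nodes.append(("text", root.text.strip()))
    for child in root:
        nodes.append(("element", child.tag))
        # Text following a child element lives in that child's tail.
        if child.tail and child.tail.strip():
            nodes.append(("text", child.tail.strip()))
    return nodes


def shape(xml_text):
    """Keep only the kinds, so two fragments can be compared structurally."""
    return [kind for kind, _ in describe(xml_text)]


def split_author(xml_text):
    """Split 'Surname, Firstname' text content (assumed convention) into a pair."""
    text = ET.fromstring(xml_text).text
    surname, firstname = [part.strip() for part in text.split(",", 1)]
    return surname, firstname


html_nodes = describe("<p>This a line of text<br/>and this another</p>")
math_nodes = describe("<CanonMath>a<fraction/>b</CanonMath>")
print(html_nodes)
print(math_nodes)
print(split_author("<author>Carlisle, David</author>"))
```

Both fragments reduce to the sequence text, element, text, which is the sense in which the two trees are "topologically the same".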
((( SOME THOUGHTS ABOUT SCHEMA )))

I also find surprising the appeal to Unicode or textual markup in preference to empty tags. Apparently, some people feel disturbed by something like

<tag>A<leftrightarrow/>B</tag>

and would prefer the use of (textual)

<tag>A \leftrightarrow B</tag>

or (Unicode)

<tag>A 'U02194' B</tag>

I also find disturbing the appeal to character entities, an old technique inherited from the SGML world via DTDs. Curiously, the W3C Schema specification states that special characters would be introduced via empty tags!

From the Schema specification itself (XML Schema Part 0: Primer Second Edition):

<?xml version="1.0" ?>
<purchaseOrder xmlns="http://www.example.com/PO1"
xmlns:c="http://www.example.com/characterElements"
orderDate="1999-10-20">
<!-- etc. -->
<city>Montr<c:eacute/>al</city>
<!-- etc. -->
</purchaseOrder>

And nobody in the Schema WG worried about the tree structure

<city>
Montr
<c:eacute/>
al
</city>

introduced by the use of <c:eacute/> instead of the DTD entity &eacute;. Then why worry so much about the possibility of markup like the following?

<CanonMath>
a
<division/>
2
</CanonMath>

Something like

<CanonMath>
A + <beta/> = 23
</CanonMath>

looks natural, since the Schema entity <beta/> is the evolution of the current MathML DTD entity &beta;

((( Why you can author XHTML but cannot MathML? )))

Think for a moment about how we usually work with (X)HTML. Next, I write some usual markup

<p>This is some text I do not need break into small tokens</p>
<p>Next, I am writing a <em>Spanish</em> word is typed via a Schema entity: Ni<ntilde/>o</p>

I do not need additional spans before and after the emphasized element, nor surrounding the Schema entity. My computer knows about that!
Next, a hypothetical data-oriented version of that markup (for convenience I write only the first paragraph above)

<p><c>T</c><c>h</c><v>i</v><c>s</c><space/><v>i</v><c>s</c><space/>
<c>s</c><v>o</v><c>m</c><v>e</v><space/><c>t</c><v>e</v><c>x</c><c>t</c>
<space/><v>I</v><space/><c>d</c><v>o</v><space/><c>n</c><v>o</v>
<c>t</c><space/><c>n</c><v>e</v><v>e</v><c>d</c><space/><c>b</c>
<c>r</c><v>e</v><v>a</v><c>k</c><space/><v>i</v><c>n</c><c>t</c><v>o</v>
<space/><c>s</c><c>m</c><v>a</v><c>l</c><c>l</c><space/><c>t</c><v>o</v>
<c>k</c><v>e</v><c>n</c><c>s</c></p>

Compare both!

I do not need to tell my computer (which is very intelligent) explicitly what each character in a paragraph is. Therefore, the MathML data markup

<math>
<mi>E</mi>
<mo>=</mo>
<mi>m</mi><msup><mi>c</mi><mn>2</mn></msup>
</math>

can be ***typed*** by humans as

<math>E = m<msup>c 2</msup></math>

without the need to introduce redundant markup, e.g. stating that 2 is a number!

Now then, how do we introduce superscripts in HTML? This way

<p> this is superindex<sup>2</sup></p>

Therefore, we should prefer the unification of all XML technology. It makes no sense that the MathML specification defines one way to write superscripts and XHTML another. It makes no sense that the latest XHTML deprecates the use of style tags and even attributes, whereas MathML uses <mstyle>, color attributes, etc.

Moreover, using infix notation to ease writing, the (X)HTML way transforms this to

<math>E = m c<sup/>2</math>

The m of the msup tag is also redundant, since the computer knows when <sup/> tags are inside <math> tags and when they are in a paragraph of text.

The final XML input is very close to the natural

<TeX>E = m c^2</TeX>

Curiously, some people are very disturbed by the mixing of text with empty tags such as <sup/>. I repeat that this mixing is defined by W3C Schema as the natural evolution from DTD entities.
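The claim that the computer can work out for itself which characters are identifiers, numbers, or operators can be sketched as follows. This is only an illustration under stated assumptions, not the CanonMath parser: the names `classify`, `tokens`, and `to_mathml` are hypothetical, single letters are assumed to be identifiers, digit runs numbers, any other non-space character an operator, and the infix <sup/> is assumed to bind exactly one token on each side.

```python
import re
import xml.etree.ElementTree as ET

# Assumed token rules: a number, a single letter, or any other non-space character.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]|\S")


def classify(token):
    """Guess the presentation MathML token element for a piece of text."""
    if token[0].isdigit():
        return "mn"  # number
    if token.isalpha():
        return "mi"  # identifier
    return "mo"      # everything else is treated as an operator


def tokens(text):
    """Split mixed-content text into (tag, value) pairs."""
    return [(classify(t), t) for t in TOKEN.findall(text or "")]


def make(tag, value):
    element = ET.Element(tag)
    element.text = value
    return element


def to_mathml(source):
    """Convert e.g. '<math>E = m c<sup/>2</math>' to tokenised MathML."""
    root = ET.fromstring(source)
    items = [make(tag, value) for tag, value in tokens(root.text)]
    for child in root:
        tail = tokens(child.tail)
        if child.tag != "sup" or not items or not tail:
            raise ValueError("unsupported or incomplete infix tag: " + child.tag)
        # Infix <sup/>: previous token is the base, next token the exponent.
        msup = ET.Element("msup")
        msup.append(items.pop())
        msup.append(make(*tail[0]))
        items.append(msup)
        items.extend(make(tag, value) for tag, value in tail[1:])
    out = ET.Element("math")
    out.extend(items)
    return ET.tostring(out, encoding="unicode")


print(to_mathml("<math>E = m c<sup/>2</math>"))
```

A real converter would need rules for multi-letter identifiers, grouping, and operators that should be identifiers (or vice versa), which is where an explicit override tag would come in.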
Just as one can use a plugin to transform the TeX above into MathML, one can use a plugin, an XSLT, or JavaScript to transform CanonMath into MathML. But CanonMath would introduce *nine* advantages (they were listed in Canonical Science today) that are not present in content or presentation MathML, TeX/LaTeX, Itex, or ASCIIMath.

Usually things such as H would be automatically parsed into <mi>H</mi>. And yes, I have also thought about the possibility of parsing into <mo>H</mo>, which could be done via some input tag such as <op>H</op> or similar. The difference is that the latter would be used only on _some occasions_, for fine-tuning. Note that when authoring in MathML one is obliged to use <mo>, <mi>, and <mn> even in trivial cases such as numbers, "=", etc.

Juan R.

Center for CANONICAL |SCIENCE)
Received on Wednesday, 15 March 2006 11:56:49 UTC