
[whatwg] Mathematics in HTML5

From: White Lynx <whitelynx@operamail.com>
Date: Sat, 03 Jun 2006 13:33:53 +0400
Message-ID: <20060603093353.5AD1D7B5F3@ws5-10.us4.outblaze.com>
Michel Fortin wrote:
> While this may be better than the MathML counterpart, I'd prefer this  
> markup:
> 
>      <p>
>      (X)HTML document may contain math formulae, like
>      <formula>
>      <var>a</var><var>x</var><sup>2</sup> +
>      <var>b</var><var>x</var> + <var>c</var> = 0
>      </formula>
>      </p>
> 
> It's more verbose than what you suggested, but still way simpler than  
> MathML.
Completely agree, provided that the extra var elements are optional.


> The other point I'd like to make is that a formula element shouldn't  
> be required for all mathematical expressions. If I want to talk about  
> variable x in the middle of a paragraph, I shouldn't need to surround it  
> like this: <formula><var>x</var></formula>. Using <var>x</var> ought  
> to be sufficient. The same applies if I want to include x^2 in the  
> text, <var>x</var><sup>2</sup> should be enough.
I would prefer to enclose inline expressions in some container explicitly.
It is necessary for proper styling and conversion to other languages.

Ian Hickson wrote:
> This markup is completely inadequate to represent mathematics. For 
> example, it doesn't say whether "ax" is one variable or two.
More or less the same markup is used in LaTeX, ISO 12083, the AAP Math DTD
and several other markup languages, and no one complains that it is inadequate.
One reason it is used this way is that in most cases you simply do not know
whether ax is one variable or two:
because the expression can be produced by a LaTeX-to-XML converter that
does not know what is a variable and what is not (the input format carries no such information),
because WYSIWYG tools do not care what is a variable and what is not, and
because MathML practice has shown that in the real world such notations are used for presentational
purposes only and no one actually cares whether something is a variable or not.
Note that <mrow><mi>a</mi><mi>x</mi></mrow> adds absolutely no semantic value to the expression.
The HTML option <var>a</var><var>x</var> theoretically adds semantics, but in practice some will
write <var>a</var><var>x</var>, some <var>ax</var> and some ax, so you cannot rely on these semantics.
This is one of the reasons why LaTeX, ISO 12083 and the AAP Math DTD encode only the basic structure
of mathematical formulae and do not mark up everything character by character.
Note also that, unlike in computer science, in mathematics you can hardly find two-character notations for variables.

> How would you represent the following equation?
>    http://www.w3.org/TR/REC-MathML/images/2_2.gif
> 
> It would almost certainly appear in any paper that included the equation 
> you mention above, so it's important that we support that.
> 
> Or how about:
> 
>    http://scienceworld.wolfram.com/physics/simg297.gif
> 
> ...or:
> 
>    http://scienceworld.wolfram.com/physics/bimg179.gif
> 
> These are examples of equations that are on a very popular site _today_, 
> so any solution to integrating maths into HTML5 must absolutely be able 
> to cope with these cases.
It should be possible to encode equations like these.
Opera 9, Prince 5 (pure CSS):	http://www.geocities.com/chavchan/21/example.xml
MSIE 6.0 (XSLT + CSS): 		http://www.geocities.com/chavchan/exp/trident/example.xml

You can find much more complex stuff at
	http://xml-maiden.com
(the actual markup that we would like to propose for inclusion in HTML5 will
differ from XML MAIDEN and will try to incorporate some of the requirements posted on this list,
but for the basic ideas of a CSS-friendly markup language you can refer to XML MAIDEN).
An annotated style sheet that explains how to handle maths like this in CSS is available at
	http://www.geocities.com/chavchan/css/annotated.css


> Consider that it must be possible to render this both to visual media *and 
> to non-visual media* without loss of information, 
Lossless presentation in aural media is part of W3C mythology that does not always
work in the real world. I agree that it should be possible to port mathematics to
speech media and braille, but at the same time one should be able to distinguish myths
from reality and build a solution that actually works.

> both in CSS-capable 
> browsers *and browsers that don't support CSS at all*.
Compatibility with CSS would mean the existence of a default CSS style sheet
that can handle arbitrarily complex formulae consistently. This would not harm
XSL formatters, as it is basically possible to use XSL FO instead of CSS, and
should not harm text browsers, which will have to implement the markup natively in any case.

> In HTML5 we have other options, too. For example, we could define a 
> special parsing mode wherein any stream of numeric characters (separated 
> by whitespace) implies an <mn> element, any stream of punctuation 
> characters implies an <mo> element, etc, such that your example above 
> could end up looking like this in the source:
> 
>    <math>
>      <mrow>a &#x2062; <msup>x 2</msup></mrow> +
>        <mrow>b &#x2062; x</mrow> + c = 0
>    </math>
> 
> ...with the DOM being the full MathML representation (namespaces, DOM, and 
> everything), with all the implications that has (full mathematical 
> typography, compatibility with an existing language, its renderers, and 
> its content, unambiguous interpretation, etc).
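
As a rough illustration of the parsing mode described above, here is a small Python sketch that wraps each token of a plain-text expression in the token element it would imply. It is only a sketch under stated assumptions: the function name imply_elements is hypothetical, and the rule that a run of letters becomes <mi> is a guess at what the "etc" in the description covers. It does not handle <mrow> or <msup> grouping.

```python
import re
from xml.sax.saxutils import escape

# One token is a number, a run of non-digit word characters (letters),
# or any other single non-whitespace character (operators, punctuation,
# U+2062 INVISIBLE TIMES and so on).
TOKEN = re.compile(r"\d+(?:\.\d+)?|[^\W\d_]+|\S")

def imply_elements(text: str) -> str:
    """Wrap each token of `text` in the MathML token element it implies."""
    parts = []
    for token in TOKEN.findall(text):
        if token[0].isdigit():
            tag = "mn"  # stream of numeric characters
        elif token.isalpha():
            tag = "mi"  # run of letters (an assumption, see above)
        else:
            tag = "mo"  # punctuation or operator character
        parts.append(f"<{tag}>{escape(token)}</{tag}>")
    return "".join(parts)

print(imply_elements("b \u2062 x + c = 0"))
```

Note that with these rules "ax" typed without a separator becomes a single <mi>ax</mi>, which is exactly the one-variable-or-two ambiguity discussed earlier in this message.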

In fact I would prefer not to touch the DOM (otherwise we could start with a LaTeX-like syntax instead of (X)(HT)ML
and generate the full representation via XSLT), but to add some property values to CSS3,
for example:

'text-transform'
	Value:          capitalize | math-italic | math-bold | math-bold-italic | uppercase | lowercase | none | inherit
	Initial:        none
	Applies to:     all elements
	Inherited:      yes
	Percentages:    N/A
	Media:          visual
	Computed value: as specified

'math-italic'
	Converts Latin and Greek characters to mathematical alphanumerical characters
	#x1D434-#x1D467 and #x1D6E2-#x1D71B. If mathematical alphanumerical
	characters are not supported, the UA may change the font-style of Latin and Greek letters
	to italic.
'math-bold'
	Converts Latin and Greek characters to mathematical alphanumerical characters
	#x1D5D4-#x1D607 and #x1D6A8-#x1D6E1. If mathematical alphanumerical
	characters are not supported, the UA may change the font-weight of Latin and Greek letters
	to bold.
'math-bold-italic'
	Converts Latin and Greek characters to mathematical alphanumerical characters
	#x1D63C-#x1D66F and #x1D71C-#x1D755. If mathematical alphanumerical
	characters are not supported, the UA may change the font-weight of Latin and Greek letters
	to bold and the font-style to italic.

This should be sufficient to address some of the typographical concerns mentioned on this list.
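
To make the intended character conversion concrete, here is a minimal Python sketch of what a 'math-italic' transform could do for Latin letters and lowercase Greek. The function name math_italic is hypothetical, and the sketch assumes the standard Unicode layout of the Mathematical Alphanumeric Symbols block; it is an illustration, not a description of any UA's behaviour.

```python
# Code points from Unicode's Mathematical Alphanumeric Symbols block.
ITALIC_CAPITAL_A = 0x1D434    # MATHEMATICAL ITALIC CAPITAL A
ITALIC_SMALL_A = 0x1D44E      # MATHEMATICAL ITALIC SMALL A
ITALIC_SMALL_ALPHA = 0x1D6FC  # MATHEMATICAL ITALIC SMALL ALPHA
PLANCK_CONSTANT = 0x210E      # italic h lives here; U+1D455 is unassigned

def math_italic(text: str) -> str:
    """Map Latin letters and lowercase Greek to mathematical italic."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(ITALIC_CAPITAL_A + ord(ch) - ord("A")))
        elif ch == "h":
            out.append(chr(PLANCK_CONSTANT))
        elif "a" <= ch <= "z":
            out.append(chr(ITALIC_SMALL_A + ord(ch) - ord("a")))
        elif "\u03b1" <= ch <= "\u03c9":  # Greek small alpha..omega
            out.append(chr(ITALIC_SMALL_ALPHA + ord(ch) - 0x03B1))
        else:
            out.append(ch)  # digits, operators etc. are left untouched
    return "".join(out)

print(math_italic("ax2 + bx + c = 0"))
```

The special case for "h" shows why a UA cannot treat the conversion as a single fixed offset: some letters were encoded elsewhere in Unicode before the mathematical block was added.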

> What we need, to move forwards on this, would be a full proposal for what 
> you want added to HTML5. 
I will write a sketch of a proposal and post it to the list; then we can be more specific.

> Currently this thread seems mostly to be along 
> the lines of "we should add maths, but we shouldn't make it hard".
Yes, but "we should add maths" and "we shouldn't make it hard" have quite definite meanings.
We should add maths because the current situation is simply ridiculous. Consider for example fractions like
<fraction>
<num>numerator</num>
<den>denominator</den>
</fraction>
Even MSIE can handle them (including deeply nested patterns such as a fraction inside a fraction inside another fraction)
using a simple CSS style sheet:
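	/* The whole fraction is one inline box that never breaks across lines. */
	fraction {
		display: inline-block;
		white-space: nowrap;
		text-align: center;
		vertical-align: -0.8em;
		margin: 0 2px;
		font-size: 1em;
	}
	/* Numerator and denominator are set slightly smaller. */
	num, den {
		line-height: 1.5em;
		font-size: 0.9em;
	}
	/* The numerator's bottom border serves as the fraction bar. */
	num {
		border-bottom: solid 1px;
		display: block;
	}
	den {
		display: inline-block;
		vertical-align: text-top;
	}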
Opera and Prince can handle them in the same manner. Safari just needs a small bug fix
(baseline alignment of inline-blocks) and Mozilla has to fix bug #9458 (it should be fixed
in any case, but even without the fix it is not a big issue to cook up a style sheet using -moz-inline-box).
Here is an example (works in Opera 9, MSIE 6.0, Prince 5):
	http://www.geocities.com/chavchan/frac/fractions.xml
Thus the price that browser developers have to pay for fractions is very close to zero, so why not make some
mathematicians happy and include fractions in HTML5? The same applies to nearly every
mathematical expression, so it is funny to have the opportunity and not use it just because seven years ago
someone at W3C decided to "reinvent the wheel, make it square and put the horse behind the cart".

So basically what we mean by "we should add maths, but we shouldn't make it hard"
is that we should add simple, human-processable, math-oriented markup like ISO 12083 that nevertheless
fits well into the XML + CSS framework. (ISO 12083 was developed before XML and CSS and worked in the
SGML + DSSSL framework, which differs drastically from the current XML + CSS one, so some
parts of the markup have to change to make it work.) Such markup would allow browser developers to implement
it with little effort by reusing the existing capabilities of the rendering engine, and would
avoid from the beginning unnecessary conflicts between an ad hoc math-oriented style language
(like the mstyle element with its collection of formatting-oriented attributes in MathML) and
the CSS used in the parent HTML document.

Michel Fortin wrote:
> I'm pretty sure that with SVG and CSS 3 border-image[1] it wouldn't  
> be too hard to have professional looking scalable radicals,  
> integrals, and brackets. Matrices would be taken care of by inline  
> tables, fractions by inline blocks.
Yes, border-image should be suitable for nearly everything but large angle brackets.
In fact it would be nice to have something more reliable than border images.

> What could prove a little harder is positioning of integral  
> endpoints, as well as lower and upper bounds of summation and product  
> symbols, without resorting to awkward markup.
Yes, the markup is slightly awkward in this case, but that is not a problem in the case of operators.
The real problem appears when you have to align stacked indices with the top and bottom of a matrix.

> But it'd certainly be a lot easier for browser implementors to add  
> some math-specific CSS properties for the missing parts than to  
> create a full MathML implementation.
Yes, the number of required CSS extensions will be small (maybe even zero).
In the case of a full MathML implementation the main problem is not the implementation itself,
but consistent integration with the rest of the standards (consider CSS, DOM2 Style), which is a real headache.

Alexey Feldgendler wrote:
>> * stacking of multiple signs like tildes, arrows etc above variables
> 
> > Unicode allows several combining diacritical marks per base character.
> > But browser support for combining diacritics is not perfect.
> 
> They need to stack over each other, not overlay each other.
Yes, combining diacritical marks are designed to stack over each other (not overlay),
but they are not supported properly by current browsers.

>> * correct continuation of long fractions on the next line
> 
> > Never seen anything like that. We just prohibit line breaks in  
> > fractions.
> 
> TeX avoids them, too, but sometimes a complex fraction is simply longer  
> than the line. There are typesetting rules on continuation of fractions.
OK. It is not possible in CSS2.1, and I doubt it is an actual requirement.

>>> * stretching of tildes etc over complex expressions
>> It's an open issue. Can't promise anything. Possible solution could be  
>> SVG inserted from style sheet using CSS generated content.
> Can SVG content be generated?
It can be inserted either as a background image or via content:url('data-uri-goes-here').
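
A data: URI of this kind can be produced mechanically. The Python sketch below percent-encodes an SVG fragment and builds a CSS rule around it; the function name svg_content_rule and the "tilde" selector are hypothetical, and whether a given browser accepts SVG in generated content is a separate question that this sketch does not settle.

```python
from urllib.parse import quote

def svg_content_rule(selector: str, svg: str) -> str:
    """Return a CSS rule that inserts `svg` as generated content via a data: URI."""
    # safe="" percent-encodes quotes, angle brackets and slashes as well,
    # so the URI cannot terminate the surrounding url('...') early.
    data_uri = "data:image/svg+xml," + quote(svg, safe="")
    return f"{selector}:before {{ content: url('{data_uri}'); }}"

svg = "<svg xmlns='http://www.w3.org/2000/svg' width='20' height='5'/>"
print(svg_content_rule("tilde", svg))
```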

>> * matrices with cells of uniform size (as to accommodate for the largest
> >> expression found)
> > Not possible within CSS2.1 tables model, unless widths are specified  
> > explicitly. So the burden of making cells uniform lies with the author.
> Is it possible in CSS3?
AFAIK not yet.

> Here is what should happen:
> 
> text text text
> text x^2 text
> text text text
> 
> Here the superscript should not increase line spacing. But if there was  
> some more complex expression, TeX would add some extra spacing above (or  
> below, if needed) that line. TeX has a configurable threshold of how far  
> can an inline formula ascend and descend from a line without having to  
> increase the line spacing.
It is possible to suppress this effect by reducing the line-height of sub/superscripts.
/* Relative positioning eliminates the problem entirely, but is not the best solution
for complex formulae, as it may cause some expressions to overlap */






Received on Saturday, 3 June 2006 02:33:53 UTC
