Re: Exploring new vocabularies for HTML

On Sat, 29 Mar 2008, William F Hammond wrote:
> Ian Hickson <ian@hixie.ch> writes:
> >
> > For example, it seems like this:
> >
> >    <math> 3 + n = 6 </math>
> 
> as in the March 1995 draft of HTML 3.0 ??

No, not at all. The idea would be that the parser would automatically 
infer the tag names and so the DOM would look exactly like it would for 
this markup:

   <math> <mrow><mn>3 </mn><mo>+ </mo><mi>n </mi><mo>= </mo><mn>6 </mn></mrow></math>

This is the same way that the following two HTML4 snippets are exactly 
equivalent and generate the same underlying DOM:

   <p><table><tr><td></table>

   <p></p><table><tbody><tr><td></td></tr></tbody></table>

The "tbody" isn't in the original markup, but it's still in the DOM.
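As a rough illustration of that kind of inference, here is a toy tree 
builder in Python. It is only a sketch under stated assumptions: it knows 
just two rules (a <table> start tag closes an open <p>, and a <tr> directly 
inside <table> gets an implied <tbody>), and the names Node, 
ImpliedTagParser and parse are made up for this example. It is not the real 
HTML parsing algorithm.

```python
from html.parser import HTMLParser


class Node:
    """Minimal element node: a name, a parent, and child elements."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def serialise(self):
        inner = "".join(child.serialise() for child in self.children)
        return f"<{self.name}>{inner}</{self.name}>"


class ImpliedTagParser(HTMLParser):
    """Toy tree builder covering only the two rules needed here."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def _open(self, name):
        node = Node(name, self.current)
        self.current.children.append(node)
        self.current = node

    def _close_up_to(self, name):
        # Walk up to the nearest open element called `name` and close it,
        # implicitly closing everything inside it. Ignore stray end tags.
        node = self.current
        while node is not self.root and node.name != name:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_starttag(self, tag, attrs):
        if tag == "table" and self.current.name == "p":
            self._close_up_to("p")      # assumed rule: <table> ends an open <p>
        if tag == "tr" and self.current.name == "table":
            self._open("tbody")         # assumed rule: implied <tbody>
        self._open(tag)

    def handle_endtag(self, tag):
        self._close_up_to(tag)


def parse(markup):
    """Return the explicit serialisation of the tree built from `markup`."""
    parser = ImpliedTagParser()
    parser.feed(markup)
    parser.close()
    return "".join(child.serialise() for child in parser.root.children)


terse = "<p><table><tr><td></table>"
explicit = "<p></p><table><tbody><tr><td></td></tr></tbody></table>"
print(parse(terse) == parse(explicit))
```

Both inputs end up as the same tree, which is the property being claimed 
for the implied MathML tags.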


On Sat, 29 Mar 2008, David Carlisle wrote:
> 
> It makes it much harder to style (or at least understand the styling of) 
> the mathematics, one reason why mathml fully tags every token is that it 
> makes each token individually available for styling, something that is 
> far more likely to be required in math than in natural language text, 
> where you are less likely to want to style individual words or letters.
> If the "implied" elements are available to CSS selectors then presumably
> the thing is still stylable but it is rather obscure and people have to
> understand the exact nature of the implied elements in order to use
> CSS.

Indeed. Of course, they could still include them explicitly if they wish.


> the other problem is that if editors start generating "mathml" with 
> htmlised math with unmarked up text runs such as this, then they break 
> the entire existing mathematical tool chain, which either has to support 
> this new language you are proposing, or have to explain to end users why 
> mathml in html is different from mathml as specified.

The language would be the same, only the serialisation would be different. 
However, the serialisation would be different anyway, that's the whole 
point -- bringing MathML (or some other math notation) to the text/html 
serialisation.


> The rules for inferring what's a number, what's an identifier, what's a 
> sequence of identifiers with invisible times operator between really 
> aren't simple, especially if you move away from ascii (as surely you 
> would have to) so you would probably end up having to refer to large 
> unicode character tables to decide what's a number etc. A feature of 
> presentation mathml is that those decisions get made by the author (who 
> hopefully best understands the expression) and aren't left to be 
> inferred by a later system.

We could make the simple cases simple without trying to solve everything. 
For example, we could define specific operators like + and -, and simple 
real numbers using the digits 0-9.
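
To make the "simple cases" idea concrete, here is a minimal Python sketch 
of such an inference step. The token classes are assumptions chosen for the 
example, not a proposed definition: numbers are runs of the digits 0-9 with 
an optional decimal part, operators are just +, - and =, and identifiers 
are single ASCII letters. The function name infer_tokens is hypothetical, 
and whitespace is dropped rather than kept inside the tokens.

```python
import re

# Assumed token classes, deliberately limited to the simple cases.
TOKEN = re.compile(
    r"""
      (?P<mn>[0-9]+(?:\.[0-9]+)?)   # simple real numbers, digits 0-9 only
    | (?P<mo>[+\-=])                # a small fixed set of operators
    | (?P<mi>[A-Za-z])              # single-letter identifiers
    | (?P<ws>\s+)                   # whitespace between tokens
    """,
    re.VERBOSE,
)


def infer_tokens(text):
    """Wrap each simple token in the element its class implies."""
    pos = 0
    parts = []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            # Anything outside the simple cases is left to explicit markup.
            raise ValueError(f"cannot infer a token at offset {pos}: {text[pos]!r}")
        if match.lastgroup != "ws":
            tag = match.lastgroup
            parts.append(f"<{tag}>{match.group()}</{tag}>")
        pos = match.end()
    return "<mrow>" + "".join(parts) + "</mrow>"


print(infer_tokens("3 + n = 6"))
```

Input outside these classes (a non-ASCII operator, say) raises an error in 
this sketch instead of guessing, which is one way of keeping the harder 
decisions with the author.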


Correct me if I'm wrong, but this:

   <math> 3 </math>

...is invalid MathML markup. <math> elements can't contain numbers 
directly. (Incidentally, I determined this from the DTD, but I can't find 
anything in MathML2 that defines the processing for this error. What 
should that render as, assuming the correct namespaces?)

It seems like in text/html, where we have the option of performing fixup, 
that we might as well do the obvious thing and treat that as:

   <math> <mn>3 </mn></math>

This can be defined in a completely unambiguous way, it turns what would 
be invalid markup into something useful, and, as a bonus, it makes it much 
easier to write simple equations.

This doesn't in any way stop people from doing things like:

   <math> <mtext> 3 </mtext> </math>

...if they so desire.
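
A small Python sketch of that fix-up, assuming an XML parser as a stand-in 
for the text/html parser and ignoring namespaces: bare text directly inside 
<math> is wrapped in <mn> when it is a simple number, and anything the 
author tagged explicitly is left alone. The fix_up name and the number 
pattern are assumptions made for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed definition of a "simple number": digits 0-9, optional decimal part.
SIMPLE_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def fix_up(markup):
    """Wrap a bare number directly inside <math> in an <mn> element."""
    math = ET.fromstring(markup)
    text = (math.text or "").strip()
    if SIMPLE_NUMBER.fullmatch(text):
        mn = ET.Element("mn")
        mn.text = text
        math.text = None        # the text now lives inside <mn>
        math.insert(0, mn)
    return ET.tostring(math, encoding="unicode")


print(fix_up("<math> 3 </math>"))
print(fix_up("<math> <mtext> 3 </mtext> </math>"))
```

The second call returns its input unchanged, since the only text directly 
inside <math> there is whitespace.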


On Sat, 29 Mar 2008, William F Hammond wrote:
> > Sure, but there are definitely orders of magnitude of difference 
> > between the verbosity of the different formats. (I hand-wrote all the 
> > MathML in my MathML+XHTML paper at University seven years ago, also in 
> > Emacs.)
> 
> Might I be able to see this paper you wrote?

Sure:

   http://academia.hixie.ch/physics/2dfsurvey/report/report.xml


> 1.  Presentation MathML works well in at least Firefox, IE +
> MathPlayer, and Amaya.  None of the other math languages you mention
> are known (at least to me) to have support in any web browser.
> 
> 2.  Except for ISO 12083 introducing any of the others would result
> in hybrid markup, thereby increasing the complexity of parsing.  (In this
> list I don't recall more than one person suggesting ISO 12083; how many
> have spoken for it at whatwg?)
> 
> 3.  It seems reasonably clear that user agents now supporting XHTML+MathML
> would be quickly able to add support for MathML in html5.

Sure, there are pros and cons of all the formats. For example, LaTeX is 
widely known by authors, has very high typographic quality, and has many 
more years of implementation experience. ISO 12083 is an international 
standard, but the specification is not freely available. MathML is 
exceedingly verbose, but has a defined mapping to the DOM. All of these 
things must be carefully considered.


> > On Sat, 29 Mar 2008, David Carlisle wrote:
> >> > 
> >> > I'm investigating possible options for addressing the problem of 
> >> > "Putting an equation in a Web page". One of the options is doing 
> >> > something with MathML.
> >> 
> >> Given the existing implementation and experience in this area surely 
> >> MathML should not simply be "one of the options" it should be the 
> >> main option.
> >
> > I don't understand the distinction.
> 
> Really?

I don't understand what it means for something to be a "main" option, no. 
Either an option is being considered, or it's not. If there's a "main" 
option, can one of the other options end up being picked? If so, then how 
is it a "main" option? If not, then how is the other option an "option"?

Anyway that's just semantics, I don't think it affects the issue here.


On Sat, 29 Mar 2008, David Carlisle wrote:
> >
> > <semantics> and <annotation-xml> are nice in theory, I agree, but are 
> > they really necessary? While I understand that math experts today 
> > might use them, it seems highly unlikely that the mass market would 
> > ever bother.
> 
> current experience shows you are entirely wrong here, the mass market 
> uses this probably more than the "expert" hand writing the mathml in an 
> xml editor. OpenOffice.Org generated mathml for example is always in a 
> semantics element with an annotation carrying the openoffice.org linear 
> syntax, design science's editors do something similar.

Sure, but on the Web this is just the kind of thing we want to discourage, 
for the reasons Henri gave. We want content to interoperate between all 
user agents, we don't want any user-agent specific cruft in the documents.


> maple can at user option write just presentation mathml, just content 
> mathml, or a semantics element with presentation annotated by content.

Sure, but if we have both, and someone then hand-edits one of them and 
leaves the other alone, then they become out of sync and interoperability 
becomes a nightmare. We actively want to avoid the Web having multiple 
redundant representations of content, it's an anti-pattern that has caused 
any number of issues in the past.


> > Something else that would be useful is a summary of the MathML schema. 
> > I couldn't find anything human-readable in the MathML specs, and the 
> > DTD is not optimised for casual reading. Is there anything like that 
> > available?
> 
> for mathml3 we are authoring the schema in relax ng and deriving (or I 
> should say will derive) xsd and dtd. Actually though the authoring of 
> the formal schema is lagging behind the specification of the prose text 
> of the specification. If you have any particular style of comment 
> annotation that you'd find helpful drop us a line and we'll see what we 
> can do. The current draft of the presentation part is 
> http://www.w3.org/Math/RelaxNG/mathml3/mathml3-presentation.rnc

Cool. Is there any kind of visual representation of this?

What I'm basically looking for is something that just lists the elements 
and then for each one lists what elements are allowed as children. e.g.:

   mglyph => EMPTY
   mi     => malignmark, mglyph, #text
   msub   => mi, mo, mn, mtext, ms, mrow, mfrac, msqrt, mroot, mpadded,
             mphantom, mfenced, menclose, msub, msup, msubsup, munder,
             mover, munderover, mmultiscripts, mtable, maligngroup,
             malignmark, mspace, mline, mcolumn, maction, merror,
             mstyle, semantics
   ...

...or some such.
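
Part of the appeal of a listing in that shape is that it is trivially 
machine-usable as well. As a hedged Python sketch: the mglyph and mi 
entries below are the ones from the listing above, while the math, mn and 
mtext entries are abbreviated assumptions added only so the example runs, 
not a statement of the real content models. CONTENT_MODEL and violations 
are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Partial table in the "element => allowed children" shape.
CONTENT_MODEL = {
    "mglyph": set(),                             # EMPTY
    "mi": {"malignmark", "mglyph", "#text"},
    # Assumed, abbreviated entries for illustration only:
    "mn": {"malignmark", "mglyph", "#text"},
    "mtext": {"malignmark", "mglyph", "#text"},
    "math": {"mi", "mn", "mo", "mtext", "mrow"},
}


def violations(elem):
    """Return (parent, child) pairs the table does not allow."""
    found = []
    allowed = CONTENT_MODEL.get(elem.tag)
    if allowed is not None:     # elements missing from the table are skipped
        texts = [elem.text] + [child.tail for child in elem]
        has_text = any(t and t.strip() for t in texts)
        if has_text and "#text" not in allowed:
            found.append((elem.tag, "#text"))
        for child in elem:
            if child.tag not in allowed:
                found.append((elem.tag, child.tag))
    for child in elem:
        found.extend(violations(child))
    return found


print(violations(ET.fromstring("<math> 3 </math>")))
print(violations(ET.fromstring("<math> <mn>3 </mn></math>")))
```

With these assumed entries, the bare-number case is reported as text 
directly inside <math>, and the version with an explicit <mn> passes.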

(Boy there are a lot of MathML elements, even in Presentation MathML.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Saturday, 29 March 2008 23:46:05 UTC