Re: Exploring new vocabularies for HTML from Ian Hickson on 2008-04-01 (public-html@w3.org from April 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 1 Apr 2008 06:25:00 +0000 (UTC)
To: Neil Soiffer <Neils@dessci.com>
Cc: Bruce Miller <bruce.miller@nist.gov>, Sam Ruby <rubys@us.ibm.com>, Robert Miner <robertm@dessci.com>, Henri Sivonen <hsivonen@iki.fi>, David Carlisle <davidc@nag.co.uk>, public-html@w3.org, www-math@w3.org
Message-ID: <Pine.LNX.4.62.0804010612480.28180@hixie.dreamhostps.com>

On Mon, 31 Mar 2008, Neil Soiffer wrote:
> 
> If you need any more examples of why parsing math is harder than it 
> might seem at first blush, let me know.  I know of probably a dozen off 
> the top of my head and could probably double that without a whole lot of 
> work.

Please do provide such examples, that would be exceedingly useful.

> One of the consequences of the above rule is that content MathML will 
> not be part of HTML5.  Speaking for myself, I can live with that as that 
> has been the case for Firefox for years and fits with the idea that 
> users should supply style sheets or other means to specify how to 
> present the content.
> 
> One area that has been the focus of much discussion is semantics, et 
> al. I strongly recommend those tags be included.

I don't understand; the two paragraphs above seem to contradict each 
other. Could you elaborate on what you mean when you say that not 
including Content MathML is ok, and on what you mean when you say that it 
is important that we include semantics?

> There have been theoretical arguments that it allows data to be out of 
> sync, but practice has shown that this is a minor concern at best.

On the contrary, experience with the Web has shown that including 
redundant data (e.g. accessibility metadata, page description metadata, 
and so forth) is actively harmful, as it is almost always out of sync with 
the data seen by most users. It is also the case that most people wouldn't 
know it was available. I would imagine that a much better and more 
productive way to provide Content MathML to users would be to include the 
Presentational MathML inline, and then have links for users to download 
separate MathML files containing the Content MathML.

> As another data point, Mozilla's implementation of MathML initially left 
> off semantics -- this caused most MathML to fail in Mozilla because most 
> MathML is generated by program, not by hand and most programs use that.  
> Its omission was an oversight, due to semantics not being listed in the 
> presentation chapter.  It was added in and now Firefox happily accepts 
> semantics.

When you say it "accepts" it, do you mean it ignores it?

What would it mean for the HTML5 language to "support" semantics? Given 
that every element supported must be explicitly handled, would it mean 
including support for all 140+ Content MathML elements explicitly in the 
parser?

> The cost of supporting semantics is minimal

Depending on what you mean by "supporting semantics", the cost may be far 
from minimal.

> and I hope you consider it part of "Classic MathML" as it occurs in the 
> majority of presentation MathML on the web.

Do you have any precise numbers on this? It would be interesting to study 
this in more detail. (I did an ad hoc survey of half a dozen pages 
containing MathML collected mostly at random by people who did not know 
what the pages were to be used for, and my results strongly suggested that 
on the contrary, most pages that contain MathML only contain the 
Presentational MathML variant, and no <semantics> element nor Content 
MathML. However, this sample is far from fair.)
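
A survey of this kind could be made repeatable with a small script. The 
following is only a rough sketch under stated assumptions: the function 
names (classify_page, tally) are hypothetical, the list of Content MathML 
element names is a small illustrative subset, and the regular expressions 
are a heuristic that ignores namespace-prefixed markup such as <m:math>. 
It is not a real MathML parser and says nothing about how the ad hoc 
survey above was actually carried out.

```python
import re

# Heuristic patterns; these are assumptions, not a conforming parser.
MATH_RE = re.compile(r"<math\b.*?</math>", re.IGNORECASE | re.DOTALL)
SEMANTICS_RE = re.compile(r"<semantics\b", re.IGNORECASE)
# Illustrative subset of Content MathML element names only.
CONTENT_RE = re.compile(r"<(apply|ci|cn|csymbol|bvar|lambda)\b", re.IGNORECASE)


def classify_page(markup):
    """Report which kinds of MathML markup appear in one page's source."""
    blocks = MATH_RE.findall(markup)
    return {
        "has_math": bool(blocks),
        "has_semantics": any(SEMANTICS_RE.search(b) for b in blocks),
        "has_content": any(CONTENT_RE.search(b) for b in blocks),
    }


def tally(pages):
    """Count pages with any MathML, with <semantics>, and with Content MathML."""
    counts = {"math": 0, "semantics": 0, "content": 0}
    for markup in pages:
        flags = classify_page(markup)
        if not flags["has_math"]:
            continue  # pages without MathML are left out of the totals
        counts["math"] += 1
        counts["semantics"] += int(flags["has_semantics"])
        counts["content"] += int(flags["has_content"])
    return counts
```

Running tally() over a larger and less biased set of pages would give 
the ratio of pages using <semantics> to pages using MathML at all, which 
is the number in question here.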

> One thing to note in your above example is that you have used two named
> entities.  I believe that these have been ruled out for HTML5.

Nothing has been ruled out.

> The lack of such named entities will make it much tougher to hand author 
> math (in any form) in HTML5.

Yes, I think we would probably want to include them. I understand there is 
some issue with &phi;, though.

> One unfortunate thing about the discussion on hand authoring is that it 
> has mostly been devoid of facts.  Some *facts* on percentages of 
> hand-authored vs machine-authored HTML should be part of a reasoned 
> discussion, but sadly neither side has produced any such facts.

Indeed. Unfortunately it isn't clear how to collect such information.

My experience has been that many pages are in fact hand-authored, either 
directly in a text editor, or through CMSs that provide raw HTML 
editors, or through templates that are hand edited. I do not think we can 
forgo addressing the needs of hand-authoring content creators.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 1 April 2008 06:25:45 UTC