Re: Exploring new vocabularies for HTML

On Mon, 31 Mar 2008, Bruce Miller wrote:
> 
> The proposal seems to be something like:
> an HTML5 page with MathML-ish stuff in it.
> The math in the _text_ of the page
>
> (1) emphatically does not have the MathML namespace,

I would expect that we would allow the xmlns="" attribute on <math> to 
have the MathML namespace, in the same way as we allow xmlns="" on <html> 
to contain the XHTML namespace. It wouldn't have any effect, though.


> (2) may have omitted end tags,

Well, so can the XML syntax; the difference would be that in text/html it 
wouldn't cause a fatal error. Whether it is a syntax error or not is up for 
debate, though I can certainly see strong reasons to make closing tags 
optional in MathML-in-text/html.


> (3) doesn't have empty elements marked as <tag/>,

Right.


> (4) may have attribute values that aren't quoted,

Right.


> (5) may be limited to exclude <semantics>

...and Content MathML, probably, yes.


> may be limited to exclude named entities,

Possibly.


> (6) and may in the extreme case, even omit tags for token elements 
> (<mo>, <mi>, <mn>).

Possibly.


> Now, that math is clearly not the serialization of Classic MathML, nor 
> would it be allowable to put Classic MathML in the HTML5;
>
> Correct so far?

I would imagine that we would go to some lengths to allow "Classic MathML" 
to be pasted into HTML5 and have it work, with a few caveats:

 * no prefixes on the tag names

 * only <mspace>, <malignmark>, <maligngroup>, <mglyph>, <none>, and 
   <mprescripts> use the empty element syntax

 * no DTD internal subset
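
As a rough illustration of those three caveats, the following Python sketch 
flags snippets that would not paste cleanly. The helper name paste_caveats 
and the regular expressions are assumptions made for illustration; a regex 
scan only approximates real parsing and is not taken from any spec.

```python
import re

# Elements the list above allows to use the empty-element syntax.
VOID_ELEMENTS = {"mspace", "malignmark", "maligngroup", "mglyph", "none", "mprescripts"}

def paste_caveats(source):
    """Return the caveats a MathML snippet violates (hypothetical helper)."""
    problems = []
    # Caveat 1: no prefixes on tag names, e.g. <m:mi> or </m:mi>.
    if re.search(r"</?[A-Za-z][\w.-]*:[A-Za-z]", source):
        problems.append("prefixed tag name")
    # Caveat 2: only the listed elements may be written as <tag/>.
    for name in re.findall(r"<([A-Za-z][\w.-]*)(?:\s[^<>]*)?/>", source):
        if name not in VOID_ELEMENTS:
            problems.append("empty-element syntax on <%s>" % name)
    # Caveat 3: no DTD internal subset, i.e. no "[" inside a DOCTYPE.
    if re.search(r"<!DOCTYPE[^>\[]*\[", source, re.IGNORECASE):
        problems.append("DTD internal subset")
    return problems
```

An empty list means none of the three caveats was triggered; it does not 
mean the snippet is otherwise valid MathML.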


> OTOH, even in the more extreme case, there's no reason the DOM in the 
> browser created by the HTML5 parser would be any different than the DOM 
> that would have been created by an XML parser parsing Classic MathML.
> Correct?
> Would this actually be a _requirement_ in the HTML5 spec?

We would ensure that equivalent inputs in the XML syntax and the 
text/html syntax resulted in the same DOM, yes.


> Clearly, such a DOM could be serialized as either Classic MathML or 
> HTML5-MathML.

Indeed.


> Now, it gets interesting:
>
> I'd like to cut that formula and use it in a computer algebra system, or 
> graphing calculator, or....  I need Classic MathML and the browser could 
> reconstruct it from the DOM....
>
> Fine, but will that be a _requirement_ that a browser provide that?

We can't require UI, since, for instance, if a browser is running as the 
display panel in New York's Times Square, it wouldn't make any sense for 
the user agent to provide a little button somewhere just to allow users to 
view the XML-serialised version of MathML snippets on the page.

However, it would make sense for the spec to encourage interactive UAs to 
provide such UI. Whether they do or not of course depends on the UA 
implementor; even if the spec _requires_ it, we can't guarantee it.


> Or, is it anticipated that every MathML importing tool integrate an 
> HTML5 parser?

I think in the long term that would make a lot of sense. It is expected 
that providing HTML5 support would, in the long term, be as easy as 
providing XHTML support: one would just plug in an appropriate 
off-the-shelf parser.


> Or am I expected to paste to some tmp buffer, and run a 3rd party 
> converter to convert to Classic form?

In the short term, that would work too.


> Alternatively, suppose I'm writing an HTML5 web page and want to steal 
> the math from another page. Will the browser also be required to offer 
> me an HTML5 serialization of the math?

Well, you can always "view source" and take it from there.


> Or, is it anticipated that all HTML or text editors would provide a tool 
> or XSL to HTML5-serialize the XML?
> Or, again, am I expected to use a 3rd party tool?

These questions seem to assume that raw "MathML" wouldn't work in 
text/html, an assumption which I hope we would make false.


> The common theme here is that it is all too easy, though certainly true 
> for many of the proposed "simplifications" of MathML, to say that there 
> is an algorithm for converting between the serializations.
>
> However, unless there is a mandate to require these conversions to be 
> available at some critical junctures, I very much fear that this will 
> result in two effectively disconnected pools of math data.

I think the experiment with HTML and XHTML suggests that there would, 
in the long term, be two connected pools of data, one very large one and 
one very tiny one, but that exchanging data between them wouldn't be too 
difficult (though tools would certainly make it easier).


> Requiring every MathML importer to include an HTML5 parser, and every 
> MathML exporter to include an HTML5 serializer just seems like a 
> quadratic version of the old joke:
>
>  "Now you've got _two_ problems".

Except that one of the two problems is likely to be largely solved in off- 
the-shelf tools.


On Mon, 31 Mar 2008, Sam Ruby wrote:
> 
> Jacques Distler[1] is not currently subscribed to this mailing list, nor 
> does he have the time to follow it, but it occurred to me that he might 
> have an opinion on the subject, so I asked him, and he gave me 
> permission to post his response here.

Thanks!


Jacques Distler wrote:
> 
> The rules for inferring elements are going to get very complicated very 
> fast. For instance, does
> 
>      146,382
> 
> get translated as
> 
>     <mn>146,382</mn>
> 
> or as
> 
>     <mn>146</mn><mo>,</mo><mn>382</mn>
> 
> ...? How about
> 
>       1468,3825
> 
> ...?

Indeed, that's a good point.
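
To make the ambiguity concrete, here is a minimal Python sketch of a 
token-inference step. The function infer_tokens and its comma_in_numbers 
switch are hypothetical; nothing in the proposal says which reading is 
right, which is exactly the problem.

```python
import re

def infer_tokens(text, comma_in_numbers):
    """Wrap bare characters in <mn>/<mi>/<mo> (hypothetical inference rule)."""
    # The only difference between the two readings is whether a comma
    # followed by digits is treated as part of the number.
    number = r"\d+(?:,\d+)*" if comma_in_numbers else r"\d+"
    out = []
    for tok in re.findall(number + r"|[A-Za-z]+|\S", text):
        if tok[0].isdigit():
            out.append("<mn>%s</mn>" % tok)
        elif tok[0].isalpha():
            out.append("<mi>%s</mi>" % tok)
        else:
            out.append("<mo>%s</mo>" % tok)
    return "".join(out)
```

With comma_in_numbers set, "146,382" becomes a single <mn>; without it, 
the same input becomes <mn>, <mo>, <mn>. Both rules are equally simple, 
and the input alone gives no way to choose between them.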


> What about
> 
>      a b
> 
> Is this
> 
>     <mi>a b</mi>
> 
>     <mi>a</mi><mi>b</mi>
> 
> or
> 
>     <mi>a</mi><mo>&InvisibleTimes;</mo><mi>b</mi>

or <mi>a</mi><mo>&ApplyFunction;</mo><mi>b</mi>?


> I generally think that inferred elements (the canonical example in HTML 
> is <tbody>) are a bad idea, confusing to authors (who need to be very 
> sophisticated to realize that they're there in the DOM, even if they're 
> not there in the serialization).
> 
> The MathML Spec already has a bunch of instances where there are 
> inferred <mrow> elements, and this has caused a fair amount of interop 
> headaches, when UAs (in particular, Gecko) get this wrong.
> 
> Please don't add more inferred elements. This is the WORST aspect of 
> existing HTML (which must, alas, be retained for legacy reasons).

The problem is that without inferred elements, the language is extremely 
verbose, maybe prohibitively so.


On Mon, 31 Mar 2008, Robert Miner wrote:
> > 
> > To put it bluntly, raw MathML is too verbose. I can't really see 
> > importing even just Presentational MathML into HTML if we require 
> > authors to type every last <mn>, <mi>, and <mo>. Anything we can do to 
> > make the language more maintainable will go a long way towards arguing 
> > for MathML over the alternatives.
> 
> Thanks for this clear statement.  From the discussion on the list, it 
> seems as though you and James Graham and possibly Henri all share this 
> view to some degree.  Do you know if this view is shared more widely 
> amongst HTML WG members?

Isn't it shared by everyone, even MathML working group members? I mean, it 
is literally orders of magnitude more verbose than any other mathematical 
markup language I'm aware of.


> As you have seen, the majority of math WG members take the opposite view 
> on the cost/benefit analysis in comparing ease of hand authoring vs. the 
> interoperability and backward compatibility concerns Bruce, David, Neil 
> and others have detailed.

Could you elaborate on exactly which cases you are concerned about?

The following cases are cases that I think are priorities, if we do adopt 
MathML:

 * Ability to paste MathML content into text/html and have it work, 
   assuming the MathML content doesn't use a DTD, prefixes, or empty 
   element syntax with elements other than <mspace>, <malignmark>, 
   <maligngroup>, <mglyph>, <none>, and <mprescripts>

 * Ability to process text/html content using HTML5 rules to obtain
   the MathML content the same way as you would if it were XML

 * Ability to hand-write equations without getting RSI

The following cases are cases that I think are not priorities:

 * The ability to copy from the source of an arbitrary text/html document 
   into an XML context (text/html content will most likely not be well- 
   formed, so this is unlikely to work anyway)

(See also http://wiki.whatwg.org/wiki/New_Vocabularies for more details.)


> In any event, I'd like to explore more concretely what might be 
> possible. For argument's sake, suppose MathML 3 were to define an HTML 
> serialization

I'd rather the HTML spec defined text/html's serialisation, for the 
record. :-)


> ...along these lines:
> 
> 1) CDATA would be allowed in an mrow, and a simple, table driven parsing 
> model were given defining which Unicode point should be tagged as mi, 
> mn, mo, etc.
> 
> 2) mi, mn, mo tags would remain valid in markup, and merely become 
> optional in situations covered.
> 
> 3) situations such as fractions requiring a specific number of child 
> elements would specify an algorithm for inserting merror elements to 
> fill in for missing arguments or wrap extra arguments.
> 
> 4) the schema is factored so that HTML5 could import just the 30 or so 
> presentation elements and semantics
> 
> 5) whatever other lesser syntactic accommodations like quotes around 
> attributes, and so on, are worked out and allowed.
> 
> I think that according to your quoted statement above such a proposal 
> would be something you and maybe other HTML WG members would support for 
> math in HTML5.  Is that correct?

The devil is in the details, but fundamentally, that's the idea, yes.
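
One such detail is point 3. A possible reading, sketched in Python below, 
pads missing arguments with empty <merror> elements and folds surplus 
arguments into a single <merror>. The function fix_arity and both of those 
choices are assumptions for illustration; the proposal does not specify 
the algorithm.

```python
def fix_arity(children, arity):
    """Pad or wrap serialized children so exactly `arity` remain (hypothetical rule)."""
    if len(children) < arity:
        # Missing arguments: fill the gaps with empty <merror> placeholders.
        return children + ["<merror></merror>"] * (arity - len(children))
    if len(children) > arity:
        # Extra arguments: fold the surplus into one <merror> in the last slot.
        surplus = "".join(children[arity - 1:])
        return children[:arity - 1] + ["<merror>%s</merror>" % surplus]
    return list(children)
```

For <mfrac>, arity would be 2, so a fraction written with three children 
would keep its first child as the numerator and get an <merror> wrapping 
the other two as the denominator.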


Jacques' comments above have led me to consider a different approach to 
making the MathML-in-text/html syntax easier to write.

It seems like the most unambiguous option is to focus on making end tags 
optional. This basically consists of defining when an end tag is implied.

</mn>, </mo>, </mi> could be implied whenever a MathML start tag other 
than <mglyph> or <malignmark> is seen while the appropriate element is on 
the stack of open elements.

</mfrac> could be implied when a start tag is seen when the element 
already has two children, and similarly with <mroot>, <msub>, etc.

Almost any MathML close tag could be implied when an <mtr> start tag is 
seen when there's an <mtable> element on the stack but the current element 
isn't an <mtable>.

So e.g. instead of:

<math xmlns="http://www.w3.org/1998/Math/MathML">
 <mi>x</mi> <mo>=</mo>
 <mfrac>
  <mrow>
   <mo>-</mo> <mi>b</mi> <mo>&PlusMinus;</mo>
   <msqrt>
    <msup>
     <mi>b</mi> <mn>2</mn>
    </msup>
    <mo>-</mo> <mn>4</mn> <mo>&InvisibleTimes;</mo> <mi>a</mi> <mo>&InvisibleTimes;</mo> <mi>c</mi>
   </msqrt>
  </mrow>
  <mrow>
   <mn>2</mn> <mo>&InvisibleTimes;</mo> <mi>a</mi>
  </mrow>
 </mfrac>
</math>

...we could have:

<math>
 <mi>x <mo>=
 <mfrac>
  <mrow>
   <mo>- <mi>b <mo>&PlusMinus;
   <msqrt>
    <msup> <mi>b <mn>2
    <mo>- <mn>4 <mo>&InvisibleTimes; <mi>a <mo>&InvisibleTimes; <mi>c
   </msqrt>
  </mrow>
  <mrow>
   <mn>2 <mo>&InvisibleTimes; <mi>a
</math>
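
The first two end-tag rules above can be sketched in Python as a small 
tree builder. This is an illustration under stated assumptions, not the 
HTML5 parsing algorithm: attributes are dropped, entity references are 
kept as plain text, the <mtr> rule is left out, and the ARITY table only 
covers the elements named in this message.

```python
import re

TOKEN = {"mi", "mn", "mo"}                 # end tag implied by the next start tag
INSIDE_TOKEN = {"mglyph", "malignmark"}    # start tags that do not close a token element
VOID = {"mspace", "malignmark", "maligngroup", "mglyph", "none", "mprescripts"}
ARITY = {"mfrac": 2, "mroot": 2, "msub": 2, "msup": 2}  # assumed subset of "etc."

def parse(source):
    """Build nested [name, children] lists, implying end tags along the way."""
    root = ["#root", []]
    stack = [root]
    for end, name, text in re.findall(r"<(/?)([A-Za-z]+)[^<>]*>|([^<]+)", source):
        if text:
            if text.strip():
                stack[-1][1].append(text.strip())
        elif end:
            if any(node[0] == name for node in stack[1:]):
                # Pop through the matching element, closing anything open inside it.
                while stack.pop()[0] != name:
                    pass
        else:
            # Rule 1: a start tag closes an open token element.
            # Rule 2: a start tag closes an element that already has all its children.
            while ((stack[-1][0] in TOKEN and name not in INSIDE_TOKEN)
                   or len(stack[-1][1]) == ARITY.get(stack[-1][0])):
                stack.pop()
            node = [name, []]
            stack[-1][1].append(node)
            if name not in VOID:
                stack.append(node)
    return root[1]

def serialize(children):
    """Write the tree back out with every end tag explicit."""
    return "".join(
        child if isinstance(child, str)
        else "<%s>%s</%s>" % (child[0], serialize(child[1]), child[0])
        for child in children
    )
```

Under these assumptions, serialize(parse(...)) applied to the short form 
yields markup with every end tag present, and parsing the short and long 
forms produces the same tree.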

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 1 April 2008 00:43:54 UTC