Re: Exploring new vocabularies for HTML

(copies to www-math and emj)

Hi Ian --

You reference:

   [1] http://www.whatwg.org/issues/#html-parsing-rules-namespaces-discussion
   [2] http://wiki.whatwg.org/wiki/New_Vocabularies 
   [3] http://lists.w3.org/Archives/Public/public-html/2008Mar/0156.html

I very briefly scanned your references.  I have a few comments that
may only be tangentially related to your present concerns with math
in html5.

1.  Math in browsers:

    xhtml+mathml is rock solid in Firefox and IE+MathPlayer.

    (Moreover, it's fully interoperable with Dave Raggett's
    slidy.  I think that it is now the best slide show presentation
    format since screen real estate is so easy to manage.)

2.  Web visible adoption:

    Some search engines seem not to index content served as
    application/xhtml+xml.

    Though adoption has been slow, it's happening.  There are at least
    three math journal operations using xhtml+mathml, at a minimum for
    article abstracts:

    1. Mathematical Sciences Publishers, Berkeley and Warwick,
       http://www.mathscipub.org/

    2. NUMDAM, Grenoble, http://www.numdam.org/

    3. Project Euclid (some of its journals, e.g., Duke, JMSJ, ...),
       Cornell, http://projecteuclid.org/

    AFAIK full articles in xhtml+mathml have not appeared at the above
    yet, but the Lobachevskii Journal of Mathematics, Tatarstan,
    http://ljm.ksu.ru/ is doing that, apparently by translating LaTeX
    with tex4ht.

3.  Direct authoring:

    Forget "direct authoring" of MathML.

4.  Authoring xhtml+mathml:

    The best techniques are based on the use of an author-level XML
    document type supporting math.  The document type represented by
    axgellmu.dtd in the tarball at CTAN:/support/gellmu
    or at http://www.albany.edu/~hammond/gellmu/xml/axgellmu.dtd
    is an example.

    Apart from "regular" gellmu, the gellmu syntactic translator can
    be used for latex-like "compact" markup, analogous to but
    different from compact relaxng notation, with any xml document
    type.

    Other currently supported examples providing author-level math seem
    to exist only behind closed doors.

5.  Translating LaTeX:

    More than 10 years of experimentation shows this is not a free
    ride.  Tex4ht, suitably configured, is good and gets wide use, but
    requires care.  http://www.cse.ohio-state.edu/~gurari/TeX4ht/mn.html

    Another interesting translation project is "latexml",
    http://dlmf.nist.gov/LaTeXML/, which is being used in an ambitious
    project called "arXMLiv" at Bremen to translate the contents of
    the arXiv: http://kwarc.info/projects/arXMLiv/

    Translation of LaTeX is important because there is more than 30
    years of legacy content.

    Translation of LaTeX is a workable, but inferior, route for new
    content.  It is inferior if only because it is essentially impossible
    to define in a precise way what is correct markup.

6.  Named entities, an observation:

    Of course, we all know that a named cdata character entity becomes
    numeric on first parse.  However, it seems to be the case that
    some user agents handling mathml are internally converting numeric
    code points back to names.

    The sticky thing here is not the retro conversion but the fact
    that nuggets of cdata have become decision points.  Remember that
    in sgml, sdata entities (not currently available in xml) can
    survive parsing and are sensible decision points.
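
    To make the first point concrete, here is a minimal Python sketch
    (my own illustration, not any user agent's actual code).  It
    assumes a one-entity internal DTD subset standing in for the
    MathML entity sets, and it borrows the HTML entity table from
    Python's standard library as a stand-in for whatever table a user
    agent might keep.  The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.entities import codepoint2name

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

# Assumption: a single entity declared in the internal subset, in place
# of the full MathML entity sets.
DOC = """<!DOCTYPE math [
  <!ENTITY alpha "&#x3B1;">
]>
<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>&alpha;</mi></math>"""


def parsed_identifier(doc):
    """Return the text of the first mi element after parsing."""
    root = ET.fromstring(doc)
    return root.find(MATHML_NS + "mi").text


def retro_name(ch):
    """Map a character back to an entity name via a lookup table, if any."""
    return codepoint2name.get(ord(ch))


if __name__ == "__main__":
    text = parsed_identifier(DOC)
    # The parser hands back the character; the name "alpha" is gone.
    print(repr(text), hex(ord(text)), retro_name(text))
```

    After the parse only the character U+03B1 remains, so any name
    that shows up later has been recovered by a table lookup of this
    kind.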

7.  Searching in math:

    I will only speculate that the searching technique should operate
    through attribute values.

    For the interim I would like to see web search engines that
    enable the user to enter search requests like:

           attr%href:uchicago.edu
           attr%div.class:^subsection$
           attr%div.class:(display|inlineblock)
           attr%key:thetaGenus1WithChar1001
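
    As a rough sketch of how such requests might be evaluated against
    a single document, here is some Python.  The reading of the syntax
    is an assumption on my part: "attr%", then an optional element
    name and a dot, then the attribute name, then a colon and a
    regular expression matched against the attribute value.  The
    function names and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, for illustration only.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="subsection"><a href="http://www.uchicago.edu/">U</a></div>
<div class="subsubsection"><div class="display">x</div></div>
</body></html>"""


def parse_query(query):
    """Split 'attr%[element.]attribute:regex' into its three parts."""
    prefix, _, rest = query.partition("%")
    if prefix != "attr" or ":" not in rest:
        raise ValueError("not an attribute query: " + query)
    target, _, pattern = rest.partition(":")
    element, _, attribute = target.rpartition(".")
    return (element or None), attribute, re.compile(pattern)


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def search(doc, query):
    """Return (element, value) pairs whose attribute value matches."""
    element, attribute, pattern = parse_query(query)
    hits = []
    for node in ET.fromstring(doc).iter():
        if element is not None and local_name(node.tag) != element:
            continue
        value = node.get(attribute)
        if value is not None and pattern.search(value):
            hits.append((local_name(node.tag), value))
    return hits


if __name__ == "__main__":
    for q in ("attr%href:uchicago.edu", "attr%div.class:^subsection$"):
        print(q, search(SAMPLE, q))
```

    Note that the anchored pattern ^subsection$ does not match the
    class value "subsubsection", which is the point of allowing
    regular expressions in the request.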

8.  Printing:

    The best approach with new content is for the author to write
    latex (to be translated, as above) or to write for an author-level
    XML document type that admits formatting both toward latex and
    toward xhtml+mathml.  For good results one may want more power
    than is available with xslt.

    It is sane to translate xhtml+mathml to latex.  I expect that
    piping from a browser to an external formatter is likely to be
    better than what I imagine for any future browser print service.
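
    To illustrate why translating toward latex is sane, here is a toy
    Python sketch that walks presentation MathML and emits LaTeX.  It
    is an illustration under narrow assumptions, not a real
    translator: it handles only mi, mn, mo, mrow, msup and mfrac, and
    the function name and sample input are hypothetical.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

# Hypothetical sample: the fraction (x^2 + 1) / y.
SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mfrac><mrow><msup><mi>x</mi><mn>2</mn></msup>"
    "<mo>+</mo><mn>1</mn></mrow><mi>y</mi></mfrac></math>"
)


def to_latex(node):
    """Recursively translate a small subset of presentation MathML."""
    tag = node.tag.replace(MATHML_NS, "")
    kids = [to_latex(child) for child in node]
    if tag in ("mi", "mn", "mo"):
        # Token elements: emit their text as is.
        return (node.text or "").strip()
    if tag in ("math", "mrow"):
        # Grouping elements: emit the children in order.
        return " ".join(kids)
    if tag == "msup":
        return "{%s}^{%s}" % (kids[0], kids[1])
    if tag == "mfrac":
        return "\\frac{%s}{%s}" % (kids[0], kids[1])
    raise ValueError("unsupported element: " + tag)


if __name__ == "__main__":
    print(to_latex(ET.fromstring(SAMPLE)))
```

    Anything serious needs far more than this (operators, fences,
    tables, spacing), which is part of why I expect an external
    formatter to do better than a browser print service.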


You wrote:

>   4. Writing documents that include diagrams that include
>      typographically-correct mathematics.

I assume you mean an svg diagram or a png image.
Someone from the WG might want to speak to this.


                                    -- Bill

Received on Wednesday, 26 March 2008 19:21:05 UTC