Re: paper 'A More Canonical Form of Content MathML to Facilitate Math Search'

Chris Lilley <chris@w3.org> writes:

> At Extreme Markup 2007, there was a very interesting paper by Moody
> E. Altamimi and Abdou S. Youssef: 'A More Canonical Form of Content
> MathML to Facilitate Math Search'
>
> http://www.idealliance.org/papers/extreme/proceedings/html/2007/Altamimi01/EML2007Altamimi01.xml
                       (caution: long url without newline)

I've not given adequate thought to this article.

But let me say that *if*, as I hope, we're interested in searching
XHTML+MathML content on the web, there are two contexts: (i) internal,
inside a given article (that the user has open in a browser), and
(ii) external, via search engines.

Either way we have the question of what authors will put up on the
web.  Putting up content markup will, in principle, take more effort
than putting up presentation markup.  Moreover, as long as we have
dual content inside mathml markup, authors who want tight control over
presentation will put up either presentation markup or dual content.

Given the propensity of most authors to stick with PDF as a format
(albeit inferior) for online viewing, I assume that most authors
want tight control over presentation.

So I don't think normalizing content markup will serve all needs
in this direction.

Beyond that, the topic of searching through notation is somewhat
like the topic of constructing indices, and it is natural to expect
that better results will be obtained where there has been author
cooperation.  How will such cooperation be enabled and solicited?

Meanwhile, short of various kinds of normalization, I think there
would be gain in rolling out an optional attribute, say, "searchKey"
for various things that wrap symbols and expressions.  (Inside math,
should "id" serve this purpose when there is no "searchKey"?)

When a browsing user probes content having such an attribute, the user
should be shown a "cloud" disclosing the searchKey value.  One then
would want a browser function for finding instances.  And, I suppose,
some authors might choose to identify "searchKey" values explicitly
just as print authors sometimes list notations.
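
To make the "finding instances" idea concrete, here is a minimal sketch
in Python.  It assumes a hypothetical "searchKey" attribute on MathML
elements (no such attribute exists in MathML today), and it assumes the
fallback to "id" that is only raised as a question above.  The sample
markup, the key names and the function names are invented for
illustration.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Invented sample document; "searchKey" is a hypothetical attribute.
SAMPLE = """<p xmlns="http://www.w3.org/1999/xhtml">
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <mrow searchKey="riemann-zeta"><mi>&#950;</mi><mo>(</mo><mi>s</mi><mo>)</mo></mrow>
    <mo>=</mo>
    <mrow searchKey="euler-gamma"><mi>&#915;</mi><mo>(</mo><mi>s</mi><mo>)</mo></mrow>
  </math>
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <mi id="riemann-zeta">&#950;</mi>
  </math>
</p>"""


def search_key(elem):
    """Return the element's searchKey, falling back to id (an assumption)."""
    return elem.get("searchKey") or elem.get("id")


def math_elements(root):
    """Yield only the elements that live in the MathML namespace."""
    prefix = "{" + MATHML_NS + "}"
    return (e for e in root.iter() if e.tag.startswith(prefix))


def find_instances(root, key):
    """Return every MathML element whose key equals the given value."""
    return [e for e in math_elements(root) if search_key(e) == key]


def list_keys(root):
    """Return the sorted set of keys, like a printed list of notations."""
    return sorted({search_key(e) for e in math_elements(root)} - {None})


if __name__ == "__main__":
    doc = ET.fromstring(SAMPLE)
    print(list_keys(doc))
    print(len(find_instances(doc, "riemann-zeta")))
```

A browser "find" function for the internal context could work along
these lines; the external, search-engine context would need the keys
indexed elsewhere, which this sketch does not attempt.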

Finally, I'll point out that providing "searchKey" attributes will not
require much extra author effort if the author's writing environment,
like gellmu, provides something like LaTeX's \newcommand.
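
The low-effort claim can be sketched as follows: the author names a
notation once, and every later use carries the key automatically.  This
is Python standing in for a macro facility, not gellmu's or LaTeX's
actual syntax; "new_command", "expand" and the "searchKey" attribute are
all hypothetical names.

```python
from xml.sax.saxutils import quoteattr

# Table of author-defined notations: name -> (MathML body, search key).
MACROS = {}


def new_command(name, body, search_key):
    # Register a notation once, in the spirit of a \newcommand definition.
    MACROS[name] = (body, search_key)


def expand(name):
    # Emit the body wrapped in an mrow carrying the hypothetical attribute.
    body, key = MACROS[name]
    return "<mrow searchKey=%s>%s</mrow>" % (quoteattr(key), body)


if __name__ == "__main__":
    new_command("zeta", "<mi>&#950;</mi>", "riemann-zeta")
    print(expand("zeta"))
```

The key is written once, at definition time, so the author never types
it again at the point of use.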

Cheers.

                                    -- Bill

Received on Monday, 20 August 2007 16:10:03 UTC