Re: searching for math

Hi all,

It's always great to start with the running example, thanks Neil for that.
In the community I'm close to, the TeX for that formula ought to be:

\frac{1}{2 \pi} \int_{-\pi}^{\pi} e^{i(x \sin \tau-n \tau)} d \tau

And that's what people would have as a starting point. I got it via OCR-ing
with MathPix.

A little-known (and relatively new) feature of that application is its own
limited math search. I believe it covers only a small selection of
education-related corpora (Wikipedia, StackExchange, etc.) and is based on
some in-house algorithm tied to the LaTeX format (and not MathML).

This is not an endorsement of MathPix, but they have had my attention in
recent days. As a rough example of what a working math search ought to
return to a user:

[image: image.png]

Meanwhile, here are the result pages from a few major vendors when querying
for this LaTeX:

1. Google:
https://www.google.com/search?q=\frac{1}{2+\pi}+\int_{-\pi}^{\pi}+e^{i(x+\sin+\tau-n+\tau)}+d+\tau

2. Bing:
https://www.bing.com/search?q=%5Cfrac%7B1%7D%7B2+%5Cpi%7D+%5Cint_%7B-%5Cpi%7D%5E%7B%5Cpi%7D+e%5E%7Bi%28x+%5Csin+%5Ctau-n+%5Ctau%29%7D+d+%5Ctau

3. DuckDuckGo:
https://math.stackexchange.com/questions/3265081/why-these-integrals-modified-from-integral-representation-of-bessel-function-are

DuckDuckGo is particularly "daring": they find an exact match and, instead
of showing a results page, just drop you into the StackExchange post.

4. Yahoo:
https://search.yahoo.com/search?p=\frac{1}{2+\pi}+\int_{-\pi}^{\pi}+e^{i(x+\sin+\tau-n+\tau)}+d+\tau
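
For anyone who wants to repeat the experiment: the query part of the Bing
link is simply the TeX string in form-encoded shape (spaces become "+",
backslashes and braces become %XX escapes). A minimal Python sketch that
rebuilds it; the helper name search_url is my own invention for
illustration, and only the URL prefix is taken from the link above:

```python
from urllib.parse import quote_plus, unquote_plus

# The TeX string quoted at the top of this message.
TEX = r"\frac{1}{2 \pi} \int_{-\pi}^{\pi} e^{i(x \sin \tau-n \tau)} d \tau"


def search_url(base: str, tex: str) -> str:
    """Hypothetical helper: append the form-encoded TeX to a URL prefix."""
    return base + quote_plus(tex)


bing_url = search_url("https://www.bing.com/search?q=", TEX)
print(bing_url)

# Decoding the query parameter gives back the original TeX string.
assert unquote_plus(bing_url.split("=", 1)[1]) == TEX
```

Nothing math-aware happens at this stage. The engine receives an ordinary
text string and tokenizes it as it sees fit.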

These are statistical matches on LaTeX strings, as anyone familiar with
classic search engines would expect. The results are hard to read and
"bumpy". I mentioned in our recent Google doc that if you swap the order of
scripts, you land at *different* result pages: x_i^2 and x^2_i are not seen
as the same.
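
To make the script-order point concrete, here is a rough sketch of the kind
of normalization a math-aware front end could apply before indexing or
querying. It is purely textual, handles only non-nested brace groups, and is
my own illustration (the name normalize_scripts included), not a description
of what any of the vendors above actually do:

```python
import re

# One script argument: a brace group without nested braces, a control word
# such as \pi, or a single non-space character.
ARG = r"(?:\{[^{}]*\}|\\[A-Za-z]+|[^\s{}\\])"

# A superscript immediately followed by a subscript, e.g. ^2_i
SUP_THEN_SUB = re.compile(rf"\^\s*({ARG})\s*_\s*({ARG})")


def normalize_scripts(tex: str) -> str:
    """Rewrite every ^a_b as _b^a so the subscript always comes first."""
    return SUP_THEN_SUB.sub(lambda m: f"_{m.group(2)}^{m.group(1)}", tex)


# Both spellings collapse to the same string after normalization.
assert normalize_scripts("x^2_i") == normalize_scripts("x_i^2")
print(normalize_scripts(r"\int^{\pi}_{-\pi} f(\tau) d\tau"))
```

A real system would do this on a parsed tree (e.g. Presentation MathML)
rather than with a regular expression, but even this much is enough to make
the two spellings hit the same index entry.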

---

My main open question is a strategic one, the obvious one to ask really,
targeted at the search engine vendor ecosystem:

What is missing for Math Search to become "of interest" to (i.e. supported
by) the main browser vendors? Would that require a MathML-compatible
standard query language, to improve on LaTeX search, image search, or CAS
syntax search?
Or do we mostly need some sort of strategic recognition that math search is
a topic "worth investing in" beyond the baseline support for
application-specific plain text syntax?

There have indeed been a lot of academic stabs at this problem, and some
working systems. Annual search challenges at NTCIR and recently ARQMath
take place over MathML 3, both Presentation and Content. I don't think the
available markup has been a limiting factor on that end, at least to my
knowledge - not *yet* at least.

I would even venture to guess that we are posing the hard questions out of
order. Maybe (likely?) it is a prerequisite to get MathML Core rendering in
all major browsers, and have it reach a very wide positive reception,
before we can even start asking large vendors what would persuade them to
include Rich Results related to formula search.

So there is at least one perspective where the "search" focus may be better
revisited in a hypothetical MathML 5 down the road...

Greetings,
Deyan


On Thu, Aug 12, 2021 at 5:54 PM Neil Soiffer <soiffer@alum.mit.edu> wrote:

> At the meeting today, Deyan brought up that it is not possible to clearly
> write about potential solutions for searching math (something in our
> charter) when we haven't defined what we mean by 'math search'. We agreed
> that people would post opinions and/or links to papers that reflect their
> opinions. To start things off, here's my take....
>
> I think the most common math search is to search for some topic to learn
> about that topic. For example, someone might search for geometric
> progressions, Fibonacci numbers, or Bessel functions of the second kind.
> All of these types of searches seem well handled today because the text in
> documents that discuss them mention those words.
>
> The second kind of search is where I have an expression or equation and I
> want to know something about it. For example suppose I am reading something
> and it has the expression
> [image: image.png]
>
> Maybe some in this group recognize this, but I don't. I think this is
> where I would want to be able to do a search where I paste this in so I get
> results that help me to understand it, its properties, and its relationship
> to various areas.
>
> For this particular expression, I doubt many people would write it
> differently other than maybe using a different variable of integration or
> possibly using something other than 'x' (unlikely). However, in other
> searches, there are many equivalent expressions that might reverse the
> order of a sum or distribute a factor into a term. I believe all these
> issues (and many others) have been addressed in various research papers
> and specialized mathematical search engines.
>
> A quick search turns up the following math search engines:
> 1. wolframalpha.com
> 2. searchonmath.com
> 3. https://approach0.xyz/search/
> 4. https://www.google.com/imghp  (google image search)
>
> I would not normally classify Wolfram Alpha as a search engine, but it was
> the first hit on google for "math search engine". Also, I believe it is
> used by bing.
>
> Of these, only '3' returned a result that came back with the content
> related to the topic I grabbed the expression from. So clearly work remains
> to be done on math search engines. On the other hand, I'm not sure what
> there is for the WG to do. The input to these systems (TeX, calculator
> math, images) can all be derived from MathML (presentation or content).
>
> Interestingly, I tried using the TeX for this as part of a google search
> -- nothing very close on the first page. I also tried bing -- the first hit
> was close and the second hit was on topic.
>
> From this very limited experiment, my take is that the WG doesn't need to
> work on augmenting MathML for search (but search engines themselves need
> improvements). I'll leave it to those who have actually worked on search to
> contradict me and provide examples where augmenting presentation MathML
> would result in better search in the real world.
>
>     Neil

Received on Thursday, 12 August 2021 22:39:38 UTC