- From: Deyan Ginev <deyan.ginev@gmail.com>
- Date: Thu, 12 Aug 2021 18:37:55 -0400
- To: Neil Soiffer <soiffer@alum.mit.edu>
- Cc: "www-math@w3.org" <www-math@w3.org>
- Message-ID: <CANjPgh-QfQ8hcjo2MU7jPK6-wxn5jve8rCUg062LJGzO9ns9qA@mail.gmail.com>

Hi all,

It's always great to start with the running example, thanks Neil for that. In the community I'm close to, the TeX for that formula ought to be:

\frac{1}{2 \pi} \int_{-\pi}^{\pi} e^{i(x \sin \tau-n \tau)} d \tau

And that's what people would have as a starting point. I got it via OCR-ing with MathPix. A little-known (relatively new) feature of that application is their own limited math search. I believe it covers only a small selection of education-related corpora (Wikipedia, StackExchange, etc.), and that it is based on some in-house algorithm related to the LaTeX format (and not MathML). This is not an endorsement of MathPix, but they do have my attention in recent days. Just as a rough example of what a working math search ought to return to a user:

[image: image.png]

Meanwhile, here are the result pages, when querying for this LaTeX, from a few major vendors:

1. Google: https://www.google.com/search?q=\frac{1}{2+\pi}+\int_{-\pi} ^{\pi}+e^{i(x+\sin+\tau-n+\tau)}+d+\tau
2. Bing: https://www.bing.com/search?q=%5Cfrac%7B1%7D%7B2+%5Cpi%7D+%5Cint_%7B-%5Cpi%7D%5E%7B%5Cpi%7D+e%5E%7Bi%28x+%5Csin+%5Ctau-n+%5Ctau%29%7D+d+%5Ctau
3. DuckDuckGo: https://math.stackexchange.com/questions/3265081/why-these-integrals-modified-from-integral-representation-of-bessel-function-are
   DuckDuckGo is particularly "daring": they find an exact match and, instead of showing a results page, just drop you in the StackExchange post.
4. Yahoo: https://search.yahoo.com/search?p=\frac{1}{2+\pi}+\int_{-\pi} ^{\pi}+e^{i(x+\sin+\tau-n+\tau)}+d+\tau

These are statistical matches on LaTeX strings, as expected by anyone familiar with classic search engines. The output is hard to read and the results are "bumpy". I mentioned in our recent Google doc that if you swap the order of scripts, you land at *different* result pages: x_i^2 and x^2_i are not seen as the same.
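To make the script-order point concrete, here is a minimal sketch in Python of the kind of canonicalization a string-based engine could apply before matching. It is only an illustration under stated assumptions: `normalize_scripts` is a hypothetical helper, it handles only simple script arguments (a single character, a control word, or a brace group without nested braces), and I do not know whether any of the engines above does anything like it.

```python
import re

# One script argument (assumption: no nested braces):
# a brace group, a control word such as \pi, or a single character.
ARG = r"(\{[^{}]*\}|\\[A-Za-z]+|[^\s\\{}^_])"

# A superscript immediately followed by a subscript, e.g. ^2_i or ^{\pi}_{-\pi}.
SUP_THEN_SUB = re.compile(r"\^\s*" + ARG + r"\s*_\s*" + ARG)


def normalize_scripts(tex: str) -> str:
    """Rewrite base^a_b as base_b^a so both script orders share one form."""
    # A function is used as the replacement so that backslashes in the
    # matched TeX are copied literally rather than read as regex escapes.
    return SUP_THEN_SUB.sub(lambda m: "_" + m.group(2) + "^" + m.group(1), tex)


if __name__ == "__main__":
    print(normalize_scripts("x^2_i"))
    print(normalize_scripts("x_i^2"))
    print(normalize_scripts(r"\int^{\pi}_{-\pi} e^{i(x \sin \tau-n \tau)} d \tau"))
```

With this, x^2_i and x_i^2 both become x_i^2, so a plain string index would treat them as the same query. Anything beyond this (nested braces, primes, macros) would need a real TeX or MathML parse rather than a regular expression.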
* * *

My main open question is a strategic one, the obvious one to ask really, targeted at the search engine vendor ecosystem:

What is missing for Math Search to become "of interest" to (i.e. supported by) the main search vendors? Would that require a MathML-compatible standard query language, to improve on LaTeX search, image search, or CAS syntax search? Or do we mostly need some sort of strategic recognition that math search is a topic "worth investing in" beyond the baseline support for application-specific plain text syntax?

There have indeed been a lot of academic stabs at this problem, and some working systems. Annual search challenges at NTCIR and recently ARQMath take place over MathML 3, both Presentation and Content. I don't think the available markup has been a limiting factor on that end, at least to my knowledge - not *yet* at least.

I would even venture to guess that we are posing the hard questions out of order. Maybe (likely?) it is a prerequisite to get MathML Core rendering in all major browsers, and have it reach a very wide positive reception, before we can even start asking large vendors what would attract them to including Rich Results related to formula search. So there is at least one perspective where the "search" focus may be better revisited in a hypothetical MathML 5 down the road...

Greetings,
Deyan

On Thu, Aug 12, 2021 at 5:54 PM Neil Soiffer <soiffer@alum.mit.edu> wrote:

> At the meeting today, Deyan brought up that it is not possible to clearly write about potential solutions for searching math (something in our charter) when we haven't defined what we mean by 'math search'. We agreed that people would post opinions and/or links to papers that reflect their opinions. To start things off, here's my take....
>
> I think the most common math search is to search for some topic to learn about that topic. For example, someone might search for geometric progressions, Fibonacci numbers, or Bessel functions of the second kind.
> All of these types of searches seem well handled today because the text in documents that discuss them mentions those words.
>
> The second kind of search is where I have an expression or equation and I want to know something about it. For example, suppose I am reading something and it has the expression
>
> [image: image.png]
>
> Maybe some in this group recognize this, but I don't. I think this is where I would want to be able to do a search where I paste this in so I get results that help me to understand it, its properties, and its relationship to various areas.
>
> For this particular expression, I doubt many people would write it differently other than maybe using a different variable of integration or possibly using something other than 'x' (unlikely). However, in other searches, there are many equivalent expressions that might reverse the order of a sum or distribute a factor into a term. I believe all these issues (and many others) have been addressed in various research papers and specialized mathematical search engines.
>
> A quick search turns up the following math search engines:
>
> 1. wolframalpha.com
> 2. searchonmath.com
> 3. https://approach0.xyz/search/
> 4. https://www.google.com/imghp (google image search)
>
> I would not normally classify Wolfram Alpha as a search engine, but it was the first hit on google for "math search engine". Also, I believe it is used by bing.
>
> Of these, only '3' returned a result that came back with the content related to the topic I grabbed the expression from. So clearly work remains to be done on math search engines. On the other hand, I'm not sure what there is for the WG to do. The input to these systems (TeX, calculator math, images) can all be derived from MathML (presentation or content).
>
> Interestingly, I tried using the TeX for this as part of a google search -- nothing very close on the first page.
> I also tried bing -- the first hit was close and the second hit was on topic.
>
> From this very limited experiment, my take is that the WG doesn't need to work on augmenting MathML for search (but search engines themselves need improvements). I'll leave it to those who have actually worked on search to contradict me and provide examples where augmenting presentation MathML would result in better search in the real world.
>
> Neil

## Attachments

- image/png attachment: image.png

- image/png attachment: 02-image.png

Received on Thursday, 12 August 2021 22:39:38 UTC