Re: searching for math

Hi Paul, all,

This is a follow-up from our group discussion, trying to stay in the
search-related email thread.

Google has some supported uses of schema.org that touch upon mathematics,
in very specific, indeed webapp-specific, ways. Some examples related to
formulas:

1. "quadratic equation" has the Google rich results page:
https://g.co/kgs/Ahtp25

with a bunch of "rich results" tabs on the left, and some rich previews on
the right sidebar - assuming you have a wide screen.

I already knew practice problems were carried into the results through
schema.org, so I clicked on that tab on the left, and here is a screenshot
of what I saw:
[image: byju.png]


I have no prior knowledge of "byju" and can't state any opinion on it, but
it was the default multiple-choice vendor that showed up.

So I went to the HTML offered by "more practice on byju", opened the
inspector and extracted the following LD+JSON encoding of schema.org
structured data.
I republished it as a gist for convenience, since using the inspector is
visually painful.

There are several separate rich data snippets, each for a different purpose
- top-level, video, and the quiz. The link to the quiz LD+JSON is:
https://gist.github.com/dginev/33b5d0b7262fc70d9619e4af8791e3fc#file-quiz-json

It's anchored to that term via the very opaque metadata:
"name": "Quiz about Quadratic equations",
   "about": {
      "@type": "Thing",
      "name": "Quadratic equations"
   }, ...

I say opaque, as the name is not connected to some sophisticated ontology,
it's just Thing. I tried using "name" for the "Ackermann-function" over
MathML when I made my demo, but that was not the expected use, so it wasn't
picked up. The key for it being used here, I suspect, is a level up - the
parent key "educationalAlignment", defined here:
https://schema.org/educationalAlignment
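
To make the extraction step concrete, here is a minimal sketch of how the
LD+JSON (JSON-LD) blocks can be pulled out of a page without the inspector,
using only the Python standard library. The sample HTML is hypothetical: it
mirrors only the "name" and "about" fields quoted above, and the "Quiz"
@type and "@context" values are my assumption rather than a copy of the
vendor page.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several pieces, so buffer it.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._chunks)))


def about_names(html):
    """Return (@type, about.name) for every top-level JSON-LD object in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [
        (block.get("@type"), block.get("about", {}).get("name"))
        for block in parser.blocks
        if isinstance(block, dict)
    ]


# Hypothetical page fragment, shaped like the quiz snippet quoted above.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Quiz",
  "name": "Quiz about Quadratic equations",
  "about": {"@type": "Thing", "name": "Quadratic equations"}
}
</script>
</head><body></body></html>
"""

print(about_names(SAMPLE_HTML))
```

The point of the sketch is only that the "about" anchor is a plain string
lookup: nothing in the data ties "Quadratic equations" to a formula or to
MathML.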

So, my long-term future hope is, as long as you have a search vendor trying
to help us build our application inside schema.org, one can imagine a
"mathIntent" and/or "mathSearch" and/or "mathAlignment" etc. newly
introduced vocabulary entry, under which we can instantiate whatever
application logic is considered viable between the WG and the search
vendors.

There are a bunch of other mini "apps" (or contexts?) you can see as
subresults in the left tab. Here are some other top-level links to other
concepts I've seen with such results, I'm sure there are more:

2. Schrödinger equation (see the Formula tab)
https://g.co/kgs/DwSMiW

3. Integration by parts (see e.g. the Formula and practice problems tabs)
https://g.co/kgs/VwvHhH

4. Ceva's theorem (see e.g. practice problems)
https://g.co/kgs/frM2Wy

These do not use MathML, and the formulas I've seen are presented via
images.

Your mileage may vary, and I am not endorsing or rebuffing the "Rich
Results" pages themselves - I am very deliberately trying not to discuss
them here. The key point is that the LD+JSON structured data, deposited by
third parties and later crawled by Google, is actually used here -- for
this specific flavour of applications.

If anyone else has more context and information about rich results, please
feel most welcome to reply with additional details and thoughts about their
possible connections to MathML expressions.

Greetings,
Deyan


On Tue, Aug 17, 2021 at 4:32 PM Neil Soiffer <soiffer@alum.mit.edu> wrote:

> Seems like we have reached the same conclusion that now is not the time to
> try and enhance search in math. We will discuss this further at the meeting
> this week, but I encourage anyone who disagrees with that feeling to put
> their thoughts down in email or a position paper and post that well in
> advance of the meeting so we have time to digest it.
>
>     Neil
>
>
> On Thu, Aug 12, 2021 at 3:38 PM Deyan Ginev <deyan.ginev@gmail.com> wrote:
>
>> Hi all,
>>
>> It's always great to start with the running example, thanks Neil for
>> that. In the community I'm close to, the TeX for that formula ought to be:
>>
>> \frac{1}{2 \pi} \int_{-\pi}^{\pi} e^{i(x \sin \tau-n \tau)} d \tau
>>
>> And that's what people would have as a starting point. I got it via
>> OCR-ing with MathPix.
>>
>> A little-known (relatively new) feature of that application is their
>> own limited math search, I believe only over a small selection of
>> education-related corpora (Wikipedia, StackExchange, etc.), and based
>> around some in-house algorithm related to the LaTeX format (and not
>> MathML).
>>
>> This is not an endorsement of MathPix, but they do have my attention in
>> recent days. Just as some rough example of what a working math search ought
>> to return back to a user:
>>
>> [image: image.png]
>>
>> Meanwhile, here are the result pages - when querying for this latex -
>> from a few major vendors:
>>
>> 1. Google:
>> https://www.google.com/search?q=\frac{1}{2+\pi}+\int_{-\pi}^{\pi}+e^{i(x+\sin+\tau-n+\tau)}+d+\tau
>>
>> 2. Bing:
>>
>> https://www.bing.com/search?q=%5Cfrac%7B1%7D%7B2+%5Cpi%7D+%5Cint_%7B-%5Cpi%7D%5E%7B%5Cpi%7D+e%5E%7Bi%28x+%5Csin+%5Ctau-n+%5Ctau%29%7D+d+%5Ctau
>>
>> 3. DuckDuckGo:
>>
>> https://math.stackexchange.com/questions/3265081/why-these-integrals-modified-from-integral-representation-of-bessel-function-are
>>
>> DuckDuckGo is particularly "daring" - they find an exact match and
>> instead of showing a return page just drop you in the stackexchange post.
>>
>> 4. Yahoo:
>> https://search.yahoo.com/search?p=\frac{1}{2+\pi}+\int_{-\pi}
>> <https://search.yahoo.com/search?p=%5Cfrac%7B1%7D%7B2+%5Cpi%7D+%5Cint_%7B-%5Cpi%7D>
>> ^{\pi}+e^{i(x+\sin+\tau-n+\tau)}+d+\tau
>>
>> Statistical matches on latex strings, as expected for anyone familiar
>> with classic search engines. It's hard to read and the results are "bumpy".
>> I mentioned in our recent google doc that if you swap the order of scripts,
>> you land at *different* result pages. x_i^2 and x^2_i are not seen as the
>> same.
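
As an illustration of the script-order point, here is a minimal sketch of a
query-side normalization that rewrites "superscript then subscript" into
"subscript then superscript", so that both spellings map to one string
before matching. This is my own toy example, not something any of the
vendors above are known to do; the function name normalize_scripts is
hypothetical, and the pattern only handles single tokens, control words
and brace groups without nesting.

```python
import re

# One script argument: a brace group without nested braces, a control word
# such as \pi, or a single character that is not itself ^ or _.
ARG = r"(\{[^{}]*\}|\\[A-Za-z]+|[^\s{}\\^_])"

# A superscript immediately followed by a subscript.
SUP_THEN_SUB = re.compile(r"\^" + ARG + "_" + ARG)


def normalize_scripts(tex):
    """Rewrite every ^A_B as _B^A; leave _B^A and everything else unchanged."""
    return SUP_THEN_SUB.sub(r"_\2^\1", tex)


print(normalize_scripts("x^2_i"))
print(normalize_scripts(r"\int^{\pi}_{-\pi}"))
```

With this in place, x^2_i and x_i^2 normalize to the same query string. It
does nothing for the harder equivalences (renamed variables, reordered
sums), which is where the academic systems put their effort.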
>>
>> ---
>>
>> My main open question is a strategic one, the obvious one to ask really,
>> targeted at the search engine vendor ecosystem:
>>
>> What is missing for Math Search to become "of interest" (i.e. supported
>> by) main browser vendors? Would that require a MathML-compatible standard
>> query language, to improve on latex search, image search, or CAS syntax
>> search? Or do we mostly need some sort of strategic recognition that math
>> search is a topic "worth investing in" beyond the baseline support for
>> application-specific plain text syntax?
>>
>> There have indeed been a lot of academic stabs at this problem, and some
>> working systems. Annual search challenges at NTCIR and recently ARQMath
>> take place over MathML 3, both Presentation and Content. I don't think the
>> available markup has been a limiting factor on that end, at least to my
>> knowledge - not *yet* at least.
>>
>> I would even venture to guess that we are posing the hard questions out
>> of order. Maybe (likely?) it is a prerequisite to get MathML Core rendering
>> in all major browsers, and have it reach a very wide positive reception,
>> before we can even start asking large vendors about what would attract them
>> for including Rich Results related to formula search.
>>
>> So there is at least one perspective where the "search" focus may be
>> better revisited in a hypothetical MathML 5 down the road...
>>
>> Greetings,
>> Deyan
>>
>>
>> On Thu, Aug 12, 2021 at 5:54 PM Neil Soiffer <soiffer@alum.mit.edu>
>> wrote:
>>
>>> At the meeting today, Deyan brought up that it is not possible to
>>> clearly write about potential solutions for searching math (something in
>>> our charter) when we haven't defined what we mean by 'math search'. We
>>> agreed that people would post opinions and/or links to papers that reflect
>>> their opinions. To start things off, here's my take....
>>>
>>> I think the most common math search is to search for some topic to learn
>>> about that topic. For example, someone might search for geometric
>>> progressions, Fibonacci numbers, or Bessel functions of the second kind.
>>> All of these types of searches seem well handled today because the text in
>>> documents that discuss them mention those words.
>>>
>>> The second kind of search is where I have an expression or equation and
>>> I want to know something about it. For example suppose I am reading
>>> something and it has the expression
>>> [image: image.png]
>>>
>>> Maybe some in this group recognize this, but I don't. I think this is
>>> where I would want to be able to do a search where I paste this in so I get
>>> results that help me to understand it, its properties, and its relationship
>>> to various areas.
>>>
>>> For this particular expression, I doubt many people would write it
>>> differently other than maybe using a different variable of integration or
>>> possibly using something other than 'x' (unlikely). However, in other
>>> searches, there are many equivalent expressions that might reverse the
>>> order of a sum or distribute a factor into a term. I believe all these
>>> issues (and many others) have been addressed in various research
>>> papers and specialized mathematical search engines.
>>>
>>> A quick search turns up the following math search engines:
>>> 1. wolframalpha.com
>>> 2. searchonmath.com
>>> 3. https://approach0.xyz/search/
>>> 4. https://www.google.com/imghp  (google image search)
>>>
>>> I would not normally classify Wolfram Alpha as a search engine, but it
>>> was the first hit on google for "math search engine". Also, I believe it is
>>> used by bing.
>>>
>>> Of these, only '3' returned a result that came back with the content
>>> related to the topic I grabbed the expression from. So clearly work remains
>>> to be done on math search engines. On the other hand, I'm not sure what
>>> there is for the WG to do. The input to these systems (TeX, calculator
>>> math, images) can all be derived from MathML (presentation or content).
>>>
>>> Interestingly, I tried using the TeX for this as part of a google search
>>> -- nothing very close on the first page. I also tried bing -- the first hit
>>> was close and the second hit was on topic.
>>>
>>> From this very limited experiment, my take is that the WG doesn't need
>>> to work on augmenting MathML for search (but search engines themselves need
>>> improvements). I'll leave it to those who have actually worked on search to
>>> contradict me and provide examples where augmenting presentation MathML
>>> would result in better search in the real world.
>>>
>>>     Neil
>>>
>>>
>>>
>>>
>>

Received on Thursday, 19 August 2021 19:05:50 UTC