Re: Two challenges related to KR of scientific papers and books

Nice practical problems, Stan. I like them.
Some would argue that LaTeX is already a representation; however, most would
agree that it is machine-oriented.

As I see it, the machine representation and the human-language representation
should be aligned, to support understandability and explainability.
This is what we can try to advance here.

The problem I see, correct me if I am wrong, is precision: formulas can be
exact, whereas natural language (NL) requires refinement to be exact.
I also think that we can be exact with NL, though we would have to prove it.
NL probably requires more memory to store, and maybe more processing power
to compute, but who cares? We have an abundance of both; it is a small price
to pay.
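
To make the precision point concrete, here is a minimal sketch in Python. The
names Var, BinOp, PHRASES and to_english are made up for illustration and do
not come from any existing tool. It renders a small formula tree into English
in which every operator phrase takes exactly two arguments, so the nesting of
"the ... of ... and ..." plays the role of brackets. A loose reading such as
"a plus b times c" could mean either tree below; the refined phrasing gives a
different sentence for each.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "*" in this sketch
    left: "Expr"
    right: "Expr"


Expr = Union[Var, BinOp]

# Hypothetical phrase templates: each one always takes exactly two arguments.
PHRASES = {"+": "the sum of {} and {}", "*": "the product of {} and {}"}


def to_english(e: Expr) -> str:
    """Render an expression tree as a fixed-arity English phrase."""
    if isinstance(e, Var):
        return e.name
    return PHRASES[e.op].format(to_english(e.left), to_english(e.right))


a, b, c = Var("a"), Var("b"), Var("c")
first = BinOp("+", a, BinOp("*", b, c))   # a + (b * c)
second = BinOp("*", BinOp("+", a, b), c)  # (a + b) * c

print(to_english(first))   # the sum of a and the product of b and c
print(to_english(second))  # the product of the sum of a and b and c
```

The second sentence should still be unambiguous to a parser, because each
phrase has a fixed number of arguments, but it is already hard for a person
to read. That is the refinement cost I mean: exact NL is longer and more
rigid than everyday NL.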

Could you please come up with one or two examples of a natural language
representation for each case, just as a proof of principle? We can then
discuss whether it is useful and scope further work. I'll be happy with a
proof of principle to start with.
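
For the code case, a proof of principle might look something like the sketch
below. It is only an assumption about what could be useful, not a proposal
for the final form. It uses Python's standard ast module (rather than the
LLVM or Go tools mentioned in the original message) and covers a very small
subset of the syntax; the describe function and its English templates are
made up for illustration. ast.unparse needs Python 3.9 or later.

```python
import ast


def describe(node: ast.AST) -> str:
    """Return a rough English description for a small subset of Python."""
    if isinstance(node, ast.Module):
        return "; ".join(describe(stmt) for stmt in node.body)
    if isinstance(node, ast.FunctionDef):
        params = ", ".join(arg.arg for arg in node.args.args) or "no parameters"
        body = "; ".join(describe(stmt) for stmt in node.body)
        return f"define function {node.name} taking {params}: {body}"
    if isinstance(node, ast.Return):
        return "return " + (describe(node.value) if node.value else "nothing")
    if isinstance(node, ast.BinOp):
        # Hypothetical wording; only addition and multiplication are covered.
        words = {ast.Add: "the sum of", ast.Mult: "the product of"}
        phrase = words.get(type(node.op))
        if phrase is not None:
            return f"{phrase} {describe(node.left)} and {describe(node.right)}"
    if isinstance(node, ast.Name):
        return node.id
    # Anything not handled above falls back to its source text.
    return ast.unparse(node)


source = "def area(w, h):\n    return w * h\n"
print(describe(ast.parse(source)))
# define function area taking w, h: return the product of w and h
```

The description is produced by walking the same tree the parser builds, so
the English and the AST stay aligned node by node, which is the kind of
alignment between machine and human representation I mentioned above.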

On Mon, Nov 21, 2022 at 9:01 AM Stanislav Srednyak, Ph.D. <
stanislav.srednyak@duke.edu> wrote:

> Dear colleagues,
>
> thanks for the discussion that we had last time. I did some more study of
> the material on the KR group and I think the following two questions would
> resonate with what many people here were thinking about.
>
> 1) Building a representation for LaTeX formulas.
>
> There is an amazing data set at arXiv. More generally, there are many PDFs
> of scientific articles available for modeling. This data set is
> challenging for several reasons.
>
> 1. It is hard to parse formulas.
> 2. There are omitted calculations.
>
> Problem 1 is the one I would like to draw attention to. It may be
> amenable to definite analysis because the set of math objects that humans
> use is very restricted. In fact, most of the papers are just about discrete
> math, number theory, and functions of real variables. People very seldom
> use higher functionals. Thus, although the ZFC axiomatics admits
> arbitrarily complicated objects, what is actually found in papers is on the
> first several floors of Gödel's definable universe.
>
> There is an effort to revolutionize the .pdf format here: desci.com
>
> 2) AST of code.
>
>
> There is a lot of code on Git, and there are standard tools for some
> languages, e.g. LLVM for C++, the ast package for Go, etc. Unfortunately,
> the representations that result from these tools are ill-suited for
> describing code. One would like a graphical tool to represent and
> manipulate the code.
>
> If such tools were available, we would be able to match the parse
> trees of the accompanying English with the trees from the AST.
>
> Tools like the Transformer, with its multi-head attention, would be much
> more useful.
>
>
> Stan Srednyak
>
>
>

Received on Monday, 21 November 2022 02:42:44 UTC