Web search and Distributed Knowledge Structures

Dear colleagues,

I would like to inform the AIKR CG about the work that we have been doing
at rorur.com, a decentralized search engine.

The benefits of creating such a search engine hardly need discussion within
this group. I believe all of you understand the necessity of computational
experimentation with web-scale data. Unfortunately, there is no way for the
broad research community to participate in the analysis and modeling of the
data harvested by commercial search engines such as Google. Their code is
closed source, there are no benchmarks, and there is very little
information on the algorithms they use. The search they provide is
commerce-centered rather than knowledge-centered.


There is a project that aims to collect web data for the public good:
Common Crawl. Unfortunately, it has some structural problems. First, it
collects only a sample of the web rather than most of the data. Second,
the experiments that one may do on web data are by their nature rather
expensive, and there is no way to fund high-quality research based on
community needs.

We addressed these problems using new incentive structures that emerged out
of blockchain technology; see our whitepaper.

The reason for this letter is the following. We are at the stage where we
can start collecting data at web scale, and computation on this data may
become available in the near future. Therefore, we would like the
community to answer the following question:

Given web data, what kinds of distributed computations would be of interest?


Let me propose some examples of data structures that we may wish to
construct in an automated way as a result of such computation.

- distributed knowledge graph. This data structure contains information on
the distribution of entities among URLs, their relative position in the web
graph, context analysis based on proximity, etc.

- distributed ranking system. This is a system which assigns to each page a
"ranking structure": a function that maps a query string to a real number
measuring the relevance of the page for this query.

- distributed ASTs. As an example, consider an area of mathematics, e.g.,
homotopy theory. As a data structure, it exists in the form of multiple
semantically connected papers and URLs. A distributed AST organizes this
data structure into a set of interconnected statements, theorems, and HOL
relations between them (in the sense of temporal logic).
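
To make the "ranking structure" from the second example concrete, here is a
minimal single-machine sketch in Python. It is only an illustration, not the
rorur.com implementation: the names RankingStructure, make_ranking_structure,
and rank are hypothetical, and the normalized term-frequency score is an
assumed stand-in for whatever relevance function one would actually compute.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A "ranking structure" is a function from a query string to a real number.
# (Hypothetical type alias, introduced for illustration only.)
RankingStructure = Callable[[str], float]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_ranking_structure(page_text: str) -> RankingStructure:
    """Build the ranking structure of one page.

    Assumption: relevance is the average relative frequency of the query
    terms in the page. A real system would use a richer model.
    """
    counts = Counter(tokenize(page_text))
    total = sum(counts.values())

    def score(query: str) -> float:
        terms = tokenize(query)
        if not terms or total == 0:
            return 0.0
        return sum(counts[t] for t in terms) / (total * len(terms))

    return score


def rank(pages: Dict[str, str], query: str) -> List[Tuple[str, float]]:
    """Score every page with its own ranking structure, best match first."""
    structures = {url: make_ranking_structure(text) for url, text in pages.items()}
    scored = [(url, f(query)) for url, f in structures.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because each page carries its own function, the structures can in principle
be built and evaluated independently on different nodes, with only the
(url, score) pairs merged at query time.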


Please share your own examples here, and don't hesitate to contact me if
you have any questions. We have a rather precise definition of "distributed
computation" that stems from the limitations of our system; please consult
the website.


Stan Srednyak

Received on Monday, 14 November 2022 09:26:45 UTC