Web search and Distributed Knowledge Structures

Dear colleagues,

I would like to inform the AIKR WG about the work that we have been doing at rorur..com, a decentralized search engine.

The benefits of creating such a search engine hardly need discussion within this group. I believe all of you understand the necessity of computational experimentation with web-scale data. Unfortunately, there is no way for the broad research community to participate in the analysis and modeling of the data harvested by commercial search engines, such as Google. Their code is closed source, there are no benchmarks, and there is very little information on the algorithms they use. The search they provide is commerce-centered rather than knowledge-centered.


There is a project that aims to collect web data for the public good: Common Crawl. Unfortunately, it has some structural problems. First, it does not collect most of the data, but rather a sample of the web. Second, the experiments that one may do on web data are by their nature rather expensive, and there is no way to fund high-quality research based on community needs.

We addressed these problems using new incentive structures that emerged out of blockchain technology; see our whitepaper.

The reason for this letter is the following. We are at the stage where we can start collecting data at web scale, and computation on this data may become available in the near future. Therefore, we would like the community to answer the following question:

Given web data, what kind of distributed computations would be of interest?


Let me propose some examples of data structures that we may wish to construct in an automated way as a result of such computation.

- distributed knowledge graph. This data structure contains information on the distribution of entities among URLs, their relative position in the web graph, context analysis based on proximity, etc.

- distributed ranking system. This is a system which assigns to each page a "ranking structure": a function that maps a query string to a real number, which gives the relevance of the page for this query.

- distributed ASTs. As an example, consider an area of mathematics, e.g., homotopy theory. As a data structure, it exists in the form of multiple semantically connected papers and URLs. A distributed AST organizes this data structure into a set of interconnected statements, theorems, and HOL relations between them (in the sense of temporal logic).
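
To make the "ranking structure" idea above concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not our system's implementation: the names RankingStructure, make_ranking_structure and rank are hypothetical, and the scoring rule (plain term-frequency overlap) is a stand-in for whatever relevance function one would actually compute.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a "ranking structure" maps a query string to a real number.
RankingStructure = Callable[[str], float]


def make_ranking_structure(page_text: str) -> RankingStructure:
    """Build a toy ranking structure for one page.

    Assumption for illustration only: relevance is the average relative
    frequency of the query terms in the page text.
    """
    counts = Counter(page_text.lower().split())
    total = sum(counts.values())

    def score(query: str) -> float:
        terms = query.lower().split()
        if not terms or total == 0:
            return 0.0
        return sum(counts[t] for t in terms) / (total * len(terms))

    return score


def rank(pages: Dict[str, RankingStructure], query: str) -> List[Tuple[str, float]]:
    """Evaluate every page's ranking structure on the query, best first."""
    scored = [(url, structure(query)) for url, structure in pages.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder URLs and texts, invented for this example.
    pages = {
        "example.org/homotopy": make_ranking_structure("homotopy theory of spaces"),
        "example.org/cooking": make_ranking_structure("cooking recipes"),
    }
    for url, relevance in rank(pages, "homotopy theory"):
        print(url, relevance)
```

The point of the sketch is that each page carries its own function, so the per-page structures can be built and evaluated independently on different nodes, with only the final sort requiring the scores to be gathered in one place.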


Please give your examples here, and don't hesitate to contact me if you have any questions. We have a rather precise definition of "distributed computation" that stems from the limitations of our system; please consult the website.


Stan Srednyak

Received on Saturday, 12 November 2022 04:01:10 UTC