Re: Web search and Distributed Knowledge Structures

Stan, your "roadmap" for Rorur is now available in StratML Part 2, Performance Plan/Report, format at https://stratml.us/drybridge/index.htm#RRR
I'll look forward to learning if your efforts might be supportive of the StratML-enabled query service on which Naval, Pradeep, and I are working, as outlined in the plans at https://aboutthem.info/
At this point, distribution of the query services is not a priority for me.  However, as the vision of the StratML standard -- A worldwide web of intentions, stakeholders, and results -- begins to be realized, there will be the need/opportunity for myriad value-added capabilities serving individuals and specialized stakeholder groups.  Some of those tool, app, and service requirements are outlined at https://stratml.us/carmel/iso/SMLTASwStyle.xml

BTW, as a matter of information management maturity, I encourage you to consider prospects for leveraging the semantics and structure of valid XML instance documents rather than focusing initially, if not exclusively, on HTML walls of text.  See https://en.wikipedia.org/wiki/Machine-readable_document
There are >5K plans in the StratML collection that can be used for demonstration purposes and all of their URLs are listed in sitemap format at https://stratml.us/sitemap.xml
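For anyone who wants to experiment with that list, here is a minimal Python sketch of pulling the URLs out of a sitemap document. It assumes the file follows the standard sitemaps.org 0.9 schema (I have not confirmed that here), and the sample document and function name are hypothetical, for illustration only. Fetching the file over HTTP is left out so the sketch runs offline.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemap uses the standard sitemaps.org 0.9 namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iter(f"{{{SITEMAP_NS}}}loc")
        if loc.text
    ]


# Hypothetical two-entry sitemap standing in for the real file.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/plans/a.xml</loc></url>
  <url><loc>https://example.org/plans/b.xml</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in sitemap_urls(SAMPLE):
        print(url)
```

Each URL returned this way could then be fetched and parsed as an XML instance document rather than scraped as HTML.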

Owen Ambur
https://www.linkedin.com/in/owenambur/
 

    On Friday, November 11, 2022 at 11:01:53 PM EST, Stanislav Srednyak, Ph.D. <stanislav.srednyak@duke.edu> wrote:  
 
  Dear colleagues,
I would like to inform the AIKR WG about the work that we have been doing at rorur.com, a decentralized search engine.
The benefits of creating such a search engine hardly need discussion within this group. I believe all of you understand the necessity of computational experimentation with web-scale data. Unfortunately, there is no way for the broad research community to participate in the analysis and modeling of the data harvested by commercial search engines, such as Google. Their code is closed source, there are no benchmarks, and there is very little information on the algorithms they are using. The search they provide is commerce-centered rather than knowledge-centered.

There is a project that aims to collect web data for the public good: Common Crawl. Unfortunately, there are some structural problems with it. First, it does not collect most of the data, but rather a sample of the web. Second, the experiments that one may do on web data are by their nature rather expensive, and there is no way to fund high-quality research based on community needs.
We addressed these problems using new incentive structures that emerged out of blockchain technology; see our whitepaper.
The reason for this letter is the following. We are at the stage where we can start collecting data at web scale, and computation on this data may become available in the near future. Therefore, we would like the community to answer the following question:
Given web data, what kind of distributed computations would be of interest?

Let me propose some examples of data structures that we may wish to construct in an automated way as a result of such computation.
- distributed knowledge graph. This data structure contains information on the distribution of entities among URLs, relative position in the web graph, context analysis based on proximity, etc.
- distributed ranking system. This is a system which assigns to each page a "ranking structure": a function that maps a query string to a real number, which computes the relevance of the page for this query.
- distributed ASTs. As an example, consider an area of mathematics, e.g., homotopy theory. As a data structure, it exists in the form of multiple semantically connected papers and URLs. A distributed AST organizes this data structure into a set of interconnected statements, theorems, and HOL relations between them (in the sense of temporal logic).
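
To make the "ranking structure" idea concrete, here is a minimal Python sketch of a per-page function from query string to real number. The scoring rule (summed relative term frequency) and all names in it are illustrative assumptions, not the method used at rorur.com; any other relevance function with the same signature would fit the definition equally well.

```python
import re
from collections import Counter
from typing import Callable

# A "ranking structure" in the sense above: query string -> real number.
RankingStructure = Callable[[str], float]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_ranking_structure(page_text: str) -> RankingStructure:
    """Build the relevance function for one page.

    Assumption: relevance is the sum of the page's relative term
    frequencies over the query tokens. This is a placeholder rule.
    """
    counts = Counter(tokenize(page_text))
    total = sum(counts.values())

    def relevance(query: str) -> float:
        if total == 0:
            return 0.0
        return sum(counts[token] / total for token in tokenize(query))

    return relevance


def rank(pages: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Score every page for the query and sort by descending relevance."""
    scored = [
        (url, make_ranking_structure(text)(query))
        for url, text in pages.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical pages, keyed by placeholder URLs.
    pages = {
        "https://example.org/a": "homotopy theory of spaces",
        "https://example.org/b": "cooking recipes",
    }
    print(rank(pages, "homotopy theory"))
```

Because each page carries its own function, the functions can be built and evaluated on whichever node holds the page, and only the resulting numbers need to be gathered to answer a query.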

Please give your example here, and don't hesitate to contact me if you have any questions. We have a rather precise definition of "distributed computation" that stems from the limitations of our system, please consult the website.

Stan Srednyak

  

Received on Saturday, 12 November 2022 17:24:57 UTC