Structured Knowledge Platform Idea

I wanted to share some ideas I've been working on regarding reification, knowledge representation and integer-based knowledge representation.  I hope these ideas can be of some use for you.

Regards,
Adam Sobieski

---------------------------

FOUNDATION
Distributed Structured Knowledge Platform



Overview

This overview describes a theorized distributed P2P platform for language-independent exchange of structured knowledge and data. The platform uses a numerical knowledge representation, both for computational efficiency and for its theorized language independence. The proposed format also allows optimization at the distributed-systems level, because packets can be composed during distributed query processing. Packets in this model contain variables that are filled in with results as the packet propagates across the P2P network, tracing out a search tree. At each step a packet is routed by hashing the quantities known so far, using the content-addressable properties that come from giving each machine a unique ID. Each successful leaf node of the search tree returns its completed packets to a proxy machine, which then returns the results to the machine of the user who requested the structured knowledge search. The proxy machine (selected by its P2P machine ID and the current time) lets a single machine coordinate multiple identical queries, so that a distributed cache can be realized. System time is used in this delegation so that responsibility for popular queries rotates across all machines in the network. The proposed model thus offers one way to integrate P2P technology, distributed database technology and integer-based knowledge representation. Rule systems and ontology can be expressed in the same format and stored alongside structured knowledge in the knowledgebase.
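The packet idea above can be illustrated with a small single-process sketch. It is not a network implementation, only an illustration under assumptions: assertions are shortened to three integers, variables are strings beginning with "?", SHA-256 truncated to 128 bits stands in for the unspecified hash, and the names Packet, routing_key and propagate are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def is_var(term):
    """Variables are written as strings such as '?x' (an assumption of this sketch)."""
    return isinstance(term, str) and term.startswith("?")


@dataclass
class Packet:
    patterns: list                                  # patterns still to be matched
    bindings: dict = field(default_factory=dict)    # variables filled in so far


def routing_key(pattern, bindings):
    """Hash the quantities known at this step; a real network would route on this key."""
    resolved = [bindings.get(t, t) if is_var(t) else t for t in pattern]
    known = [t for t in resolved if not is_var(t)]
    data = b"".join(t.to_bytes(16, "big") for t in known)
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")


def unify(term, value, bindings):
    """Match one pattern term against one stored integer, extending bindings."""
    if is_var(term):
        if term in bindings:
            return bindings[term] == value
        bindings[term] = value
        return True
    return term == value


def propagate(packet, store):
    """Yield the bindings of every completed packet (the leaves of the search tree)."""
    if not packet.patterns:
        yield dict(packet.bindings)
        return
    head, rest = packet.patterns[0], packet.patterns[1:]
    _next_hop = routing_key(head, packet.bindings)  # computed only to show the routing step
    for fact in store:
        trial = dict(packet.bindings)
        if all(unify(t, v, trial) for t, v in zip(head, fact)):
            yield from propagate(Packet(rest, trial), store)
```

For example, with the assertions (1, 10, 2), (2, 10, 3) and (1, 10, 4) in the store, the two-line query (1, 10, ?x), (?x, 10, ?y) completes one packet, binding ?x to 2 and ?y to 3.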

01 Knowledge Representation Format

This representation format uses 128-bit unsigned integers, which are theoretically mappable to URIs. The representation allows a style of reification on triples, called 'direct reification', in which an element of one triple can uniquely reference another triple; the implications for description logics and advanced ontological research are unknown. Additionally, each triple carries a key that allows it to be referenced uniquely, which makes the proposed system quadruple-based. The Unique ID can be used for metadata and for source provenance in paraconsistent multisource knowledgebases. Pair-wise hashes can be computed from the numerical elements of these quadruples, both to provide the redundancy and fault-tolerance essential for P2P systems and to allow content-addressable routing at the level of structured knowledge assertions. That is, given four 128-bit integers:

[Entity One][Relationship][Entity Two][Unique ID]

we can obtain a number of hashes, including HASH(Entity One, Relationship), HASH(Relationship, Entity Two) and HASH(Unique ID, Unique ID), with each hash mapping to a machine that holds a duplicate of the structured knowledge assertion. This is of additional use for content-addressable routing during distributed query processing. The numerical approach also makes it possible to reserve certain bits to indicate logical negation (for Entities, yielding a disjoint set; for Relationships, yielding the logical negation; for Unique IDs, usable by a rule system). We can theorize that the left-most bit is used for this purpose. The right-most two bits can additionally distinguish which column an entity was introduced in (Entity One, Relationship, Entity Two, Unique ID). This can allow indirect reference across the distributed knowledgebase and may be of use to reasoning systems.
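As a rough sketch of the quadruple, its pair-wise hashes and the theorized flag bits, something like the following Python could serve. The hash function (SHA-256 truncated to 128 bits), the ring-style mapping from hash to machine, the column numbering 0 to 3 and all names here are assumptions made for illustration; the proposal itself does not fix any of them.

```python
import hashlib
from typing import NamedTuple

MASK_128 = (1 << 128) - 1
NEGATION_BIT = 1 << 127   # left-most bit of a 128-bit integer
COLUMN_MASK = 0b11        # right-most two bits


class Quad(NamedTuple):
    entity_one: int
    relationship: int
    entity_two: int
    unique_id: int


def pair_hash(a: int, b: int) -> int:
    """Hash two 128-bit integers into one 128-bit key (hash choice is an assumption)."""
    data = a.to_bytes(16, "big") + b.to_bytes(16, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")


def replica_keys(q: Quad) -> dict:
    """The three pair-wise hashes named in the text; each selects one replica holder."""
    return {
        "e1_rel": pair_hash(q.entity_one, q.relationship),
        "rel_e2": pair_hash(q.relationship, q.entity_two),
        "uid": pair_hash(q.unique_id, q.unique_id),
    }


def machine_for(key: int, machine_ids: list) -> int:
    """Pick the first machine ID at or after the key, wrapping around (an assumed scheme)."""
    ring = sorted(machine_ids)
    for machine_id in ring:
        if machine_id >= key:
            return machine_id
    return ring[0]


def negate(x: int) -> int:
    """Flip the left-most bit to mark (or unmark) logical negation."""
    return x ^ NEGATION_BIT


def is_negated(x: int) -> bool:
    return bool(x & NEGATION_BIT)


def column_of(x: int) -> int:
    """Read the right-most two bits; 0..3 for the four columns is an assumed numbering."""
    return x & COLUMN_MASK
```

Storing an assertion would then mean sending a copy to machine_for(key, ...) for each of the three keys, so that a lookup knowing only (Entity One, Relationship) or only (Relationship, Entity Two) can still compute where a copy lives.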

02 Distributed Cache

Combining the system time with the unique hash of a query (a query is also representable in the four-column notation) determines which machine acts as proxy, and lets that role rotate so that responsibility is spread over a large number of machines. That is, something resembling:

QueryHash = HASH(MakeUnique(Query))
Offset = DetermineOffset(QueryHash)
Is CurrentTime() + Offset > NextTimeSliceBegin() ?
 If so, the proxy machine has ID = HASH(QueryHash, NextTimeSliceBegin())
 Otherwise, the proxy machine ID is HASH(QueryHash, LastTimeSliceBegin())

Here DetermineOffset() can use some sort of modular process on the QueryHash to spread the rotation of query proxies over a phase space. The function MakeUnique() can sort the lines of a query and generate the hash in some computationally efficient manner.

03 Numerical Subsystem

With an ontological commitment to a HASVALUE relationship type, we implicitly know that the right-hand element, Entity Two, is numerical data, and can apply standard methods for representing and distinguishing integer and floating-point datatypes. That is, by reserving one relationship type for system use, numerical data can be stored alongside structured knowledge in the same integer-based format. It is also possible to optimize arithmetical and algebraic relationship types to make more effective use of the underlying hardware.
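One way this could look is sketched below. The reserved value chosen for HASVALUE, the type tags, and the layout (a tag above a 64-bit payload) are all hypothetical; the sketch only shows that a signed integer or an IEEE 754 double fits in the Entity Two column and can be recovered when the relationship is HASVALUE.

```python
import struct

HASVALUE = 0x1               # hypothetical reserved relationship ID
TAG_INT, TAG_FLOAT = 1, 2    # hypothetical type tags
PAYLOAD_MASK = (1 << 64) - 1


def encode_number(value) -> int:
    """Pack a number into one integer: type tag above a 64-bit payload."""
    if isinstance(value, bool):
        raise TypeError("booleans are not treated as numbers in this sketch")
    if isinstance(value, int):
        tag, payload = TAG_INT, struct.pack(">q", value)     # signed 64-bit
    elif isinstance(value, float):
        tag, payload = TAG_FLOAT, struct.pack(">d", value)   # IEEE 754 double
    else:
        raise TypeError("unsupported numeric type")
    return (tag << 64) | int.from_bytes(payload, "big")


def decode_number(word: int):
    """Inverse of encode_number()."""
    tag = word >> 64
    payload = (word & PAYLOAD_MASK).to_bytes(8, "big")
    if tag == TAG_INT:
        return struct.unpack(">q", payload)[0]
    if tag == TAG_FLOAT:
        return struct.unpack(">d", payload)[0]
    raise ValueError("unknown numeric type tag")


def value_of(quad):
    """Return the number held in Entity Two if the relationship is HASVALUE, else None."""
    _entity_one, relationship, entity_two, _unique_id = quad
    if relationship == HASVALUE:
        return decode_number(entity_two)
    return None
```

A quadruple such as (7, HASVALUE, encode_number(3.5), 99) would then assert that entity 7 has the value 3.5, stored in the same four-column integer format as any other assertion.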

04 User Interface

An imagined user interface for direct access to the structured knowledgebase might resemble a graphical table with four columns, where a user can enter text or drag and drop icons into position to articulate a query. Variables can be entered in the interface, and the matching tuples and structure returned by a query can be navigable in the interface. Returned elements, sets and structure could be kept in accompanying workspaces and folders, from which elements can be taken to formulate further queries. Such an interface would aim to minimize the typing required to use the system by allowing entities, relationships and unique IDs to be represented visually.

05 Self-Optimizing Ontology

The numerical approach theoretically allows machine-generated ontology, so that, together with a rule system (also capable of being machine-generated), entities can be stored efficiently. That is, there are theoretical relationships between ontology and file compression, with the knowledgebase handling its own storage independently of the processing of queries across the distributed system. This relates to the AI subfields of decision trees and categorization: more efficient, possibly unintuitive, ontological compression can be written into the knowledgebase by algorithms. In other words, algorithms can write ontology and corresponding rules into a knowledgebase for the purpose of compression.

06 Conclusion

This note describes one possible model for a complete, all-to-all, asynchronous parallel solution for writing, efficiently storing and retrieving a large volume of knowledge at a high rate of input and output. The knowledge representation format allows rules and ontology to be articulated alongside structured knowledge; the use of integers may introduce exciting possibilities into the fields of AI, NLP and machine translation.

Received on Friday, 9 February 2007 20:45:27 UTC