Re: Structured Knowledge Platform Idea from Golda Velez on 2007-02-10 (semantic-web@w3.org from February 2007)

From: Golda Velez <w3@webglimpse.org>
Date: Sat, 10 Feb 2007 02:30:13 -0700
To: semantic-web@w3.org
Cc: w3@webglimpse.org
Message-Id: <200702100230.14511.w3@webglimpse.org>
Interesting - I've been doing something a bit like this though not using RDF. 
Having a compact structured numerical category code is extremely useful for 
efficient manipulation of data by subcategory tree.

Take a look at the drop down menus under

http://btucson.com/Tucson/Business%20Directory/

by hovering over the » character, you can quickly traverse over 8000 
categories even though the initial footprint of a page may be as small as 
10K.

I mean to do something similar for the DMOZ categories.  I think it will be a 
useful tool for helping users efficiently enter data & search under a precise 
category.

--Golda
 

On Friday 09 February 2007 19:50, Adam wrote:
> If there are any interesting developments based on these preliminary ideas, 
I'll keep you posted.  I have a good feeling about this representation 
format.
> 
> Cheers,
> Adam Sobieski
>   ----- Original Message ----- 
>   From: Adam 
>   To: semantic-web@w3.org 
>   Sent: Friday, February 09, 2007 3:44 PM
>   Subject: Structured Knowledge Platform Idea
> 
> 
>   I wanted to share some ideas I've been working on regarding reification, 
knowledge representation and integer-based knowledge representation.  I hope 
these ideas can be of some use for you.
> 
>   Regards,
>   Adam Sobieski
> 
>   ---------------------------
> 
>   FOUNDATION
>   Distributed Structured Knowledge Platform
> 
> 
> 
>   Overview
> 
>   This overview describes a theorized distributed P2P platform for 
language-independent structured knowledge and data exchange. This solution 
utilizes a numerical knowledge representation methodology for both computer 
efficiency and theorized language-independence. Additionally, this proposed 
knowledge representation format allows for distributed systems level 
optimization via the composition of packets during distributed query 
processing. Packets proposed in this model contain variables that are filled 
in by results during packet propagation across a P2P network completing a 
search tree pattern. Packets are routed by hashing the known quantities at 
each step utilizing the P2P content-addressable properties available by 
giving each machine a unique ID. Each successful leaf node in a search tree 
returns completed packets to a proxy machine that then returns results to the 
user's machine requesting a structured knowledge search. The proxy machine is 
used to allow multiple identical queries to be coordinated by a single 
machine (based upon its P2P machine ID and current time) so that a 
distributed caching feature can be realized. The use of system time in this 
delegation is so that the responsibility for popular queries can rotate 
across all machines in the network. That is, the proposed model offers one 
method to integrate P2P technology, distributed database technology and 
integer-based knowledge representation. Rule systems and ontology can be 
expressed in the same format as structured knowledge and stored alongside it 
in the structured knowledgebase.
> 
>   01 Knowledge Representation Format
> 
>   This representation format utilizes 128-bit unsigned integers which are 
theoretically mappable to URIs. This representation allows a style of 
reification on the triples called 'direct reification' which allows an 
element in a triple to uniquely reference another triple, though the 
implications for description logics and advanced ontological research are 
unknown. Additionally, each triple contains a key that allows it to be 
referenced uniquely, making this proposed system quadruples based. The 
Unique ID can be used for metadata purposes and source provenance in 
paraconsistent multisource knowledgebases. Pair-wise hashings can be obtained 
from the numerical elements of these quadruples to both provide redundancy 
and fault-tolerance, essential for P2P systems, and to allow 
content-addressable routing on the level of structured knowledge assertions. 
That is, given four 128-bit integers:
> 
>   [Entity One][Relationship][Entity Two][Unique ID]
> 
>   we can obtain a number of hashes including HASH(Entity One, Relationship), 
HASH(Relationship, Entity Two), HASH(Unique ID, Unique ID) with these hashes 
mapping to machines that each hold a duplicate of the structured knowledge 
assertion. This is of additional use to content-addressable routing during 
distributed query processing. This numerical approach also makes it possible 
to use certain bits to indicate logical negation (Entities resulting in a 
disjoint set, Relationships resulting in the logical negation, Unique IDs 
utilizable by a rule system). So, we can theorize that the left-most bit is 
utilized for this purpose. The rightmost two bits can be of additional use in 
distinguishing which column an entity is introduced in (Entity One, 
Relationship, Entity Two, Unique ID). This can allow indirect reference 
across the distributed knowledgebase and possibly be of use to reasoning 
systems.
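
The quadruple layout and pair-wise hashing described in section 01 might be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the message does not name a hash function or a routing rule, so the truncated SHA-256 in pair_hash() and the "closest successor on a ring" rule in machine_for() are stand-ins, and all function names are hypothetical.

```python
import hashlib
from typing import List, NamedTuple

NEGATION_BIT = 1 << 127  # left-most bit of a 128-bit value, as theorized in the message
COLUMN_MASK = 0b11       # right-most two bits: column in which an entity was introduced


class Quad(NamedTuple):
    entity_one: int
    relationship: int
    entity_two: int
    unique_id: int


def pair_hash(a: int, b: int) -> int:
    """Hash two 128-bit integers to a 128-bit key.

    Truncated SHA-256 is an assumption; the message only says HASH().
    """
    data = a.to_bytes(16, "big") + b.to_bytes(16, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")


def replica_keys(q: Quad) -> List[int]:
    """The three pair-wise hashes listed in the message, one per replica."""
    return [
        pair_hash(q.entity_one, q.relationship),
        pair_hash(q.relationship, q.entity_two),
        pair_hash(q.unique_id, q.unique_id),
    ]


def machine_for(key: int, machine_ids: List[int]) -> int:
    """Map a key to a machine ID (hypothetical rule: first ID >= key, wrapping around)."""
    ordered = sorted(machine_ids)
    for machine_id in ordered:
        if machine_id >= key:
            return machine_id
    return ordered[0]


def negate(value: int) -> int:
    """Flip the left-most bit to mark (or unmark) logical negation."""
    return value ^ NEGATION_BIT


def is_negated(value: int) -> bool:
    return bool(value & NEGATION_BIT)


def column_of(value: int) -> int:
    """Read the two right-most bits as a column index in the range 0..3."""
    return value & COLUMN_MASK
```

Each assertion would then be stored on machine_for(k, ids) for every k in replica_keys(q), so a query that knows any one of the hashed pairs can be routed to a machine holding a copy.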
> 
>   02 Distributed Cache
> 
>   The use of a system time along with the unique hash of a query, also 
representable in a four-column notation, allows proxy machines to be 
determined that can rotate to delegate responsibility over a large number of 
machines. That is, something resembling:
> 
>   QueryHash = HASH(MakeUnique(Query))
>   Offset = DetermineOffset(QueryHash)
>   Is CurrentTime() + Offset > NextTimeSliceBegin() ?
>    If so, the proxy machine has ID = HASH(QueryHash, NextTimeSliceBegin())
>    Otherwise, the proxy machine ID is HASH(QueryHash, LastTimeSliceBegin())
> 
>   Here DetermineOffset() can use some sort of modular process on the 
QueryHash to spread the rotation of query proxies over a phase space. The 
function MakeUnique() can sort the lines of a query and generate the hash in 
some computationally efficient manner.
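
The proxy rotation in section 02 can be made concrete with a small Python sketch. The time-slice length, the choice of truncated SHA-256, and the helper hash128() are assumptions made for illustration; only the names MakeUnique, DetermineOffset and the comparison against the next time slice come from the message.

```python
import hashlib
import time

SLICE_SECONDS = 60  # hypothetical time-slice length; the message does not give one


def hash128(*parts: bytes) -> int:
    """Hash byte strings to a 128-bit integer (truncated SHA-256, an assumption)."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))  # length prefix keeps parts unambiguous
        h.update(part)
    return int.from_bytes(h.digest()[:16], "big")


def make_unique(query_lines):
    """Canonicalize a query by trimming and sorting its lines."""
    cleaned = sorted(line.strip() for line in query_lines if line.strip())
    return "\n".join(cleaned).encode("utf-8")


def determine_offset(query_hash: int) -> int:
    """Spread rotation times across the slice with a simple modulus."""
    return query_hash % SLICE_SECONDS


def proxy_id(query_lines, now=None) -> int:
    """Return the ID of the proxy machine responsible for this query right now."""
    now = time.time() if now is None else now
    query_hash = hash128(make_unique(query_lines))
    offset = determine_offset(query_hash)
    last_begin = int(now // SLICE_SECONDS) * SLICE_SECONDS
    next_begin = last_begin + SLICE_SECONDS
    # Same test as in the message: CurrentTime() + Offset > NextTimeSliceBegin()
    slice_begin = next_begin if now + offset > next_begin else last_begin
    return hash128(query_hash.to_bytes(16, "big"), slice_begin.to_bytes(8, "big"))
```

Because make_unique() sorts the lines, two users who write the same query in a different line order arrive at the same proxy, which is what lets that proxy cache results for both.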
> 
>   03 Numerical Subsystem
> 
>   With an ontological commitment of a HASVALUE relationship type, we can 
implicitly know that the right-hand element, Entity Two, is numerical data 
and utilize standard methods for representing and distinguishing integer types 
and floating-point datatypes. That is, by distinguishing one relation type 
for system use, numerical data can be stored alongside structured knowledge 
data in the same integer-based format. Additionally possible is optimizing 
arithmetical and algebraic relationship types to more effectively utilize 
underlying hardware.
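
One way the HASVALUE idea in section 03 could look in Python is sketched below. The message does not specify an encoding, so the reserved relationship ID, the 8-bit type tag and the 64-bit payload are all hypothetical choices. For simplicity the sketch also ignores the flag bits discussed in section 01.

```python
import struct

HASVALUE = 0x1   # hypothetical reserved relationship ID
TAG_INT = 1      # payload is a signed 64-bit integer
TAG_FLOAT = 2    # payload is an IEEE 754 double
TAG_SHIFT = 120  # the tag occupies the top 8 bits of the 128-bit slot
PAYLOAD_MASK = (1 << 64) - 1


def encode_number(value) -> int:
    """Pack a Python int or float into a 128-bit Entity Two slot."""
    if isinstance(value, bool):
        raise TypeError("booleans are not treated as numerical data here")
    if isinstance(value, int):
        # Reinterpret the signed 64-bit value as unsigned bits.
        payload = struct.unpack(">Q", struct.pack(">q", value))[0]
        return (TAG_INT << TAG_SHIFT) | payload
    if isinstance(value, float):
        payload = struct.unpack(">Q", struct.pack(">d", value))[0]
        return (TAG_FLOAT << TAG_SHIFT) | payload
    raise TypeError("unsupported numerical type")


def decode_number(slot: int):
    """Recover the int or float stored by encode_number()."""
    tag = slot >> TAG_SHIFT
    raw = struct.pack(">Q", slot & PAYLOAD_MASK)
    if tag == TAG_INT:
        return struct.unpack(">q", raw)[0]
    if tag == TAG_FLOAT:
        return struct.unpack(">d", raw)[0]
    raise ValueError("slot does not hold tagged numerical data")
```

A quadruple such as (1001, HASVALUE, encode_number(98.6), 2002) then carries a floating-point value in the same integer format as any other assertion, and a reader that sees HASVALUE in the relationship column knows to call decode_number() on Entity Two.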
> 
>   04 User Interface
> 
>   An imagined user interface for direct structured knowledgebase access 
might resemble a graphical tabular interface, with four columns, where a user 
can enter text or drag and drop icons into position to articulate a desired 
query on the knowledgebase. Variables can be entered in the interface, and 
matching tuples and structure returned from a structured knowledge query can 
be navigable in the interface. Returned elements, sets and structure can, 
imaginably, be graphically utilized with accompanying workspaces and folders, 
where elements can be taken from to formulate a structured knowledge query. 
An interface would aim to minimize the text typing required to utilize the 
system by allowing entities, relationships and unique IDs to be visually 
representable.
> 
>   05 Self-Optimizing Ontology
> 
>   The numerical based approach allows, theoretically, machine-generated 
ontology so that, along with a rule system (also capable of 
machine-generation), entities can be stored efficiently. That is, theoretical 
relationships between ontology and file compression are possible with a 
knowledgebase handling its own storage in a manner that is independent from 
the processing of queries across the distributed system. This correlates to 
relationships between the AI subfields of decision trees and categorization, 
and more efficient, possibly unintuitive, ontological compression can be 
written into the knowledgebase by algorithms. That is, algorithms can write 
ontology and corresponding rules into a knowledgebase for purposes of 
compression.
> 
>   06 Conclusion
> 
>   Described is one possible model for a complete all-to-all asynchronous 
parallel solution for writing, efficiently storing and retrieving a large 
volume of knowledge with a high rate of input and output. The knowledge 
representation format allows rules and ontology to be articulated alongside 
structured knowledge; the use of integers may introduce exciting possibilities 
into the fields of AI, NLP and machine translation.

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Golda Velez  http://goldavelez.info
Webglimpse.Net  http://webglimpse.net
Internet WorkShop http://iwhome.com
 cell: (520) 440-1420
"Help organize the world - index your own corner of the web!"
Received on Saturday, 10 February 2007 08:19:39 UTC