
Re: SQL queries and Structured Knowledge Platform Idea

From: Golda Velez <w3@webglimpse.org>
Date: Sat, 10 Feb 2007 02:59:56 -0700
To: semantic-web@w3.org
Cc: Golda Velez <w3@webglimpse.org>
Message-Id: <200702100259.57454.w3@webglimpse.org>

Quick caveat - of course it's possible to do this kind of menu hierarchy with
an unstructured category code. But when you want to do something like finding
all items belonging to any subcategory of a given category, a structured code
allows you to do that in a single SQL query:

SELECT [columns] FROM [table specs] WHERE LEFT(itemcode, $lvl) = LEFT(catcode, $lvl)

where itemcode and catcode are both MySQL fields containing structured 64-byte
codes, and $lvl is a variable giving the depth of our reference category,
expressed as the length of the code prefix that identifies it.

To do the same query without a structured code requires a whole series of 
queries to traverse the subcat-tree.
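The prefix trick can be tried out locally. The following is a minimal sketch and stand-in for the MySQL query above, not the query itself. It uses SQLite from Python's standard library, where `substr(x, 1, n)` plays the role of MySQL's `LEFT(x, n)`. The table `items`, its `name` column, the sample rows and the two-characters-per-level code layout are all hypothetical.

```python
import sqlite3

# Hypothetical code layout: two characters per level, three levels, padded
# with "00", so every item's code starts with the codes of its ancestors.
CHARS_PER_LEVEL = 2


def items_under(conn: sqlite3.Connection, catcode: str, depth: int) -> list:
    """Names of all items anywhere in the subtree of the category `catcode`.

    `depth` is the depth of the reference category; lvl (the $lvl of the
    MySQL query) is the length of the code prefix that identifies it.
    SQLite has no LEFT(), so substr(x, 1, n) is used instead.
    """
    lvl = depth * CHARS_PER_LEVEL
    rows = conn.execute(
        "SELECT name FROM items "
        "WHERE substr(itemcode, 1, ?) = substr(?, 1, ?) ORDER BY name",
        (lvl, catcode, lvl),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, itemcode TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("Taco stand", "010101"),  # food > Mexican
        ("Noodle bar", "010201"),  # food > Asian
        ("Bike shop", "020101"),   # retail > sports
    ],
)

print(items_under(conn, "010000", 1))  # ['Noodle bar', 'Taco stand']
print(items_under(conn, "010100", 2))  # ['Taco stand']
```

One indexed-friendly comparison replaces the recursive walk over the subcategory tree. The cost of this design is that the code width per level must be fixed in advance.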


On Saturday 10 February 2007 02:32, Golda Velez wrote:
> Interesting - I've been doing something a bit like this though not using 
> Having a compact structured numerical category code is extremely useful for 
> efficient manipulation of data by subcategory tree.
> Take a look at the drop-down menus under
> http://btucson.com/Tucson/Business%20Directory/
> By hovering over the » character, you can quickly traverse over 8000
> categories, even though the initial footprint of a page may be as small as
> 10 KB.
> I mean to do something similar for the DMOZ categories. I think it will be a
> useful tool for helping users efficiently enter data and search under a
> category.
> --Golda
> On Friday 09 February 2007 19:50, Adam wrote:
> > If there are any interesting developments based on these preliminary ideas,
> > I'll keep you posted. I have a good feeling about this representation
> > format.
> > 
> > Cheers,
> > Adam Sobieski
> >   ----- Original Message ----- 
> >   From: Adam 
> >   To: semantic-web@w3.org 
> >   Sent: Friday, February 09, 2007 3:44 PM
> >   Subject: Structured Knowledge Platform Idea
> > 
> > 
> >   I wanted to share some ideas I've been working on regarding reification,
> > knowledge representation and integer-based knowledge representation. I hope
> > these ideas can be of some use to you.
> > 
> >   Regards,
> >   Adam Sobieski
> > 
> >   ---------------------------
> > 
> >   Distributed Structured Knowledge Platform
> > 
> > 
> > 
> >   Overview
> > 
> >   This overview describes a theorized distributed P2P platform for
> > language-independent structured knowledge and data exchange. The solution
> > uses a numerical knowledge representation methodology for both computational
> > efficiency and theorized language independence. Additionally, the proposed
> > knowledge representation format allows for distributed-systems-level
> > optimization via the composition of packets during distributed query
> > processing.
> > 
> >   Packets in this model contain variables that are filled in with results
> > as the packets propagate across a P2P network, completing a search tree
> > pattern. Packets are routed by hashing the known quantities at each step,
> > using the content-addressable properties a P2P network gains by giving each
> > machine a unique ID. Each successful leaf node in a search tree returns
> > completed packets to a proxy machine, which then returns the results to the
> > user's machine that requested the structured knowledge search.
> > 
> >   The proxy machine is used to allow multiple identical queries to be
> > coordinated by a single machine (selected based on its P2P machine ID and
> > the current time), so that a distributed caching feature can be realized.
> > System time is used in this delegation so that responsibility for popular
> > queries rotates across all machines in the network. In short, the proposed
> > model offers one method to integrate P2P technology, distributed database
> > technology and integer-based knowledge representation. Rule systems and
> > ontology can be expressed in the same format, alongside the structured
> > knowledge, in the structured knowledgebase.
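Routing "by hashing the known quantities" needs some rule that maps a hash onto a machine ID, and the message does not name one. The following minimal sketch assumes Chord-style consistent hashing: a key belongs to the first machine ID at or after it on a 128-bit ring, wrapping around. It also assumes SHA-256 truncated to 128 bits as HASH. The names `hash128` and `Ring`, and the sample IDs, are hypothetical.

```python
import bisect
import hashlib

ID_BITS = 128


def hash128(*parts: int) -> int:
    """Assumed HASH: SHA-256 over 16-byte big-endian parts, truncated to 128 bits."""
    h = hashlib.sha256()
    for p in parts:
        h.update(p.to_bytes(16, "big"))
    return int.from_bytes(h.digest()[: ID_BITS // 8], "big")


class Ring:
    """Chord-style ring: a key is owned by the first machine ID >= key, wrapping."""

    def __init__(self, machine_ids):
        self.ids = sorted(set(machine_ids))

    def owner(self, key: int) -> int:
        i = bisect.bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]  # past the largest ID, wrap to the smallest


# A packet asks "?x RELATIONSHIP ENTITY_TWO". The known quantities at this step
# are the relationship and Entity Two, so the packet goes to the owner of their hash.
ring = Ring([hash128(n) for n in range(8)])  # eight hypothetical machines
RELATIONSHIP, ENTITY_TWO = 1001, 2002         # hypothetical 128-bit IDs
next_hop = ring.owner(hash128(RELATIONSHIP, ENTITY_TWO))
print(hex(next_hop))
```

Consistent hashing is only one plausible choice here. It keeps most key assignments stable when machines join or leave, which suits a P2P network.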
> > 
> >   01 Knowledge Representation Format
> > 
> >   This representation format uses 128-bit unsigned integers, which are
> > theoretically mappable to URIs. The representation allows a style of
> > reification on the triples, called 'direct reification', in which an element
> > of one triple can uniquely reference another triple; its implications for
> > description logics and advanced ontological research are unknown.
> > Additionally, each triple carries a key that allows it to be referenced
> > uniquely, making the proposed system quadruple-based. The Unique ID can be
> > used for metadata and source provenance in paraconsistent multisource
> > knowledgebases. Pair-wise hashes can be computed from the numerical elements
> > of these quadruples, both to provide the redundancy and fault tolerance that
> > are essential for P2P systems and to allow content-addressable routing at
> > the level of structured knowledge assertions. That is, given four 128-bit
> > integers:
> > 
> >   [Entity One][Relationship][Entity Two][Unique ID]
> > 
> >   we can obtain a number of hashes, including HASH(Entity One,
> > HASH(Relationship, Entity Two), HASH(Unique ID, Unique ID), with these
> > hashes mapping to machines that each hold a duplicate of the structured
> > knowledge assertion. This also supports content-addressable routing during
> > distributed query processing. The numerical approach additionally makes it
> > possible to use certain bits to indicate logical negation (for Entities, a
> > disjoint set; for Relationships, the logical negation; for Unique IDs, a
> > flag usable by a rule system). For example, the left-most bit could be
> > reserved for this purpose. The rightmost two bits could additionally
> > distinguish which column an entity was introduced in (Entity One,
> > Relationship, Entity Two or Unique ID). This can allow indirect reference
> > across the distributed knowledgebase and may be of use to reasoning systems.
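The quadruple layout, its pair-wise replica hashes and the proposed flag bits can be sketched together. The list of hashes in the message is partly garbled, so this sketch does not guess the exact pairs. It hashes every pair of distinct columns, plus the (Unique ID, Unique ID) pair the text does name. The following are all assumptions: SHA-256 truncated to 128 bits as HASH, the names `Quad`, `replica_keys`, `negate` and `with_column`, and the column numbering 0 to 3.

```python
import hashlib
from dataclasses import dataclass
from itertools import combinations

MASK128 = (1 << 128) - 1
NEG_BIT = 1 << 127  # assumed: the left-most bit marks logical negation
COLUMN_BITS = 0b11  # assumed: the two right-most bits record the column
COLUMNS = ("entity_one", "relationship", "entity_two", "unique_id")


def hash128(a: int, b: int) -> int:
    """Assumed HASH: SHA-256 of two 128-bit values, truncated to 128 bits."""
    digest = hashlib.sha256(a.to_bytes(16, "big") + b.to_bytes(16, "big")).digest()
    return int.from_bytes(digest[:16], "big")


@dataclass(frozen=True)
class Quad:
    entity_one: int
    relationship: int
    entity_two: int
    unique_id: int

    def __post_init__(self):
        for name in COLUMNS:
            if not 0 <= getattr(self, name) <= MASK128:
                raise ValueError(f"{name} must be a 128-bit unsigned integer")

    def replica_keys(self) -> dict:
        """Pair-wise hashes; each one names a machine holding a copy of the quad."""
        cols = [(name, getattr(self, name)) for name in COLUMNS]
        keys = {(a, b): hash128(va, vb) for (a, va), (b, vb) in combinations(cols, 2)}
        # The message also lists HASH(Unique ID, Unique ID), e.g. for lookup by ID.
        keys[("unique_id", "unique_id")] = hash128(self.unique_id, self.unique_id)
        return keys


def negate(x: int) -> int:
    """Toggle the negation flag (disjoint set / negated relation / rule flag)."""
    return x ^ NEG_BIT


def is_negated(x: int) -> bool:
    return bool(x & NEG_BIT)


def with_column(x: int, column: int) -> int:
    """Record in the two low bits which column (0-3) an ID was introduced in."""
    if not 0 <= column <= 3:
        raise ValueError("column must be 0, 1, 2 or 3")
    return (x & ~COLUMN_BITS) | column


def column_of(x: int) -> int:
    return x & COLUMN_BITS


q = Quad(entity_one=11, relationship=22, entity_two=33, unique_id=44)
print(len(q.replica_keys()))  # 7 (six column pairs + the Unique ID pair)
```

Reserving flag bits inside the IDs shrinks the usable ID space. A real design would need to decide whether the flags take part in hashing and routing.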
> > 
> >   02 Distributed Cache
> > 
> >   The use of system time, together with the unique hash of a query (queries
> > are also representable in the four-column notation), allows proxy machines
> > to be determined so that delegated responsibility rotates over a large
> > number of machines. That is, something resembling:
> > 
> >   QueryHash = HASH(MakeUnique(Query))
> >   Offset = DetermineOffset(QueryHash)
> >   Is CurrentTime() + Offset > NextTimeSliceBegin() ?
> >    If so, the proxy machine has ID = HASH(QueryHash, NextTimeSliceBegin())
> >    Otherwise, the proxy machine ID is HASH(QueryHash, 
> > 
> >   Here DetermineOffset() can apply some sort of modular operation to the
> > QueryHash to spread the rotation of query proxies over a phase space. The
> > function MakeUnique() can put the query into a canonical form, for example
> > by sorting its lines so that identical queries produce the same hash, in
> > some computationally efficient manner.
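A runnable reading of the proxy-selection pseudocode follows. The second argument of the "Otherwise" branch is cut off in the message. This sketch assumes it is the start of the current time slice. With that assumption, and given 0 <= Offset < slice length, both branches reduce to one operation: hash the QueryHash with the start of the slice that CurrentTime() + Offset falls into. The exact boundary is treated as belonging to the next slice. The slice length, the SHA-256-based HASH, the query text and the helper names are hypothetical.

```python
import hashlib
import time
from typing import Optional

SLICE_SECONDS = 600  # hypothetical length of one time slice


def h128(*parts: bytes) -> int:
    """Assumed HASH: length-prefixed SHA-256, truncated to 128 bits."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(8, "big"))
        h.update(p)
    return int.from_bytes(h.digest()[:16], "big")


def make_unique(query: str) -> bytes:
    """Canonical form: trimmed, sorted, non-empty lines, so line order is irrelevant."""
    lines = sorted(line.strip() for line in query.splitlines() if line.strip())
    return "\n".join(lines).encode("utf-8")


def determine_offset(query_hash: int) -> int:
    """Modular phase offset in [0, SLICE_SECONDS), spreading rotations over time."""
    return query_hash % SLICE_SECONDS


def proxy_id(query: str, now: Optional[float] = None) -> int:
    query_hash = h128(make_unique(query))
    offset = determine_offset(query_hash)
    if now is None:
        now = time.time()
    # Start of the slice that now + offset falls into: NextTimeSliceBegin() once
    # now + offset reaches the next boundary, otherwise (assumed) the current one.
    slice_begin = int((now + offset) // SLICE_SECONDS) * SLICE_SECONDS
    return h128(query_hash.to_bytes(16, "big"), slice_begin.to_bytes(8, "big"))


q1 = "?x LIVES_IN Tucson\n?x WORKS_AT ?y"
q2 = "?x WORKS_AT ?y\n?x LIVES_IN Tucson"  # same query, lines reordered
print(proxy_id(q1, now=1_000_000) == proxy_id(q2, now=1_000_000))  # True
```

The resulting ID would still have to be mapped onto an actual machine by whatever content-addressable lookup the network uses. Because each query's offset differs, popular queries change proxies at staggered times rather than all at once.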
> > 
> >   03 Numerical Subsystem
> > 
> >   With an ontological commitment to a HASVALUE relationship type, we can
> > know implicitly that the right-hand element, Entity Two, is numerical data,
> > and use standard methods for representing and distinguishing integer and
> > floating-point datatypes. That is, by reserving one relationship type for
> > system use, numerical data can be stored alongside structured knowledge data
> > in the same integer-based format. It is also possible to optimize arithmetic
> > and algebraic relationship types to make more effective use of the
> > underlying hardware.
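The following sketches one possible encoding for HASVALUE objects. It assumes a hypothetical layout: the upper 64 bits of Entity Two hold a type tag, and the lower 64 bits hold either a two's-complement integer or the IEEE 754 bits of a double. The relationship ID `HASVALUE = 1`, the tag values and the layout are illustrative only. The sketch also ignores the flag bits proposed in section 01.

```python
import struct

HASVALUE = 1          # hypothetical reserved relationship ID
TAG_INT64 = 1         # hypothetical type tags stored in the upper 64 bits
TAG_FLOAT64 = 2
LOW64 = (1 << 64) - 1


def encode_value(v) -> int:
    """Pack an int or float into a 128-bit Entity Two value for a HASVALUE quad."""
    if isinstance(v, bool):
        raise TypeError("booleans are not numeric values here")
    if isinstance(v, int):
        if not -(1 << 63) <= v < (1 << 63):
            raise OverflowError("integer does not fit in 64 bits")
        return (TAG_INT64 << 64) | (v & LOW64)  # two's-complement payload
    if isinstance(v, float):
        (bits,) = struct.unpack(">Q", struct.pack(">d", v))  # raw IEEE 754 bits
        return (TAG_FLOAT64 << 64) | bits
    raise TypeError(f"unsupported value type: {type(v).__name__}")


def decode_value(x: int):
    """Inverse of encode_value: read the tag, then reinterpret the payload."""
    tag, payload = x >> 64, x & LOW64
    if tag == TAG_INT64:
        return payload - (1 << 64) if payload >= (1 << 63) else payload
    if tag == TAG_FLOAT64:
        return struct.unpack(">d", struct.pack(">Q", payload))[0]
    raise ValueError(f"unknown type tag {tag}")


# A quad stating that hypothetical entity 500 has the value 3.25.
quad = (500, HASVALUE, encode_value(3.25), 9001)
print(decode_value(quad[2]))  # 3.25
```

Keeping the payload in a fixed 64-bit slot means the value can be handed to native integer or floating-point hardware directly. That is the kind of hardware-friendly optimization the paragraph above hints at.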
> > 
> >   04 User Interface
> > 
> >   An imagined user interface for direct structured knowledgebase access
> > might resemble a graphical tabular interface with four columns, where a user
> > can enter text or drag and drop icons into position to articulate a desired
> > query on the knowledgebase. Variables can be entered in the interface, and
> > the matching tuples and structure returned from a structured knowledge query
> > can be navigated in the interface. Returned elements, sets and structure
> > could also be used graphically in accompanying workspaces, from which
> > elements can be taken to formulate a structured knowledge query. Such an
> > interface would aim to minimize the typing required to use the system by
> > making entities, relationships and unique IDs visually representable.
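Behind such an interface, a query is a list of four-column rows in which some cells are variables. The following minimal sketch evaluates such a query against an in-memory list of quadruples. It matches the rows one at a time and keeps only variable bindings that stay consistent. The `?name` syntax for variables (borrowed from SPARQL), the function names and the sample data are assumptions. A distributed version would route each row as the overview describes, instead of scanning a local list.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

Quad = Tuple[int, int, int, int]
Term = Union[int, str]  # int = constant ID; str starting with "?" = variable
Bindings = Dict[str, int]


def is_var(term: Term) -> bool:
    return isinstance(term, str) and term.startswith("?")


def match_row(pattern: Sequence[Term], quad: Quad, bindings: Bindings) -> Optional[Bindings]:
    """Extend `bindings` so that `pattern` matches `quad`, or return None."""
    new = dict(bindings)
    for term, value in zip(pattern, quad):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant cell does not match
    return new


def run_query(patterns: List[Sequence[Term]], store: List[Quad]) -> List[Bindings]:
    """Match each four-column row in turn, keeping only consistent bindings."""
    results: List[Bindings] = [{}]
    for pattern in patterns:
        extended = []
        for b in results:
            for quad in store:
                m = match_row(pattern, quad, b)
                if m is not None:
                    extended.append(m)
        results = extended
    return results


store = [(10, 1, 20, 100), (20, 1, 30, 101), (10, 2, 40, 102)]
# Two rows that chain along relationship 1 through the shared variable ?y.
print(run_query([("?x", 1, "?y", "?a"), ("?y", 1, "?z", "?b")], store))
# [{'?x': 10, '?y': 20, '?a': 100, '?z': 30, '?b': 101}]
```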
> > 
> >   05 Self-Optimizing Ontology
> > 
> >   The numerical approach theoretically allows machine-generated ontology,
> > so that, together with a rule system (which could also be machine-generated),
> > entities can be stored efficiently. That is, relationships between ontology
> > and file compression are theoretically possible, with a knowledgebase
> > handling its own storage independently of the processing of queries across
> > the distributed system. This relates to the AI subfields of decision trees
> > and categorization: more efficient, possibly unintuitive, ontological
> > compression can be written into the knowledgebase by algorithms. In other
> > words, algorithms can write ontology and corresponding rules into a
> > knowledgebase for the purpose of compression.
> > 
> >   06 Conclusion
> > 
> >   Described here is one possible model for a complete all-to-all,
> > asynchronous, parallel solution for writing, efficiently storing and
> > retrieving a large volume of knowledge with a high rate of input and output.
> > The knowledge representation format allows rules and ontology to be
> > articulated alongside structured knowledge, and the use of integers may
> > prove exciting for the fields of AI, NLP and machine translation.
> -- 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Golda Velez		http://goldavelez.info
> Webglimpse.Net		http://webglimpse.net
> Internet WorkShop	http://iwhome.com
> 	cell: (520) 440-1420
> "Help organize the world - index your own corner of the web!"

Received on Saturday, 10 February 2007 08:49:21 UTC
