- From: Golda Velez <w3@webglimpse.org>
- Date: Sat, 10 Feb 2007 02:59:56 -0700
- To: semantic-web@w3.org
- Cc: Golda Velez <w3@webglimpse.org>
Quick caveat - of course it's possible to do this kind of menu hierarchy
with an unstructured category code. But when you want to do something
like find all items belonging to any subcategory of a given category,
a structured code lets you do that in a single SQL query:

  select [table specs] where LEFT(itemcode,$lvl) = LEFT(catcode,$lvl)

where itemcode and catcode are both MySQL fields containing structured
64-byte codes and $lvl is a variable giving the depth of our reference
category. Doing the same query without a structured code requires a
whole series of queries to traverse the subcategory tree.

--G

On Saturday 10 February 2007 02:32, Golda Velez wrote:
>
> Interesting - I've been doing something a bit like this, though not
> using RDF. Having a compact structured numerical category code is
> extremely useful for efficient manipulation of data by subcategory
> tree.
>
> Take a look at the drop down menus under
>
> http://btucson.com/Tucson/Business%20Directory/
>
> By hovering over the » character, you can quickly traverse over 8000
> categories even though the initial footprint of a page may be as
> small as 10K.
>
> I mean to do something similar for the DMOZ categories. I think it
> will be a useful tool for helping users efficiently enter data and
> search under a precise category.
>
> --Golda
>
>
> On Friday 09 February 2007 19:50, Adam wrote:
> > If there are any interesting developments based on these
> > preliminary ideas, I'll keep you posted. I have a good feeling
> > about this representation format.
> >
> > Cheers,
> > Adam Sobieski
> > ----- Original Message -----
> > From: Adam
> > To: semantic-web@w3.org
> > Sent: Friday, February 09, 2007 3:44 PM
> > Subject: Structured Knowledge Platform Idea
> >
> >
> > I wanted to share some ideas I've been working on regarding
> > reification, knowledge representation and integer-based knowledge
> > representation. I hope these ideas can be of some use for you.
> >
> > Regards,
> > Adam Sobieski
> >
> > ---------------------------
> >
> > FOUNDATION
> > Distributed Structured Knowledge Platform
> >
> >
> > Overview
> >
> > This overview describes a theorized distributed P2P platform for
> > language-independent structured knowledge and data exchange. This
> > solution utilizes a numerical knowledge representation methodology
> > for both computer efficiency and theorized language-independence.
> > Additionally, this proposed knowledge representation format allows
> > for distributed systems level optimization via the composition of
> > packets during distributed query processing. Packets proposed in
> > this model contain variables that are filled in by results during
> > packet propagation across a P2P network, completing a search tree
> > pattern. Packets are routed by hashing the known quantities at each
> > step, utilizing the P2P content-addressable properties available by
> > giving each machine a unique ID. Each successful leaf node in a
> > search tree returns completed packets to a proxy machine that then
> > returns results to the user's machine requesting a structured
> > knowledge search. The proxy machine is used to allow multiple
> > identical queries to be coordinated by a single machine (based upon
> > its P2P machine ID and current time) so that a distributed caching
> > feature can be realized. System time is used in this delegation so
> > that the responsibility for popular queries can rotate across all
> > machines in the network. That is, the proposed model offers one
> > method to integrate P2P technology, distributed database technology
> > and integer-based knowledge representation. Rule systems and
> > ontology can be expressed alongside structured knowledge, in the
> > same format, in the structured knowledgebase.
> >
> > 01 Knowledge Representation Format
> >
> > This representation format utilizes 128-bit unsigned integers which
> > are theoretically mappable to URIs.
> > This representation allows a style of reification on the triples
> > called 'direct reification', which allows an element in a triple to
> > uniquely reference another triple, though the implications for
> > description logics and advanced ontological research are unknown.
> > Additionally, each triple contains a key that allows it to be
> > referenced uniquely, making this proposed system quadruples based.
> > The Unique ID can be used for metadata purposes and source
> > provenance in paraconsistent multisource knowledgebases. Pair-wise
> > hashings can be obtained from the numerical elements of these
> > quadruples both to provide redundancy and fault-tolerance, essential
> > for P2P systems, and to allow content-addressable routing on the
> > level of structured knowledge assertions. That is, given four
> > 128-bit integers:
> >
> > [Entity One][Relationship][Entity Two][Unique ID]
> >
> > we can obtain a number of hashes including
> > HASH(Entity One, Relationship), HASH(Relationship, Entity Two) and
> > HASH(Unique ID, Unique ID), with these hashes mapping to machines
> > that each hold a duplicate of the structured knowledge assertion.
> > This is of additional use to content-addressable routing during
> > distributed query processing. Additionally possible in this
> > numerical approach is to use the properties of certain bits to
> > indicate logical negation (Entities resulting in a disjoint set,
> > Relationships resulting in the logical negation, Unique IDs
> > utilizable by a rule system). So, we can theorize that the left-most
> > bit is utilized for this purpose. The rightmost two bits can be of
> > additional use in distinguishing which column an entity is
> > introduced in (Entity One, Relationship, Entity Two, Unique ID).
> > This can allow indirect reference across the distributed
> > knowledgebase and possibly be of use to reasoning systems.
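The pair-wise hashing in section 01 can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the proposal's
actual design: the proposal does not name a hash function, so SHA-256
truncated to 128 bits stands in for HASH; machine_for() and its modulo
routing are hypothetical; and reading the two right-most bits as a column
number 0..3 is one possible interpretation of the text.

```python
import hashlib
from typing import NamedTuple


class Quad(NamedTuple):
    """One assertion: four 128-bit unsigned integers."""
    entity_one: int
    relationship: int
    entity_two: int
    unique_id: int


def hash_pair(a: int, b: int) -> int:
    """HASH(a, b) as a 128-bit integer.

    Assumption: SHA-256 over the two 16-byte big-endian values,
    truncated to the first 16 bytes. to_bytes() raises OverflowError
    if a value does not fit in 128 unsigned bits.
    """
    data = a.to_bytes(16, "big") + b.to_bytes(16, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")


def replica_keys(q: Quad) -> dict[str, int]:
    """The three pair-wise hashes listed in the text, one per replica."""
    return {
        "e1_rel": hash_pair(q.entity_one, q.relationship),
        "rel_e2": hash_pair(q.relationship, q.entity_two),
        "uid": hash_pair(q.unique_id, q.unique_id),
    }


def machine_for(key: int, machine_count: int) -> int:
    """Hypothetical routing: map a 128-bit key onto one of N peers."""
    return key % machine_count


def is_negated(value: int) -> bool:
    """Left-most of the 128 bits set is read as logical negation."""
    return bool(value >> 127)


def column_of(value: int) -> int:
    """Right-most two bits, read here (an assumption) as column 0..3."""
    return value & 0b11


# Example: where the three replicas of one assertion would be stored
# in a hypothetical network of 64 machines.
q = Quad(entity_one=0x10, relationship=0x21, entity_two=0x32, unique_id=0x43)
for name, key in replica_keys(q).items():
    print(name, machine_for(key, 64))
```

Because each replica key depends only on the two values being hashed, a
query packet that knows, say, Entity One and Relationship can compute
the same key and be routed to the machine holding matching assertions.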
> >
> > 02 Distributed Cache
> >
> > The use of a system time along with the unique hash of a query, also
> > representable in a four-column notation, allows proxy machines to be
> > determined that can rotate to delegate responsibility over a large
> > number of machines. That is, something resembling:
> >
> >   QueryHash = HASH(MakeUnique(Query))
> >   Offset = DetermineOffset(QueryHash)
> >   Is CurrentTime() + Offset > NextTimeSliceBegin() ?
> >   If so, the proxy machine has ID = HASH(QueryHash, NextTimeSliceBegin())
> >   Otherwise, the proxy machine ID is HASH(QueryHash, LastTimeSliceBegin())
> >
> > Here DetermineOffset() can use some sort of modular process on the
> > QueryHash to spread the rotation of query proxies over a phase
> > space. The function MakeUnique() can sort the lines of a query and
> > generate the hash in some computationally efficient manner.
> >
> > 03 Numerical Subsystem
> >
> > With an ontological commitment of a HASVALUE relationship type, we
> > can implicitly know that the right-hand element, Entity Two, is
> > numerical data and utilize standard methods for representing and
> > distinguishing integer types and floating-point datatypes. That is,
> > by distinguishing one relation type for system use, numerical data
> > can be stored alongside structured knowledge data in the same
> > integer-based format. Additionally possible is optimizing
> > arithmetical and algebraic relationship types to more effectively
> > utilize underlying hardware.
> >
> > 04 User Interface
> >
> > An imagined user interface for direct structured knowledgebase
> > access might resemble a graphical tabular interface, with four
> > columns, where a user can enter text or drag and drop icons into
> > position to articulate a desired query on the knowledgebase.
> > Variables can be entered in the interface, and matching tuples and
> > structure returned from a structured knowledge query can be
> > navigable in the interface.
> > Returned elements, sets and structure can, imaginably, be
> > graphically utilized with accompanying workspaces and folders, from
> > which elements can be taken to formulate a structured knowledge
> > query. An interface would aim to minimize the text typing required
> > to utilize the system by allowing entities, relationships and
> > unique IDs to be visually representable.
> >
> > 05 Self-Optimizing Ontology
> >
> > The numerical based approach allows, theoretically,
> > machine-generated ontology so that, along with a rule system (also
> > capable of machine-generation), entities can be stored efficiently.
> > That is, theoretical relationships between ontology and file
> > compression are possible, with a knowledgebase handling its own
> > storage in a manner that is independent from the processing of
> > queries across the distributed system. This correlates to
> > relationships between the AI subfields of decision trees and
> > categorization, and more efficient, possibly unintuitive,
> > ontological compression can be written into the knowledgebase by
> > algorithms. That is, algorithms can write ontology and corresponding
> > rules into a knowledgebase for purposes of compression.
> >
> > 06 Conclusion
> >
> > Described is one possible model for a complete all-to-all
> > asynchronous parallel solution for writing, efficiently storing and
> > retrieving a large volume of knowledge with a high rate of input and
> > output. The knowledge representation format allows rules and
> > ontology to be articulated alongside structured knowledge; the use
> > of integers may introduce exciting possibilities into the fields of
> > AI, NLP and machine translation.
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Golda Velez http://goldavelez.info
> Webglimpse.Net http://webglimpse.net
> Internet WorkShop http://iwhome.com
> cell: (520) 440-1420
> "Help organize the world - index your own corner of the web!"
Received on Saturday, 10 February 2007 08:49:21 UTC