Re: numeric data on the web, numeric web search

See http://googleblog.blogspot.com/2009/04/adding-search-power-to-public-data.html.

This is the first attempt at making large amounts of data available in structured formats.

Although it is not linked data in all conceivable formats from all sources on the web, the fact that the E-Government Act is forcing US federal agencies to make their public data more accessible could be the push required to take linked data initiatives to the next level.

It is time for a Semantic Web/Linked Data lobby in DC to secure funding for expanding this to all public domains.

Milton Ponson
GSM: +297 747 8280
Rainbow Warriors Core Foundation
PO Box 1154, Oranjestad
Aruba, Dutch Caribbean
www.rainbowwarriors.net
Project Paradigm: A structured approach to bringing the tools for sustainable development to all stakeholders worldwide
www.projectparadigm.info
NGO-Opensource: Creating ICT tools for NGOs worldwide for Project Paradigm
www.ngo-opensource.org
MetaPortal: providing online access to web sites and repositories of data and information for sustainable development
www.metaportal.info
SemanticWebSoftware, part of NGO-Opensource to enable SW technologies in the Metaportal project
www.semanticwebsoftware.info


--- On Wed, 4/29/09, Wolfgang Orthuber <orthuber@kfo-zmk.uni-kiel.de> wrote:

From: Wolfgang Orthuber <orthuber@kfo-zmk.uni-kiel.de>
Subject: numeric data on the web, numeric web search
To: public-lod@w3.org
Date: Wednesday, April 29, 2009, 3:25 PM

Hello!
 
We know that quantifiable objects play a central role in daily life. Nevertheless, quantifiable objects still have, in general, no well-defined, globally machine-readable and precise representation on the web. The following concept proposes a simple data structure called "pattern" for representing quantifiable objects in general, which also allows similarity search over them:
--------
 
* Numeric web search *
 
Web search is, up to now, word-based. In addition, language-independent similarity search of quantifiable objects is desirable. For a well-defined numeric representation of quantifiable objects, a simple data structure called "pattern" is proposed. It contains a feature vector (a sequence of numbers) that represents the object, and a "pattern name", a URI that uniquely identifies the kind of object represented by the feature vector.
 
Pattern:   Pattern name + feature vector (+ auxiliary data)
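As an illustration only, a pattern might be held in memory as follows. This is a minimal Python sketch under assumptions not found in the proposal: the class name `Pattern`, its field names, and the example URI `http://abc.xyz/pat/example` are hypothetical, and auxiliary data is modeled as a free-form mapping that plays no role in comparison.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class Pattern:
    """Hypothetical in-memory form of a "pattern": a name URI plus a feature vector."""

    # URI identifying the kind of object, e.g. "http://abc.xyz/pat/example" (hypothetical)
    name: str
    # Feature vector: the sequence of numbers representing the object
    features: Tuple[float, ...]
    # Optional auxiliary data; excluded from equality comparison
    aux: Mapping[str, Any] = field(default_factory=dict, compare=False)

    def comparable_with(self, other: "Pattern") -> bool:
        """Patterns are directly comparable only if they share a pattern name.

        The length check is a defensive extra: patterns with the same name
        should already have feature vectors of the same length.
        """
        return self.name == other.name and len(self.features) == len(other.features)
```

For example, `Pattern("http://abc.xyz/pat/example", (1.0, 2.0))` is comparable with any other pattern of that name, but not with one named `http://other.org/pat/thing`.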
 
Patterns with the same pattern name represent the same kind of object. Because the number of possible pattern names is not limited*, infinitely many different kinds of quantifiable objects can be represented by patterns. (*Only physically limited by finite time and energy.)
 
So the search terms are not words but feature vectors in patterns, which allow similarity to be quantified. Feature vectors of patterns with the same pattern name are directly comparable using a given metric. In this way, similarities between the original quantifiable objects are mapped to spatial similarities between the feature vectors, so similarity search is possible by calculating distances: the smaller the distance between the feature vectors of the representing patterns, the more similar the objects.
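The distance-based similarity search described above can be sketched as follows. The proposal leaves the metric to each pattern definition; this sketch assumes Euclidean distance, the helper names `euclidean` and `most_similar` are hypothetical, and all candidates are assumed to share the query's pattern name already.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; stands in for whatever metric a pattern definition specifies."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.dist(a, b)


def most_similar(
    query: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank (label, feature vector) candidates by distance to the query.

    The smaller the distance, the more similar the represented objects.
    """
    scored = [(label, euclidean(query, vec)) for label, vec in candidates]
    scored.sort(key=lambda item: item[1])  # nearest first
    return scored[:k]
```

With the query (0, 0), the candidates b at (1, 0) and c at (0, 2) rank ahead of a at (3, 4).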
 
Due to the multitude of different kinds of quantifiable objects, the work of developing efficient pattern (or feature vector) definitions for representing them is open-ended. Global task sharing has the greatest potential: according to this suggestion, every owner of an internet domain name abc.xyz gets the right to define the feature vectors of all patterns with names abc.xyz/* (in the well-defined location abc.xyz/pat/*).
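The domain-based naming rule could be checked mechanically, for instance as below. This is a sketch under stated assumptions: pattern names are taken to be absolute URIs such as `http://abc.xyz/pat/shape` (a hypothetical example), the authority for a name is simply the host part of that URI, and subdomains are treated as separate authorities. The function names are hypothetical, and the sketch does not model fetching definitions from abc.xyz/pat/*.

```python
from urllib.parse import urlsplit


def pattern_authority(pattern_name: str) -> str:
    """Return the domain that may define this pattern name (assumed: the URI's host)."""
    host = urlsplit(pattern_name).hostname  # lowercased by urllib; None if no host part
    if not host:
        raise ValueError(f"pattern name is not an absolute URI: {pattern_name!r}")
    return host


def may_define(domain: str, pattern_name: str) -> bool:
    """True if the owner of `domain` is entitled to define `pattern_name`.

    Exact host match only: under this assumption, sub.abc.xyz is a separate authority.
    """
    return pattern_authority(pattern_name) == domain.lower()
```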
 
Patterns are machine-readable, uniformly comparable and searchable. They allow the same search engine to search not only for text but also for an increasing number of well-defined quantifiable objects on the web. Bundling the search activity into one crawler and one web database for all quantifiable objects is much more efficient than building and managing a separate database and crawler for every kind of object.
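One way to picture a single crawler database serving every kind of pattern is an index bucketed by pattern name. The class `PatternIndex` and its methods are hypothetical, Euclidean distance is again assumed as the metric, and the linear scan is for clarity only; a real engine would more likely use a spatial index structure for nearest-neighbor queries.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class PatternIndex:
    """Hypothetical single store for patterns of every kind, bucketed by pattern name."""

    def __init__(self) -> None:
        # pattern name URI -> list of (source URL, feature vector)
        self._buckets: Dict[str, List[Tuple[str, Tuple[float, ...]]]] = defaultdict(list)

    def add(self, pattern_name: str, source_url: str, features: Sequence[float]) -> None:
        """Record one pattern found by the crawler on `source_url`."""
        self._buckets[pattern_name].append((source_url, tuple(features)))

    def search(
        self, pattern_name: str, query: Sequence[float], k: int = 5
    ) -> List[Tuple[str, float]]:
        """Return up to k (source URL, distance) pairs nearest to `query`.

        Only patterns with the same name are compared; .get() avoids
        creating empty buckets for unknown names.
        """
        hits = [
            (url, math.dist(query, vec))
            for url, vec in self._buckets.get(pattern_name, [])
            if len(vec) == len(query)
        ]
        hits.sort(key=lambda hit: hit[1])
        return hits[:k]
```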
 
Numeric similarity search could be efficiently combined with conventional word-based search. Details are described at http://www.orthuber.com/wpa.htm; don't hesitate to ask me further questions.
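How the two kinds of search would be combined is not spelled out in this message (the linked page has the author's details). One simple possibility, shown purely as an assumption, is to filter documents with a word-based match and then rank the survivors by pattern distance. The `Document` type, `combined_search`, and the Euclidean metric are all hypothetical choices for this sketch.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Document:
    """Hypothetical crawled page: its text plus the patterns it publishes."""

    url: str
    text: str
    patterns: List[Tuple[str, Tuple[float, ...]]]  # (pattern name URI, feature vector)


def combined_search(
    docs: List[Document],
    keywords: Sequence[str],
    pattern_name: str,
    query: Sequence[float],
) -> List[Tuple[str, float]]:
    """Keep documents containing every keyword, then rank by their nearest matching pattern."""
    results = []
    for doc in docs:
        words = set(doc.text.lower().split())
        if not all(kw.lower() in words for kw in keywords):
            continue  # word-based filter
        distances = [
            math.dist(query, vec)
            for name, vec in doc.patterns
            if name == pattern_name and len(vec) == len(query)
        ]
        if distances:
            results.append((doc.url, min(distances)))  # numeric ranking
    results.sort(key=lambda r: r[1])
    return results
```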
 
--------------------
It seems clear that introducing the above conventions would have significant advantages. Can this gain enough support that we can realize it step by step?
 
Regards
 
Wolfgang Orthuber (Mathematician and Orthodontist at the University of Kiel, Germany)

Received on Wednesday, 29 April 2009 19:06:20 UTC