
numeric data on the web, numeric web search

From: Wolfgang Orthuber <orthuber@kfo-zmk.uni-kiel.de>
Date: Wed, 29 Apr 2009 16:25:42 +0100
Message-ID: <006101c9c8de$c268c1f0$a3b35ec2@workstation>
To: <public-lod@w3.org>

We know that quantifiable objects play a central role in daily life. Nevertheless, quantifiable objects still generally have no well-defined, globally machine-readable and precise representation on the web. The following concept proposes a simple data structure called a "pattern" for representing quantifiable objects in general, which also allows similarity search over them:

* Numeric web search *

Web search has so far been word-based. In addition, language-independent similarity search over quantifiable objects is desirable. For a well-defined numeric representation of quantifiable objects, a simple data structure called a "pattern" is proposed. It contains a feature vector (a sequence of numbers) that represents the object, and a "pattern name", which is a URI that uniquely identifies the kind of object the feature vector represents.

Pattern  =  pattern name  +  feature vector  (+ auxiliary data)
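A minimal sketch of this data structure in Python, assuming the feature vector is a tuple of floats and the auxiliary data is a free-form dictionary. The class name, field names and the example URIs are hypothetical illustrations, not part of the proposal:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Pattern:
    """Hypothetical pattern: a URI naming the kind of object plus a feature vector."""
    name: str                    # pattern name: URI identifying the kind of object
    features: tuple[float, ...]  # feature vector: a sequence of numbers
    # Optional auxiliary data; excluded from equality comparisons.
    aux: dict[str, Any] = field(default_factory=dict, compare=False)

    def comparable_with(self, other: "Pattern") -> bool:
        # Only patterns with the same name represent the same kind of object,
        # so only their feature vectors can be compared directly.
        return self.name == other.name and len(self.features) == len(other.features)
```

For example, Pattern("example.org/p", (1.0, 2.0)) and Pattern("example.org/p", (3.0, 4.0)) are comparable, while a pattern named "example.net/q" is not comparable with either.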

Patterns with the same pattern name represent the same kind of object. Because the number of possible pattern names is not limited*, infinitely* many different kinds of quantifiable objects can be represented by patterns.  (*limited only physically, by finite time and energy)

So the search terms are not words but the feature vectors in patterns, which allow similarity to be quantified. Feature vectors of patterns with the same pattern name are directly comparable using a given metric. In this way, similarities between the original quantifiable objects are mapped to spatial similarities between the feature vectors. Similarity search is therefore possible by calculating distances: the smaller the distance between the feature vectors of the representing patterns, the more similar the objects are.
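The distance-based search described above might look like the following sketch. The proposal leaves the metric to each pattern definition; Euclidean distance is assumed here purely for illustration, and the function names are hypothetical:

```python
import math
from dataclasses import dataclass


@dataclass
class Pattern:
    name: str                    # pattern name (URI)
    features: tuple[float, ...]  # feature vector


def distance(a: Pattern, b: Pattern) -> float:
    # Assumed metric: Euclidean distance between the two feature vectors.
    return math.dist(a.features, b.features)


def similarity_search(query: Pattern, candidates: list[Pattern],
                      k: int = 3) -> list[tuple[float, Pattern]]:
    # Only patterns with the same name (and vector length) are directly comparable.
    comparable = [p for p in candidates
                  if p.name == query.name and len(p.features) == len(query.features)]
    # Smaller distance means more similar objects, so rank in ascending order.
    ranked = sorted(((distance(query, p), p) for p in comparable),
                    key=lambda t: t[0])
    return ranked[:k]
```

With a query Pattern("example.org/p", (0.0, 0.0)) and candidates at (3.0, 4.0) and (1.0, 0.0) under the same name, the results come back at distances 1.0 and 5.0; a candidate with a different pattern name is never considered.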

Because there are so many different kinds of quantifiable objects, the work of developing efficient pattern (i.e. feature vector) definitions to represent them is open-ended. Global task sharing has the greatest potential: under this proposal, every owner of an internet domain name abc.xyz gets the right to define the feature vectors of all patterns with names abc.xyz/* (at the well-defined location abc.xyz/pat/*).
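The naming convention can be sketched as follows. This assumes that the part of the name after the domain is carried over unchanged below /pat/, that names may be written without a scheme (http:// is then assumed), and that the host part identifies the domain owner who may define the pattern. The function names are hypothetical:

```python
from urllib.parse import urlsplit


def _as_url(pattern_name: str) -> str:
    # Assumption: names written without a scheme (e.g. "abc.xyz/x") use http://.
    return pattern_name if "://" in pattern_name else "http://" + pattern_name


def owning_domain(pattern_name: str) -> str:
    # The host part of the name is the domain whose owner may define the pattern.
    return urlsplit(_as_url(pattern_name)).netloc


def definition_location(pattern_name: str) -> str:
    """Map a pattern name 'abc.xyz/rest' to the assumed definition URL 'abc.xyz/pat/rest'."""
    parts = urlsplit(_as_url(pattern_name))
    if not parts.netloc or parts.path in ("", "/"):
        raise ValueError(f"not a pattern name below a domain: {pattern_name!r}")
    return f"{parts.scheme}://{parts.netloc}/pat{parts.path}"
```

Under these assumptions, definition_location("abc.xyz/tooth") yields "http://abc.xyz/pat/tooth", and the definition is under the control of whoever owns abc.xyz.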

Patterns are machine-readable, uniformly comparable and searchable. They allow one and the same search engine to search not only for text but also for an increasing number of well-defined quantifiable objects on the web. Bundling the search activity for all quantifiable objects into one crawler and one web database is much more efficient than building and managing a separate database and crawler for every kind of object.
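A single index that accepts patterns of every kind could be sketched as below: one crawler feeds all patterns into the same structure, bucketed by pattern name, and a query only touches the bucket for its own name. The class and method names are hypothetical, and Euclidean distance is again only an assumed metric:

```python
import math
from collections import defaultdict


class PatternIndex:
    """Hypothetical single index holding patterns of every kind, bucketed by pattern name."""

    def __init__(self) -> None:
        # pattern name -> list of (source URL, feature vector)
        self._buckets: dict[str, list[tuple[str, tuple[float, ...]]]] = defaultdict(list)

    def add(self, source_url: str, name: str, features: tuple[float, ...]) -> None:
        # Any kind of pattern found by the crawler goes into the same index.
        self._buckets[name].append((source_url, tuple(features)))

    def kinds(self) -> list[str]:
        # All pattern names (kinds of objects) seen so far.
        return sorted(self._buckets)

    def search(self, name: str, query: tuple[float, ...],
               k: int = 3) -> list[tuple[float, str]]:
        # Only the bucket for the query's pattern name is searched; .get() avoids
        # creating empty buckets for unknown names.
        hits = [(math.dist(query, vec), url)
                for url, vec in self._buckets.get(name, [])
                if len(vec) == len(query)]
        hits.sort()  # ascending distance: most similar first
        return hits[:k]
```

Adding a new kind of object then needs no new crawler or database, only patterns carrying a new name.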

Numeric similarity search could be efficiently combined with conventional word-based search. Details are described at http://www.orthuber.com/wpa.htm ; don't hesitate to ask me further questions.

It seems clear that introducing the above conventions would have significant advantages. Can this proposal get enough support for us to realize it step by step?


Wolfgang Orthuber   (Mathematician and Orthodontist at University of Kiel / Germany)
Received on Wednesday, 29 April 2009 14:23:23 UTC
