- From: Reto Bachmann-Gmür <reto@gmuer.ch>
- Date: Wed, 16 Jan 2008 16:28:27 +0100
- To: semantic-web@w3.org
- Message-ID: <478E229B.9090500@gmuer.ch>
While we can still read two-hundred-year-old texts quite easily, we see
our machines struggling to deal with triples of the same age. Looking at
the triples, we see many names starting with "http". What are these
names, and why do they require so much temporal and cultural disambiguation?
Before the Semantic Web started and the triples began to flow, a lot of
information in the technologically developed regions was looked up using
the HTTP protocol in combination with a hierarchical naming system
called DNS. The http-names were originally addresses that could be
resolved within that hierarchy; the idea was that for every name one
could contact a system which would return an authoritative definition of
that name. Originally this system was relatively stable: individuals and
organizations could rent a sub-section of the namespace. The root of the
namespace was originally controlled by organizations of the United
States of America. A European network ("Open Root Server Network")
replicated the American-controlled network but was designed to become
independent should the political situation require it. As the Open Root
Server Network was never detached for a prolonged period, the system
worked as a unique hierarchical naming system for around thirty years.

In 2012 a coalition of governments and civil organizations campaigned
for a "free" and "safe" naming system. This campaign eventually led to
the "Free Open Network" (FON), which offered names free of charge and
guaranteed "safe" names through a court system that revoked names found
to be "misleading or dangerous to the public". Acceptance of the new
system varied by region; in several countries its usage became mandated
by law. On the American continent and in parts of Europe the old system
continued to be dominant. Disambiguation became especially hard after
the FON authorities redefined some terms of popular vocabularies in
2015; many parties using names assigned by FON kept the old definitions,
arguing that some terms had outgrown the web and had acquired a
common-sense meaning.
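For readers who never used the old system directly: resolving an
http-name amounted to an ordinary HTTP request routed through the DNS
hierarchy, asking the system behind the name for a machine-readable
definition. A minimal sketch in Python of what such a lookup looked like
(the vocabulary name below is invented for illustration, and nothing
guarantees the responsible host still answers):

    from urllib.request import Request, urlopen

    # Hypothetical http-name whose definition we want to look up.
    name = "http://example.org/vocab#Person"

    # Ask the system behind the name for a machine-readable definition.
    req = Request(name, headers={"Accept": "application/rdf+xml"})
    with urlopen(req) as resp:
        definition = resp.read()  # the "authoritative definition", if any
    print(definition[:200])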
An additional issue is the sheer number of names. By that time there
were sometimes several thousand names for the same thing. It can seem
paradoxical that the most unimportant terms had the highest number of
synonyms. The reason for this is that in the early days people argued
that everything should have a name. While the triple-spaces were defined
to allow anonymous contextual entities, many preferred to name just
about everything, so that an authoritative definition could be looked
up. For terms not important enough for a social consensus on a
well-known set of names to arise, many processors simply made up names
in their own http-namespaces. The same was the case when the information
was not sufficient for identification: for example, every time you
walked through an area monitored by video cameras, the monitoring system
would assign you an http-name. The inflation of names was so great that
for many names we cannot even find definitions in libraries. The ODC
defines less than a millionth of the http-names used at the time. With
these names, and treating the other http-names as contextual (i.e.
ignoring the name), we can reasonably interpret many old triples.
However, for many http-names we will ultimately never know whether they
were just labels associated with contextual nodes or whether they in
fact had an intersubjective meaning at the time.
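This is essentially how our interpreters proceed today: names the ODC
still defines are kept, all other http-names are demoted to anonymous
contextual nodes, and the original string survives only as a label. A
minimal sketch in Python using rdflib (the sample data, the
defined-names set, and the camera namespace are all invented):

    from rdflib import Graph, URIRef, BNode, Literal
    from rdflib.namespace import RDFS

    # Hypothetical set of http-names for which the ODC has a definition.
    defined = {URIRef("http://example.org/vocab#seenAt")}

    old = Graph()
    # A name minted by a monitoring system, as described above.
    sighting = URIRef("http://cam42.example.net/sighting/7718")
    old.add((sighting, URIRef("http://example.org/vocab#seenAt"),
             Literal("2015-06-01T12:00:00")))

    cleaned = Graph()
    anon = {}  # undefined http-name -> fresh anonymous node

    def contextual(term):
        """Demote an undefined http-name to an anonymous contextual node."""
        if isinstance(term, URIRef) and term not in defined:
            node = anon.setdefault(term, BNode())
            # Keep the old name as a mere label, not as an identifier.
            cleaned.add((node, RDFS.label, Literal(str(term))))
            return node
        return term

    # Predicates keep their names; only subjects and objects are demoted.
    for s, p, o in old:
        cleaned.add((contextual(s), p, contextual(o)))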