Re: names, URIs and ontologies from RA Poell on 2000-11-02 (www-rdf-logic@w3.org from November 2000)

From: RA Poell <poell@fel.tno.nl>
Date: Thu, 02 Nov 2000 07:32:35 +0100
To: Seth Russell <seth@robustai.net>, rdf-logic <www-rdf-logic@w3.org>
Message-ID: <3A010A83.2F687465@fel.tno.nl>
<Seth Russell>
>If my assumptions are correct (sheeze i hope they are) this still
>means that one will probably encounter many different URIs for the
>same concept.   Pat, does this problem bear on your concerns?   The
>only solution I see to that problem is for each local application of
>the Semantic Web to install some kind of fuzzy node matcher that would
>attempt to combine nodes that are really the same, based upon their
>relationships to literals and other known nodes.  In combining nodes
>the applications could preserve all the original URIs and the sources
>from which they were originally read.  Then when the application wants
>to speak RDF to those sources, they could use the URIs which that
>source will recognize.  See my signature for an example.
</Seth Russell>

This is exactly what I do with Notion System. I have now reached a
critical mass in NS (more than 200,000 notions), and some notions have
become really enormous. These big ones need particular filtering and
clustering techniques when you want to represent them, but using them
(and the other notions they are related to) during automatic analyses
of documents is no problem at all.
Notion System does have a small fuzzy matcher (though it could be
improved) to find candidate duplicates. Duplicates will in fact occur
(perhaps more often than we think), so this is a necessary feature.
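A minimal sketch of such a duplicate finder, in the spirit of Seth's "fuzzy node matcher" and NOT the actual Notion System code. It assumes each notion is a label plus a set of related nodes, blends label string similarity (difflib) with the Jaccard overlap of relationships, and uses an arbitrary 0.6 threshold. Merging keeps every original URI together with its source, as Seth suggests.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from itertools import combinations


@dataclass
class Notion:
    # Hypothetical record: a label, every URI it was read under (mapped to
    # the source it came from), and the ids of related notions or literals.
    label: str
    uris: dict = field(default_factory=dict)
    related: set = field(default_factory=set)


def similarity(a: Notion, b: Notion, label_weight: float = 0.5) -> float:
    """Blend label string similarity with the Jaccard overlap of relationships."""
    label_sim = SequenceMatcher(None, a.label.lower(), b.label.lower()).ratio()
    union = a.related | b.related
    rel_sim = len(a.related & b.related) / len(union) if union else 0.0
    return label_weight * label_sim + (1 - label_weight) * rel_sim


def candidate_duplicates(notions, threshold: float = 0.6):
    """Return (score, label_a, label_b) for pairs at or above the assumed threshold."""
    pairs = []
    for a, b in combinations(notions, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((score, a.label, b.label))
    return sorted(pairs, reverse=True)


def merge(a: Notion, b: Notion) -> Notion:
    """Combine two notions, keeping every original URI and its source."""
    return Notion(a.label, {**a.uris, **b.uris}, a.related | b.related)
```

The pairs are only candidates. Whether two notions really are the same is left to a human or a later step.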
When constructing metadata (automatically or by hand) about a
particular document, in RDF or some other form, the contents (basically
the names used in the document) should be "identified", i.e. URI-fied:
the step from a text string to an identifier has to be made. Done by
hand, this is not very difficult if the references are available (which
is not yet the case). If it is done automatically, on the other hand,
the agent in charge will need to compare the candidate concepts (and
their relationships to other concepts) with the other candidate
concepts from the document.
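One way to picture that automatic comparison is sketched below. The lexicon, the relationship table and the `urn:ex:` identifiers are all hypothetical toy data. Each name is resolved to the candidate whose related notions overlap most with the candidates found for the other names in the same document.

```python
# Hypothetical reference data: the notions a name may denote, and the
# notions each one is related to in the semantic network.
LEXICON = {
    "Java": ["urn:ex:java-language", "urn:ex:java-island"],
    "Sun": ["urn:ex:sun-microsystems", "urn:ex:sun-star"],
    "Indonesia": ["urn:ex:indonesia"],
}
RELATED = {
    "urn:ex:java-language": {"urn:ex:sun-microsystems"},
    "urn:ex:java-island": {"urn:ex:indonesia"},
    "urn:ex:sun-microsystems": {"urn:ex:java-language"},
    "urn:ex:sun-star": set(),
    "urn:ex:indonesia": {"urn:ex:java-island"},
}


def identify(names, lexicon=LEXICON, related=RELATED):
    """Map each name to the candidate notion best supported by the
    candidates found for the other names in the same document.
    Returns None for names with no known candidate."""
    candidates = {name: lexicon.get(name, []) for name in names}
    result = {}
    for name, options in candidates.items():
        # Context = every candidate notion for every *other* name.
        context = {c for other, cs in candidates.items() if other != name for c in cs}
        if not options:
            result[name] = None
            continue
        # Score = how many of a candidate's related notions appear in the
        # context; ties go to the candidate listed first in the lexicon.
        result[name] = max(options, key=lambda c: len(related.get(c, set()) & context))
    return result
```

With this toy data, "Java" resolves to the language next to "Sun" but to the island next to "Indonesia". A real system would weight link types rather than just count shared neighbours.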
What you need to know for this identification step may differ
(depending on the case) for human actors and software agents.
I did some experiments with Notion System on automatically analyzing web
pages, and (in the domains covered by the current knowledge base) the
results are very promising. Of course certainty is never reached, but
the concepts the agent thinks the document is about are often the right
ones.
These agents are authorized (when the probability reaches a particular
threshold) to create new relationships between the document and the
concepts. An agent also comes up with new concepts it has discovered (it
has identified something but cannot find any probable notion for it,
i.e. a high "negation" probability) and with new information about
existing concepts (e.g. an email address not yet known for a person
identified by other characteristics). To keep things a bit clean, the
agent is not allowed to create a new notion itself, though this could
be done.
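The agent's decision rule could be sketched roughly as follows. The 0.8 threshold is assumed, and reading the "negation" probability as the probability mass left over for "none of the known notions" is an interpretation, not something the post specifies. In line with the rule that the agent may not create notions, it only proposes them.

```python
from typing import Dict, Optional, Tuple

THRESHOLD = 0.8  # assumed; the post only speaks of "a particular threshold"


def decide(name: str, candidate_probs: Dict[str, float],
           threshold: float = THRESHOLD) -> Tuple[str, Optional[str]]:
    """Choose what the agent does with one name found in a document.

    candidate_probs maps a notion URI to the estimated probability that
    `name` denotes it. The "negation" probability is taken here to be the
    mass left over for "none of the known notions" (an assumption).
    """
    negation = 1.0 - sum(candidate_probs.values())
    if candidate_probs:
        best, p = max(candidate_probs.items(), key=lambda kv: kv[1])
        if p >= threshold:
            return ("relate", best)        # document -> notion relationship
    if negation >= threshold:
        return ("propose_notion", name)    # proposed only, never created
    return ("skip", None)


def new_facts(known: Dict[str, str], observed: Dict[str, str]) -> Dict[str, str]:
    """Attributes seen in the document that are not yet known for a notion."""
    return {k: v for k, v in observed.items() if k not in known}
```

`new_facts` covers the email example: a person already identified by other characteristics gains only the attributes that were missing.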
The semantic network, extended with the logic needed to navigate it
(and to use the meaning of the links), allows humans and agents to make
assumptions about how well a particular notion matches a name (text
string) in a document. URIs within the document make identification
much better (perhaps even perfect) and allow new information to be
added. But I don't think a URI alone allows this (unless it is THE
identifying URI, if there is one). You will need a reference network:
the semantic meaning of the contents of the documents, with references
to a particular URI.
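A rough sketch of how a URI found in a document might be weighed against such a reference network. It reuses the RSS URIs from Seth's signature, but treating http://purl.org/rss/ as THE identifying URI, and both weights, are assumptions for illustration only. A URI unknown to the network yields no evidence at all.

```python
# Hypothetical reference network entry. Which URI counts as THE
# identifying one is an assumption made for illustration.
REFERENCE = {
    "topic:RSS": {
        "identifying": "http://purl.org/rss/",
        "related": {
            "http://InternetAlchemy.org/rss/",
            "http://www.xml.com/pub/2000/07/17/syndication/rss.html",
        },
    },
}

IDENTIFYING_WEIGHT = 1.0  # treated as conclusive
RELATED_WEIGHT = 0.5      # assumed: evidence, not proof


def uri_evidence(uri, reference=REFERENCE):
    """Return (notion, weight) pairs that one URI in a document points to."""
    hits = []
    for notion, uris in reference.items():
        if uri == uris["identifying"]:
            hits.append((notion, IDENTIFYING_WEIGHT))
        elif uri in uris["related"]:
            hits.append((notion, RELATED_WEIGHT))
    return hits  # empty: the URI alone says nothing without the network
```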

To be clear, not every problem related to this has been solved yet in
Notion System. A lot of work remains to be done.

Seth's signature (see below) is an example: it could be part of the
information about his particular notion (topic Seth Russell) and about
other concepts (RSS, MyMemory) that may or may not already be known. If
Seth can be identified by his name, by the fact that he is a member of
this mailing list and by his interest in RDF, but his email address is
not yet known, this "fact" will be added. The URI he gives provides
further information, and so on.
For the other topics, Seth gives part of a conceptual network (URIs of
documents related to the topic), but the semantics are not clearly
stated (probably something like "handles" or at least "is mentioned
in"). The other information about these topics can be mapped directly
to relationships: sometimes with only data (description:…), sometimes
to other concepts/notions/topics (RDF).

<signature>
topic: Seth Russell
URI:  http://robustai.net/~seth/index.htm
email: seth@robustai.net
waiting for:  RSS
is working on:  MyMemory
needs collaboration on: MyMemory

topic: RSS
anagramOf: (alternative: Rich Site Summary, RDF Site Summary)
URI (from source: http://rss.oreillynet.com/): http://purl.org/rss/
URI (from source: http://InternetAlchemy.org/):
http://InternetAlchemy.org/rss/
URI (from source: http://www.xml.com/):
http://www.xml.com/pub/2000/07/17/syndication/rss.html

topic: MyMemory
description: "a local application of the Semantic Web"
hasAbilityTo: (and:  (read RDF) (write RDF))
</signature>
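The direct mapping of such a signature to relationships could look something like the sketch below. It assumes a hypothetical convention: each "key: value" line under a "topic:" line becomes a (topic, key, value) triple, and the value is treated as a resource when it names another topic or looks like an http(s) URI and as a literal otherwise. Lines that don't fit this simple pattern, such as the "URI (from source: ...)" entries, are skipped rather than guessed at.

```python
import re

# Hypothetical line pattern: a plain key (letters, digits, spaces), a colon,
# whitespace, then the value.
LINE = re.compile(r"^([A-Za-z][\w ]*?):\s+(.+)$")


def signature_to_triples(text):
    """Turn 'topic:' blocks into (subject, predicate, object, kind) tuples."""
    topics, raw, topic = set(), [], None
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # e.g. "URI (from source: ...)" lines are not handled
        key, value = m.group(1), m.group(2).strip()
        if key == "topic":
            topic = value
            topics.add(value)
        elif topic is not None:
            raw.append((topic, key, value))
    # Objects naming another topic or looking like a URI become resources;
    # everything else (e.g. a quoted description) stays a literal.
    return [
        (s, p, o, "resource" if o in topics or o.startswith(("http://", "https://")) else "literal")
        for s, p, o in raw
    ]
```

Predicates like "is working on" stay plain strings here. Giving them, and the unstated "handles" or "is mentioned in" links, proper URIs is exactly the part the signature leaves open.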



Friendly greetings

Ronald Poell
TNO - Netherlands
http://www.tno.nl
http://www.notionsystem.com
email: poell@fel.tno.nl, rapoell@notionsystem.com
Received on Thursday, 2 November 2000 01:33:22 UTC