- From: RA Poell <poell@fel.tno.nl>
- Date: Thu, 02 Nov 2000 07:32:35 +0100
- To: Seth Russell <seth@robustai.net>, rdf-logic <www-rdf-logic@w3.org>
<Seth Russell>
>If my assumptions are correct (sheeze i hope they are) this still
>means that one will probably encounter many different URIs for the
>same concept. Pat, does this problem bear on your concerns? The
>only solution I see to that problem is for each local application of
>the Semantic Web to install some kind of fuzzy node matcher that would
>attempt to combine nodes that are really the same, based upon their
>relationships to literals and other known nodes. In combining nodes
>the applications could preserve all the original URIs and the sources
>from which they were originally read. Then when the application wants
>to speak RDF to those sources, they could use the URIs which that
>source will recognize. See my signature for an example.
</Seth Russell>

This is exactly what I do with Notion System. I have now reached a critical mass in NS (> 200 000 notions), and some notions have become really enormous. These big ones need particular filtering and clustering techniques when you want to represent them, but using them (and the other notions they are related to) during automatic analysis of documents is no problem at all.

Notion System does have a small fuzzy matcher (though it can be improved) for finding candidate doubles. Doubles will occur (perhaps more often than we think), so this is a necessary feature.

When constructing (automatically or by hand) metadata about a particular document (in RDF or some other form), the contents (basically the names used in the document) should be "identified" (i.e. URI-fied: taking the step from the text string to an identifier). Done by hand, this is not very difficult if the references are available (which is not yet the case). Done automatically, the agent in charge will need to compare the candidate concepts (and their relationships to other concepts) with the other candidate concepts from the document.
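The fuzzy node matcher discussed above can be sketched roughly as follows. This is only an illustration, not the matcher used in Notion System: the `Node` structure, the Jaccard overlap used as the similarity measure, the 0.5 threshold and the example facts are all assumptions made for the sketch. Only the RSS URIs and their sources come from Seth's signature.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A concept as one source describes it (hypothetical structure)."""
    uris: dict = field(default_factory=dict)  # URI -> source it was read from
    facts: set = field(default_factory=set)   # (property, literal or known node) pairs


def similarity(a: Node, b: Node) -> float:
    """Jaccard overlap of the facts attached to two nodes, from 0.0 to 1.0."""
    if not a.facts or not b.facts:
        return 0.0
    return len(a.facts & b.facts) / len(a.facts | b.facts)


def candidate_doubles(nodes, threshold=0.5):
    """Return index pairs whose similarity reaches the assumed threshold."""
    pairs = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if similarity(nodes[i], nodes[j]) >= threshold:
                pairs.append((i, j))
    return pairs


def merge(a: Node, b: Node) -> Node:
    """Combine two nodes, keeping every original URI and its source."""
    return Node(uris={**a.uris, **b.uris}, facts=a.facts | b.facts)


def uri_for(node: Node, source: str):
    """Pick the URI that a given source will recognize, or None."""
    for uri, src in node.uris.items():
        if src == source:
            return uri
    return None


# Two descriptions of RSS read from different sources (facts are made up).
rss_a = Node(
    uris={"http://purl.org/rss/": "http://rss.oreillynet.com/"},
    facts={("name", "RSS"),
           ("alternative", "Rich Site Summary"),
           ("alternative", "RDF Site Summary")},
)
rss_b = Node(
    uris={"http://InternetAlchemy.org/rss/": "http://InternetAlchemy.org/"},
    facts={("name", "RSS"), ("alternative", "RDF Site Summary")},
)

doubles = candidate_doubles([rss_a, rss_b])
merged = merge(rss_a, rss_b)
```

After the merge, `uri_for(merged, "http://InternetAlchemy.org/")` gives back the URI that source itself uses, so the application can speak RDF to each source in its own terms. A real matcher would probably weight relationships differently and compare strings approximately rather than exactly.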
The things you need to know for this identification step might differ (depending on the case) between human actors and software agents. I did some experiments with Notion System on the automatic analysis of web pages, and (in the domains covered by the current knowledge base) the results are very promising. Of course certainty is never reached, but the concepts the agent thinks the document is about are often the right ones.

These agents are authorized (when the probability reaches a particular threshold) to create new relationships (between the document and the concepts). An agent also comes up with new concepts it has discovered (in fact it has identified something and cannot find any probable notion for it = negation probability) and with new information about existing concepts (e.g. an email address not yet known for a person identified by some other characteristics). To keep things a bit clean, it is not allowed to create a new notion itself, though this could be done.

The semantic network, expanded with the logic necessary to navigate it (and to use the meaning of the links), allows humans and agents to make assumptions about how well a particular notion matches a name (text string) in a document. URIs within the document make identification much better (perhaps even perfect) and allow new information to be added. But I don't think that a URI alone allows this (unless it is THE identifying URI). You will need a reference network (the semantic meaning of the contents of the documents with references to a particular URI).

To be clear, not every problem related to this is solved yet in Notion System. A lot of work still remains to be done.

The example of Seth's signature (see below) could be part of the information about his particular notion (topic Seth Russell) and about other concepts (RSS, MyMemory) that may or may not be known already.
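The threshold rule for the agents can be illustrated with a small sketch. The function name, the two threshold values and the candidate probabilities are assumptions chosen for the example; the actual values and scoring used in Notion System are not given here.

```python
def decide(name, candidates, link_threshold=0.8, floor=0.2):
    """Decide what an agent may do with a name found in a document.

    candidates maps a notion identifier to the estimated probability
    that the name refers to that notion (hypothetical scoring).
    """
    if candidates:
        best = max(candidates, key=candidates.get)
        probability = candidates[best]
    else:
        best, probability = None, 0.0

    if best is not None and probability >= link_threshold:
        # Confident enough: relate the document to the existing notion.
        return ("link", best)
    if probability < floor:
        # No probable notion found: report a new concept, but do not
        # create the notion automatically.
        return ("propose_new_concept", name)
    # In between: leave the decision to a human.
    return ("undecided", name)


linked = decide("Seth Russell", {"notion:seth-russell": 0.93, "notion:other": 0.10})
proposed = decide("MyMemory", {})
unsure = decide("RSS", {"notion:rss": 0.55})
```

Keeping "propose" separate from "create" reflects the restriction that the agent reports a discovered concept without adding the notion on its own.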
If Seth can be identified by his name and by the facts that he is a member of this mailing list and is interested in RDF, but his email address was not yet known, this "fact" will be added. The URI he gives provides further information, and so on. For the other topics Seth gives part of a conceptual network (URIs of documents related to the topic), but the semantics are not clearly stated (probably something like "handles" or at least "is mentioned in"). The other information about these topics can be mapped directly to relationships (sometimes to plain data (description:…), sometimes to other concepts/notions/topics (RDF)).

<signature>
topic: Seth Russell
URI: http://robustai.net/~seth/index.htm
email: seth@robustai.net
waiting for: RSS
is working on: MyMemory
needs collaboration on: MyMemory

topic: RSS
anagramOf: (alternative: Rich Site Summary, RDF Site Summary)
URI (from source: http://rss.oreillynet.com/): http://purl.org/rss/
URI (from source: http://InternetAlchemy.org/): http://InternetAlchemy.org/rss/
URI (from source: http://www.xml.com/): http://www.xml.com/pub/2000/07/17/syndication/rss.html

topic: MyMemory
description: "a local application of the Semantic Web"
hasAbilityTo: (and: (read RdF) (write RDF))
</signature>

Friendly greetings
Ronald Poell
TNO - Netherlands
http://www.tno.nl
http://www.notionsystem.com
email: poell@fel.tno.nl, rapoell@notionsystem.com
Received on Thursday, 2 November 2000 01:33:22 UTC