- From: Bob Wyman <bob@wyman.us>
- Date: Sun, 15 Feb 2004 14:44:56 -0500
- To: Charles McCathieNevile <charles@w3.org>
- Cc: Bob Wyman <bob@wyman.us>, "'Mansur Darlington'" <ensmjd@bath.ac.uk>, info@oilit.com, www-rdf-interest@w3.org, semanticweb@yahoogroups.com
Charles McCathieNevile wrote:
> I hope that crawling RDF sites via seeAlso or
> something is reasonably feasible

seeAlso provides only unidirectional links and, as such, requires a great deal of configuration management to be useful. The reliance on URLs (URIs for resources) by seeAlso means that you can only find things which are known to the generator of the RDF which contains the seeAlso link. The result is a closed network of data.

On the other hand, the method that I propose, relying on PubSub.com's content-based matching, does not have this constraint. Using the PubSub.com method, related chunks of RDF are collected together based on their content rather than on the links established between them.

The content-based method I propose has the effect of making URI links bi-directional. By watching for and reporting referenced URIs, we provide the back links that are not natively supported in the system. Whenever someone references a URI among the more than 1 million blogs that we monitor, you will be provided with a link to the referencing site. Thus, a bi-directional network is constructed on the fly. The result is a much more open and dynamic knowledge network which does not require that any two RDF sites have explicit knowledge of each other's URIs; i.e., "seeAlso" is not required in such a network. It is only useful in building constrained, closed networks.

> Federating these aggregators is of course an
> interesting problem too.

The problem of federation is an interesting one, as well as one that has been studied extensively. Some of the best work on publish/subscribe federation can be seen in the work of Antonio Carzaniga on the Siena project (see: http://www.cs.colorado.edu/~carzanig/siena/index.html).
However, while federation is a fascinating problem, the requirement for federation really only arises when the volume of data or subscriptions rises to the point where a centralized system can't keep up with it. Given that there isn't much RDF being produced today, we don't really have a need for federation at this time. It would make a great deal of sense to first experiment to discover the limits of non-federated systems and understand the dynamics of loosely coupled RDF networks before putting too much effort into building federated systems. Only in this manner can we really determine what the requirements are for a federated system.

Currently, at PubSub.com, we are able to monitor over 1 million blogs, yet the CPU of our matching engine is essentially idling most of the time (we never go over 4% or 5% of CPU). We've got plenty of capacity to allow a great deal of experimentation and learning to occur. Let's understand the problem and application domain before making it more complex.

> One of the interesting questions there is how to work
> across multiple annotea servers existing.

Unfortunately, Annotea currently requires that, to create an annotation, I must have an account with a known Annotea server and submit my annotations to it. The result, of course, is that while there may be many Annotea servers that are interested in annotations that I create, I'm only going to submit to one or a subset of them -- probably the most "popular" servers since, with RDF, the more data you have, the better. Thus, power laws will come to dominate, and there will be a need to develop potentially complex protocols to share annotations between Annotea servers in order to spread the knowledge captured in the annotations.

On the other hand, if a content-based publish/subscribe service like PubSub.com is used to distribute and discover annotations, anyone can create an annotation and publish it simply by putting the annotation in some file that PubSub.com monitors.
Then, any number of Annotea servers that are interested in aggregating these annotations can simply subscribe to them independently. The result is a significant reduction in the need for cross-server coordination and a reduction in the influence of power laws and network effects. (Of course, if servers wish to constrain whose annotations they handle, they can require that authentication tokens be included in the annotations.)

> It seems to me an important part of this is a standard
> way of querying for a bit of RDF

We provide, at PubSub.com, a method of querying for data based on its content. Of particular interest for RDF is the ability to query based on the URIs that are referenced in the content. This simple mechanism is sufficient to handle a very wide range of requirements. I invite you to experiment with it so that we can determine its limits and determine, based on practical experience, how to provide richer facilities.

bob wyman
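
As a rough illustration of the back-link idea discussed above (and not a description of how PubSub.com's matching engine actually works), the sketch below scans a set of monitored documents for the URIs they mention, builds a reverse index from each URI to the documents that reference it, and then answers simple "tell me who references this URI" subscriptions. Every name in it (extract_uris, build_backlinks, match_subscriptions) and all of the example URLs are hypothetical, and the regular expression is only a crude approximation of URI detection.

```python
import re
from collections import defaultdict

# Crude pattern for http(s) URIs in free text; an assumption for illustration only.
URI_PATTERN = re.compile(r"https?://[^\s<>\"')]+")


def extract_uris(text):
    """Return the set of http(s) URIs mentioned in the text, minus trailing punctuation."""
    return {match.rstrip(".,;") for match in URI_PATTERN.findall(text)}


def build_backlinks(documents):
    """Map each referenced URI to the set of document URLs that mention it."""
    backlinks = defaultdict(set)
    for source_url, text in documents.items():
        for uri in extract_uris(text):
            backlinks[uri].add(source_url)
    return backlinks


def match_subscriptions(subscriptions, backlinks):
    """For each subscriber, list (sorted) the documents referencing any URI they watch."""
    results = {}
    for subscriber, watched_uris in subscriptions.items():
        referencing = set()
        for uri in watched_uris:
            referencing |= backlinks.get(uri, set())
        results[subscriber] = sorted(referencing)
    return results


if __name__ == "__main__":
    # Hypothetical monitored documents, keyed by their own URL.
    documents = {
        "http://blog-a.example/post1": "See http://example.org/spec for details.",
        "http://blog-b.example/post7": "I disagree with http://example.org/spec, "
                                       "and with http://example.org/other.",
    }
    # Hypothetical subscribers, each watching a set of URIs.
    subscriptions = {
        "annotea-1": {"http://example.org/spec"},
        "annotea-2": {"http://example.org/other"},
    }
    backlinks = build_backlinks(documents)
    for subscriber, sources in match_subscriptions(subscriptions, backlinks).items():
        print(subscriber, sources)
```

Neither blog needs to know about the other, or about the subscribers, for the back links to appear: the links are recovered from content alone, which is the property argued for above.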
Received on Sunday, 15 February 2004 14:45:01 UTC