EARL, RDF, Interesting Examples and PubSub.com

Mansur Darlington wrote:
> the mindnumbing dumbness of the examples used.
Charles McCathieNevile wrote:
> I have been working on explaining a particular RDF 
> vocabulary - EARL.

	Imagine that you have:
	1. A large number of people generating EARL.
	2. A large number of people interested in EARL data, but none
of them interested in all of it.

	How do you distribute the EARL statements without requiring
any of:
	1. All EARL producers pushing their EARL to something
like a mailing list.
	2. All EARL consumers polling sites for new EARL on a
regular schedule.
	3. Someone building a central "EARL" registration site.

	Try this:
	1. Have EARL producers insert their EARL into RSS files.
	2. Ping PubSub.com when an EARL-bearing RSS file is updated.
[1]
	3. Have subscribers use the "advanced" search on PubSub.com to
subscribe to the URIs that identify the EARL that they are interested
in.
	4. PubSub.com will then build a custom RSS file for each
subscriber containing just the EARL that they are interested in. The
result will be that distribution is easily achieved and people will
see the EARL that they want in near real-time after it is generated.
	5. Subscribers would build RDF processors that extract the EARL
from their PubSub.com RSS files and do the appropriate analysis,
inference, rule processing, etc. that RDF enables.
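	For the curious, step 1 might look something like this in
Python. This is only a sketch: the EARL namespace is the strawman URI
mentioned below, but the element names (assertion, subject, result)
are illustrative rather than the exact strawman vocabulary, and the
subject URI is just the one used in the examples.

```python
# Sketch: embedding a minimal EARL-style assertion inside an RSS <item>.
# The property names are illustrative, not the exact strawman vocabulary.
import xml.etree.ElementTree as ET

EARL = "http://www.w3.org/WAI/ER/EARL/nmg-strawman#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

ET.register_namespace("earl", EARL)
ET.register_namespace("rdf", RDF)

def earl_rss_item(subject_uri, result):
    """Build one RSS <item> carrying an EARL-style assertion about subject_uri."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = "EARL result for " + subject_uri
    ET.SubElement(item, "link").text = subject_uri
    assertion = ET.SubElement(item, "{%s}assertion" % EARL)
    ET.SubElement(assertion, "{%s}subject" % EARL,
                  {"{%s}resource" % RDF: subject_uri})
    ET.SubElement(assertion, "{%s}result" % EARL).text = result
    return item

item = earl_rss_item("http://www.w3.org/", "pass")
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

	Because the subject URI appears literally in the item, a
URI-based subscription service can match on it without understanding
the RDF at all.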

	The idea is basically to allow people to "subscribe" to the
resources about which assertions are made in the EARL RDF files. The
subscriptions are processed through PubSub.com as an intermediary. The
result is a distributed network of loosely connected RDF producers and
consumers which enables the selective accretion of knowledge about the
subject resources in near real-time.
	To try this out, go to http://weblogs.pubsub.com/advanced and
subscribe to "Referenced URI's" specifying
"http://www.w3.org/WAI/ER/EARL/nmg-strawman#" in order to receive all
EARL statements embedded in RSS files. If the specific resource you're
interested in is identified by the URI "http://www.w3.org/" (as in
your examples), then simply "AND" that into the subscription as well
(again specifying "Referenced URI's"). Now, sit back
and wait. In time, if anyone publishes EARL about the
"http://www.w3.org/" subject, you'll see it appear in the custom RSS
file we build for you. Assuming that the publisher pings us, you
should see it appear in your RSS file only a few minutes after it is
published. If you want the data pushed to you (rather than polling
the PubSub.com site on a regular basis), use the PubSub.com REST
interface defined at http://pubsub.com/REST/ and we'll push (POST)
the stuff to a web server that you specify.
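	The consumer side of step 5 could start as simply as the
sketch below: given the RSS text you polled (or that was POSTed to
you), pick out the items that reference the subject URI you care
about. The feed layout in the sample is an assumption; a real EARL
consumer would hand the RDF inside those items to a proper RDF
processor.

```python
# Sketch of a consumer: filter RSS items by a referenced subject URI.
import xml.etree.ElementTree as ET

def items_about(rss_text, subject_uri):
    """Return (title, link) pairs for RSS <item>s that mention subject_uri."""
    root = ET.fromstring(rss_text)
    hits = []
    for item in root.iter("item"):
        blob = ET.tostring(item, encoding="unicode")
        if subject_uri in blob:  # crude URI match; real code would parse the RDF
            hits.append((item.findtext("title", default=""),
                         item.findtext("link", default="")))
    return hits

# Hypothetical two-item feed: only the first item references the subject.
sample = """<rss version="2.0"><channel>
  <item><title>EARL for w3.org</title>
    <link>http://example.org/earl/1</link>
    <description>subject http://www.w3.org/ passed</description></item>
  <item><title>Unrelated</title>
    <link>http://example.org/other</link>
    <description>nothing here</description></item>
</channel></rss>"""

print(items_about(sample, "http://www.w3.org/"))
```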
	The same technique can be used to accrete knowledge about any
resource or subject from RDF resources. For instance, on the
rdfweb-dev list, Sean McCullough says he is building an RDF
application to track information on members of the Texas Legislature.
He's attacking the problem in the traditional centralized manner of
building all sorts of web scrapers that collect information about the
legislators and then building a big RDF file as a result. An
alternative method would be to simply publish an RDF schema for
statements about legislators and a list of URIs identifying each
legislator. He would then subscribe at PubSub.com to any RSS item that
references those URIs. As people discovered information about the
legislators, they would publish that data as RDF in their RSS files
and McCullough's application could extract it after PubSub.com
inserted it into the RSS file for his subscription (or messages sent
via the REST interface). This would greatly expand the data-gathering
ability of the system and allow it to include things like "supports",
"opposes", etc., enabling people to take positions for or against the
legislators' positions in a distributed but highly visible and
RDF-processable fashion. Also, people in other states or countries
could use the same approach by adopting McCullough's RDF schema and
simply publishing their own lists of legislators' URIs. In time, you'd
have a tremendous engine generating massive amounts of detailed
information about legislators around the world...
	Of course, while PubSub.com allows you to subscribe to
real-time updates of new information, you could probably use the same
approach to retrieve older, historical information through Google or
any of the other more traditional, past-focused search engines, if
they allow searching for URIs (use "site" on Google).
	This form of loosely coupled data gathering and synthesis just
isn't practical without RDF and services like PubSub. However, I think
we'll find that it becomes a common pattern in the future.
	Mansur, does this qualify as something better than a
"mindnumbing" dumb example?

		bob wyman

[1] To ping PubSub.com, use XML-RPC to send a normal ping message to:
	http://xping.pubsub.com/ping/ 
using either the ping method defined by weblogs.com
(weblogUpdates.ping) or the extendedPing method defined by blo.gs
(weblogUpdates.extendedPing). We *definitely* prefer the extendedPing
method since it includes a parameter which passes the location of your
RSS file. This means we don't have to scrape your blog to try to
figure out what the RSS file is. See blo.gs for information on the
extendedPing method and examples of it:
http://blo.gs/ping.php#details and
http://blo.gs/ping-example.php
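Using Python's standard XML-RPC client, the extendedPing might be
assembled like this. The site name and URLs are hypothetical, and the
parameter order (site name, site URL, changed-page URL, RSS file URL)
is my reading of the blo.gs description -- check the blo.gs pages
above for the authoritative details. The endpoint and method name are
the ones given in this note.

```python
# Sketch: building (and optionally sending) a weblogUpdates.extendedPing.
import xmlrpc.client

# Hypothetical site details; the last parameter is the RSS file location
# that extendedPing adds over the plain weblogs.com ping.
params = (
    "My EARL Feed",
    "http://example.org/",
    "http://example.org/earl.rss",
    "http://example.org/earl.rss",
)

# Serialize the call so you can inspect the payload without network access.
payload = xmlrpc.client.dumps(params, methodname="weblogUpdates.extendedPing")
print(payload)

# Actually sending the ping (requires network access):
# proxy = xmlrpc.client.ServerProxy("http://xping.pubsub.com/ping/")
# print(proxy.weblogUpdates.extendedPing(*params))
```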

Received on Saturday, 14 February 2004 21:42:05 UTC