Re: EARL, RDF, Interesting Examples and PubSub.com from Charles McCathieNevile on 2004-02-15 (www-rdf-interest@w3.org from February 2004)

From: Charles McCathieNevile <charles@w3.org>
Date: Sun, 15 Feb 2004 05:47:17 -0500 (EST)
To: Bob Wyman <bob@wyman.us>
Cc: 'Mansur Darlington' <ensmjd@bath.ac.uk>, info@oilit.com, www-rdf-interest@w3.org, semanticweb@yahoogroups.com
Message-ID: <Pine.LNX.4.55.0402150540440.32044@homer.w3.org>
On Sat, 14 Feb 2004, Bob Wyman wrote:

>Mansur Darlington wrote:
>> the mindnumbing dumbness of the examples used.
>Charles McCathieNevile wrote:
>> I have been working on explaining a particular RDF
>> vocabulary - EARL.
>
>	Imagine that you have:
>	1. A large number of people generating EARL.
>	2. A large number of people interested in EARL data, none of
>whom is interested in all EARL data.
>
>	How do you distribute the EARL statements without requiring
>any of:
>	1. All EARL producers need to push their EARL to something
>like a mailing list.
>	2. All EARL consumers need to poll sites looking for new EARL
>on regular schedules.
>	3. Someone builds a central "EARL" registration site.

Hi Bob,

I don't think there is a good way to collect RDF information that doesn't
rely on one of these. However, I hope that crawling RDF sites via seeAlso or
something similar is reasonably feasible, so that instead of a very few
organisations having a very large collection of RDF, many people can do
something along the lines of the PubSub service you describe - perhaps just
for a few pieces of interesting information.

Federating these aggregators is of course an interesting problem too. My
preferred approach has been to think of Annotea servers as repositories,
mostly because they have a query language and work on the assumption of
creating first class web content for people who don't even have the ability
to publish to the web (which for me is the sine qua non of participation).
One of the interesting questions there is how to work across multiple
Annotea servers.

It seems to me an important part of this is a standard way of querying for a
bit of RDF, so I can pass the same query to different systems...
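
As a rough illustration of passing the same query to different systems, here
is a minimal sketch in Python. It assumes each system can be treated as a plain
collection of (subject, predicate, object) triples and that a query is a single
triple pattern with None as a wildcard. The function names and the pattern
format are invented for this example; they are not the Annotea query language
or any standard.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def match(store: Iterable[Triple], pattern: Pattern) -> Iterator[Triple]:
    """Yield every triple in one store that fits the pattern (None = wildcard)."""
    for triple in store:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


def federated_query(stores: Iterable[Iterable[Triple]], pattern: Pattern) -> List[Triple]:
    """Send the same pattern to each store and merge the answers without duplicates."""
    seen = set()
    results = []
    for store in stores:
        for triple in match(store, pattern):
            if triple not in seen:
                seen.add(triple)
                results.append(triple)
    return results


# Hypothetical data: two servers holding statements about the same resource.
RESULT = "http://example.org/terms#result"  # made-up predicate, not an EARL term
server_a = [("http://www.w3.org/", RESULT, "pass")]
server_b = [("http://www.w3.org/", RESULT, "fail"),
            ("http://example.org/", RESULT, "pass")]

# Everything either server says about http://www.w3.org/
print(federated_query([server_a, server_b], ("http://www.w3.org/", None, None)))
```

The point of the sketch is only that the query is written once and each
repository answers it independently; a real system would need an agreed query
syntax and a way to reach each server over the network.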

cheers

Chaals


>	Try this:
>	1. Have EARL producers insert their EARL into RSS files.
>	2. Ping PubSub.com when an EARL bearing RSS file is updated.
>[1]
>	3. Have subscribers use the "advanced" search on PubSub.com to
>subscribe to the URI's that identify the EARL that they are interested
>in.
>	4. PubSub.com will then build a custom RSS file for each
>subscriber containing just the EARL that they are interested in. The
>result will be that distribution is easily achieved and people will
>see the EARL that they want in near real-time after it is generated.
>	5. Subscribers would build RDF processors that extract the URL
>from their PubSub.com RSS files and do the appropriate analysis,
>inference, rule processing, etc. that RDF enables.
>
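
The subscriber side of steps 3 to 5 might look roughly like the sketch below.
It assumes the custom feed is RSS 2.0 with un-namespaced item elements and that
a "referenced URI" is any namespace URI, attribute value or element text found
inside an item; how PubSub.com actually decides what counts as a referenced URI
is not stated here. The function names are invented for the example.

```python
import xml.etree.ElementTree as ET


def referenced_values(item):
    """Collect namespace URIs, attribute values and element text found in an item."""
    found = set()
    for element in item.iter():
        if element.tag.startswith("{"):
            # ElementTree writes namespaced tags as "{namespace-uri}localname".
            found.add(element.tag[1:].split("}", 1)[0])
        for value in element.attrib.values():
            found.add(value.strip())
        if element.text and element.text.strip():
            found.add(element.text.strip())
    return found


def items_referencing(rss_text, required_uris):
    """Return (title, link) for each item that references ALL of the required URIs."""
    root = ET.fromstring(rss_text)
    required = set(required_uris)
    hits = []
    for item in root.iter("item"):
        if required <= referenced_values(item):
            hits.append((item.findtext("title", default=""),
                         item.findtext("link", default="")))
    return hits
```

Calling items_referencing(feed_text, ["http://www.w3.org/WAI/ER/EARL/nmg-strawman#",
"http://www.w3.org/"]) corresponds to the "AND" subscription described below:
an item is kept only if both URIs appear in it. The match is exact rather than
by substring, because "http://www.w3.org/" is itself a prefix of the EARL
namespace URI.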
>	The idea is basically to allow people to "subscribe" to the
>resources about which assertions are made in the EARL RDF files. The
>subscriptions are processed through PubSub.com as an intermediary. The
>result is a distributed network of loosely connected RDF producers and
>consumers which enables the selective accretion of knowledge about the
>subject resources in near real-time.
>	To try this out, go to http://weblogs.pubsub.com/advanced and
>subscribe to "Referenced URI's" specifying
>"http://www.w3.org/WAI/ER/EARL/nmg-strawman#" in order to receive all
>EARL statements embedded in RSS files. If the specific resource you're
>interested in is identified by the URI "http://www.w3.org/" (as in
>your examples), then simply "AND" that into the subscription as well
>(remember to make sure you specify "Referenced URI's"). Now, sit back
>and wait. In time, if anyone publishes EARL about the
>"http://www.w3.org/" subject, you'll see it appear in the custom RSS
>file we build for you. Assuming that the publisher pings us, you
>should see it appear in your RSS file only a few minutes after it is
>published. If you want to have the data pushed to you (rather than
>simply polling the PubSub.com site on a regular basis), then use the
>PubSub.com REST interface defined at http://pubsub.com/REST/ and we'll
>push (POST) the stuff to a web server that you specify.
>	The same technique can be used to accrete knowledge about any
>resource or subject from RDF resources. For instance, on the
>rdfweb-dev list, Sean McCullough says he is building an RDF
>application to track information on members of the Texas Legislature.
>He's attacking the problem in the traditional centralized manner of
>building all sorts of web scrapers that collect information about the
>legislators and then building a big RDF file as a result. An
>alternative method would be to simply publish an RDF schema for
>statements about legislators and a list of URI's identifying each
>legislator. He would then subscribe at PubSub.com to any RSS item that
>references the URI's. As people discovered information about the
>legislators, they would publish that data as RDF in their RSS files
>and McCullough's application could extract it after PubSub.com
>inserted it into the RSS file for his subscription (or messages sent
>via the REST interface). This would greatly expand the data gathering
>ability of the system and allow it to include things like "supports",
>"opposes", etc. enabling people to take positions for or against those
>of the legislators in a distributed but highly visible and
>RDF-processable fashion. Also, people in other states or countries
>could use the same approach by adopting McCullough's RDF schema and
>simply publishing their own list of legislator's URI's. In time, you'd
>have a tremendous engine generating massive amounts of detailed
>information about legislators around the world...
>	Of course, while PubSub.com allows you to subscribe to
>real-time updates of new information, you could probably use the same
>approach to retrieve historical, older information by using Google or
>any of the other more traditional, past-focused search engines if they
>allow searching for URIs. (use "site" on Google).
>	This form of loosely coupled data gathering and synthesis just
>isn't practical without RDF and services like PubSub. However, I think
>we'll find that it becomes a common pattern in the future.
>	Mansur, does this qualify as something better than a
>"mindnumbing" dumb example?
>
>		bob wyman
>
>[1] To ping PubSub.com, use XML-RPC to send a normal ping message to:
>	http://xping.pubsub.com/ping/
>using either the ping method defined by weblogs.com
>(weblogUpdates.ping) or the extendedPing method defined by blo.gs
>(weblogUpdates.extendedPing). We *definitely* prefer the extendedPing
>method since it includes a parameter which passes the location of your
>RSS file. This means we don't have to scrape your blog to try to
>figure out what the RSS file is. See: blo.gs for information on the
>extendedPing method and examples of it. See:
>http://blo.gs/ping.php#details and
>http://blo.gs/ping-example.php
>
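
For anyone wanting to try the ping in footnote [1], a minimal sketch using
Python's standard xmlrpc.client module follows. The endpoint and method name
are taken from the footnote. The parameter order (site name, site URL, URL to
check, RSS URL) is an assumption based on the usual extendedPing convention and
should be checked against the blo.gs pages cited above. The helper names are
invented for the example.

```python
import xmlrpc.client

PING_ENDPOINT = "http://xping.pubsub.com/ping/"  # from footnote [1]


def build_extended_ping(site_name, site_url, check_url, rss_url):
    """Build the XML-RPC request body for weblogUpdates.extendedPing.

    The argument order is an assumption; see the blo.gs documentation.
    """
    return xmlrpc.client.dumps(
        (site_name, site_url, check_url, rss_url),
        methodname="weblogUpdates.extendedPing",
    )


def send_extended_ping(site_name, site_url, check_url, rss_url,
                       endpoint=PING_ENDPOINT):
    """Send the ping over HTTP and return whatever the server answers."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    return proxy.weblogUpdates.extendedPing(site_name, site_url, check_url, rss_url)


# Inspect the request without touching the network (example.org URLs are placeholders).
print(build_extended_ping("My EARL feed", "http://example.org/",
                          "http://example.org/", "http://example.org/earl.rss"))
```

build_extended_ping only shows what would go on the wire; send_extended_ping
needs the service to be reachable and will raise an error otherwise.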

Charles McCathieNevile  http://www.w3.org/People/Charles  tel: +61 409 134 136
SWAD-E http://www.w3.org/2001/sw/Europe         fax(france): +33 4 92 38 78 22
 Post:   21 Mitchell street, FOOTSCRAY Vic 3011, Australia    or
 W3C, 2004 Route des Lucioles, 06902 Sophia Antipolis Cedex, France
Received on Sunday, 15 February 2004 05:47:19 UTC