Fwd: Syndication model convergence and RDF

A few days ago I sent the mail below to a list set up by folks working
on common libs for syndication in Java. Following discussion in the
"missing bit of RDF for XML people" thread, I reckon this is now close
enough to on-topic to avoid crosspost annoyance ;-)

---------- Forwarded message ----------
From: Danny Ayers <danny.ayers@gmail.com>
Date: Mon, 31 Jan 2005 11:01:26 +0100
Subject: Syndication model convergence and RDF
To: java-syndication@yahoogroups.com
Cc: atom-owl@googlegroups.com
...

It's really good to see a place for dialog between the groups and
individuals working on syndication/Java. A common object
representation of feeds seems a worthy goal, as do interoperable
interfaces.

I've been wondering about the same kind of issues through RDF-tinted
spectacles, and it seems to me there is also a lot of potential for
unification of the diverse formats through mapping to, and modelling
with, Semantic Web technologies. Such a mapping/modelling could be
entirely compatible with and complementary to the Java work.

The way I imagine it could work would be to define a top-level (but
fairly loose) RDF Schema/OWL model of syndication data giving the
relationships between the key constructs: feedlists
(channels/blogrolls), entries, content and associated metadata. RDF
"profiles" could then be derived for the various formats, relating
constructs specific to individual languages (RSS 1.0, 1.1, 0.91, 2.0,
Atom) to the top-level view and by inference to each other. The
general approach would, I'd suggest, be one of looking at what's
already out there and pulling it together in a cohesive way, rather
than building anything from scratch.
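
As a rough illustration of the profile idea (not a proposed schema), the
sketch below relates format-specific constructs to a top-level term and
derives the cross-format correspondences from that. The "synd:" names and
the prefixes "rss1:", "rss2:" and "atom:" are hypothetical labels made up
for this example, and a plain lookup table stands in for what would really
be rdfs:subClassOf-style statements handled by an RDF reasoner.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SyndicationProfiles {
    // Format-specific term -> top-level term. The "synd:" names are
    // hypothetical placeholders, not terms from any published schema.
    private static final Map<String, String> PROFILE = new LinkedHashMap<>();
    static {
        PROFILE.put("rss1:channel", "synd:Feed");
        PROFILE.put("rss2:channel", "synd:Feed");
        PROFILE.put("atom:feed", "synd:Feed");
        PROFILE.put("rss1:item", "synd:Entry");
        PROFILE.put("rss2:item", "synd:Entry");
        PROFILE.put("atom:entry", "synd:Entry");
    }

    // The top-level construct a format-specific term maps to, or null.
    static String topLevel(String term) {
        return PROFILE.get(term);
    }

    // Terms in other profiles that share the same top-level construct,
    // i.e. the correspondences obtained "by inference" via the common model.
    static Set<String> counterparts(String term) {
        Set<String> result = new TreeSet<>();
        String top = PROFILE.get(term);
        if (top == null) {
            return result;
        }
        for (Map.Entry<String, String> e : PROFILE.entrySet()) {
            if (e.getValue().equals(top) && !e.getKey().equals(term)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topLevel("atom:entry"));      // synd:Entry
        System.out.println(counterparts("atom:entry"));  // [rss1:item, rss2:item]
    }
}
```

The point is only that each format needs one mapping to the shared model,
rather than a separate mapping to every other format.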

Before going any further I should say that I wouldn't expect a total
'one rule to bind them' approach, just a single common base-level
model of structures that could potentially enable transparent interop
between syndication and RDF systems. There are aspects which I would
expect to remain out of scope in development of a common model - in
particular modelling the /engines/ of syndication: HTTP
serving/caching and the polling subsystem. Work that has been done on
versioning of entries (particularly by Henry, see below) suggests that
this may be something that is best left until after a core model is in
place. Provenance more sophisticated than dc:source also brings with
it complications that may place it beyond 80/20 requirements for
RDF-oriented interop.

There have been recent developments which I reckon bring the idea of a
common RDF view of syndication within relatively easy reach, notably
the work around the Atom community to develop a normative mapping for
Atom feed data to an RDF/OWL model. So what have we got so far? Pretty
near everything required, only it's scattered about the Web. (I
believe Kevin has done work in this area with NewsMonster, but I must
confess I never looked closely at what he'd done there). Elsewhere:

One direct practical technique is to use XSLT for syntax-based mapping
from syndication format to RDF/XML. Stylesheets have been done to
normalise "any" feed format to RSS 1.0 (e.g. Morten Frederiksen's at
[1]). A new Atom-specific translation has just been created [2]. A
while ago I suggested [3] an approach to using XSLT to disambiguate
RSS 2.0 information (in particular extensions). With slightly
different implementation details, the GRDDL (Gleaning Resource
Descriptions from Dialects of Languages) [4] technique offers a more
generally useful approach, and has W3C-blessing. But the results of
such transformations are only minimally defined (resources,
properties, literals) without a schema/ontology.
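
A minimal sketch of the syntax-based mapping described above, using the
XSLT support in the standard Java library (javax.xml.transform). The
stylesheet is a toy written for this example, not any of the stylesheets
referenced here: it lifts only the title and link of each RSS 2.0 item
into RSS 1.0-style RDF/XML, and the output is a fragment rather than a
complete RSS 1.0 document (no channel, no rdf:Seq). The class and method
names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class Rss2ToRdf {
    // Toy stylesheet: each RSS 2.0 <item> becomes an RSS 1.0 item
    // resource identified by its link.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'",
        "    xmlns='http://purl.org/rss/1.0/'>",
        "  <xsl:output method='xml' indent='no'/>",
        "  <xsl:template match='/'>",
        "    <rdf:RDF>",
        "      <xsl:for-each select='rss/channel/item'>",
        "        <item rdf:about='{link}'>",
        "          <title><xsl:value-of select='title'/></title>",
        "          <link><xsl:value-of select='link'/></link>",
        "        </item>",
        "      </xsl:for-each>",
        "    </rdf:RDF>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet to an RSS 2.0 document given as a string.
    static String toRdfXml(String rss2) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(rss2)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers need not declare a checked exception.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String feed = "<rss version='2.0'><channel><title>Example</title>"
            + "<item><title>Hello</title><link>http://example.org/1</link></item>"
            + "</channel></rss>";
        System.out.println(toRdfXml(feed));
    }
}
```

The result can be read by any RDF/XML parser, but as noted it only yields
resources, properties and literals; what they mean still has to come from
a schema/ontology.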

A similar kind of mapping is implied by the Redland/Raptor toolkit,
which can parse Atom/RSS/tag soup into RDF model(s). I think similar
input stages may also be available for Jena (I've done it with Jena
myself, though only by pre-processing with XSLT).

There has been a basic schema for RSS 1.0 all along, but this has been
tidied up considerably for RSS 1.1 [5], and I believe the intention
there is to provide an OWL DL model. I started some work on Atom/OWL
[6] then got distracted by day-job, but Henry Story picked it up from
there and worked through a lot of possible Atom/OWL models. His
motivation, at first at least, was a model to use with the Java BlogEd
[7] authoring/posting application. There's a version of Henry's OWL
ontology for Atom on the Wiki [8], his implementation work (in
progress) can be found on a blog at [9].

I should mention the current feedlist/channels/blogroll representations
- OPML seems to be the de facto standard, though it is another
essentially incompatible format. The Technorati folk favour XOXO [10]
(they also have Attention.xml, which also represents data in this
domain). OCS offers one RDF-based model, FOAF blogrolls [11] another
(there are XSLTs between some of these, only I've got tired of finding
the links ;-)

At the same time it would be useful to try and take a common approach
to other bits of modelling that haven't quite yet congealed - the
del.icio.us/Flickr/Technorati tags for example.

How might this tie in with the Java work, and what would be the
benefits? In the first place it would help with translations between
formats, providing a sound formal base. Syndication feeds of all kinds
could be arbitrarily rich and still be comprehensible by off-the-shelf
APIs. Module/vocab authors would be able to get cross-format
compatibility from day one. RDF systems (with a little translation)
could compatibly consume and produce feed data.

What's needed? Somewhere to talk about this stuff - Henry's set up an
Atom/OWL list at [12] and material has gone onto the Atom and ESW
Wikis. What's also needed is some form of coordination point - this
stuff is pretty well spread about the place at present - I'd be happy
to host a Wiki or whatever if needed. Deliverables I'd say include the
top-level model, individual format mappings (especially XSLT), tests
(especially a validator).

Cheers,
Danny.

[0] http://dannyayers.com/archives/2005/01/31/syndication-model-convergence-and-rdf/
[1] http://purl.org/net/syndication/subscribe/feed-rss1.0.xsl
[2] http://www.imc.org/atom-syntax/mail-archive/msg12615.html
[3] http://www.xml.com/pub/a/2003/07/23/extendingrss.html
[4] http://www.w3.org/2004/01/rdxh/spec
[5] http://inamidst.com/rss1.1/
[6] http://semtext.org/atom/
[7] https://bloged.dev.java.net/
[8] http://www.intertwingly.net/wiki/pie/AtomOWL
[9] http://bblfish.net/work/atom-owl/2004-08-12/blogexample.html
[10] http://developers.technorati.com/wiki/xhtmloutlines
[11] http://www-106.ibm.com/developerworks/xml/library/x-pblog/
[12] http://groups-beta.google.com/group/atom-owl


--

http://dannyayers.com



Received on Thursday, 3 February 2005 17:12:11 UTC