Re: Atom Triples Internet Draft from Niklas Lindström on 2008-07-04 (semantic-web@w3.org from July 2008)

From: Niklas Lindström <lindstream@gmail.com>
Date: Fri, 4 Jul 2008 12:08:16 +0200
To: "Dave Beckett" <dave@dajobe.org>, mnot@pobox.com
Cc: "Booth, David (HP Software - Boston)" <dbooth@hp.com>, "Story Henry" <henry.story@bblfish.net>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <cf8107640807040308m271b89d3ob50676d14994c0a8@mail.gmail.com>
Dave, Mark, all,

this is very interesting! Thank you for initiating this work.

I think there is *great* opportunity in formalizing how RDF and Atom
can complement each other. There are also a bunch of potential
problems in doing so, which I will try to elaborate on in this
message. First, I will introduce my work, then elaborate some general
ideas, and finally address my thoughts about the Atom Triples RFC.

I do think Atom extensions are quite useful, and RDF has a lot to
offer them (not least to avoid reinvention, as I mention below). Still,
I'm inclined to think of Atom as instrumental only, and not generally
suited for direct RDF embedding (in contrast to carrying RDF insulated
in atom:content; again, see below).

(I'd also like to apologize for the length of this message. I just
wanted to cover the gamut of my perspective. I've numbered specific
points for easier decomposition of my claims. If the discussion goes
on further, I suppose it may be off-topic for this list and should
continue elsewhere (privately or perhaps in a dedicated
atomtriples-list)?)



== Why I'm Interested ==

Before I address the subject, I'd like to explain where I come from
with this. I work on a project for the Swedish government to create a
legal document/information system. We use RDF to describe laws, and
Atom to transport both the RDF information and available digital
representations.

We do this to collect resources from numerous government agencies, and
to expose the resulting repository over time (in turn used to e.g.
load and update a SPARQL endpoint).

We have gained many ideas and opinions regarding RDF and Atom from
this work, and I intend to present the core of that in another message
to this list, hopefully "soonish". This may be interesting for the LoD
community in general. (In particular, we leverage Atom Archives (RFC
5005), and representations of deleted entries (possibly using the
Tombstones RFC draft and/or FeedSync).)

We're also interested in creating a GData-like service (providing
"semi-cooked" result data for simple presentation and filtering
tools). We may want to embed *parts* of RDF in the entries of that
service, for which Atom Triples could come in handy.



== Considering RDF and Atom ==

(Note that I always speak from "my humble opinion" and would love both
corrections and more discussion. This isn't the output of scientific
research, only what I've managed to form opinions about so far.)


1. Atom can be used as an RDF envelope. This has been put into practice
by many (as others in this thread have already mentioned), e.g.
Doapspace, and ourselves as stated above.

Putting serialized RDF as a "payload" in atom:content is an
"insulated" transport. In this scenario Atom is an envelope dealing
with the instrumental mechanisms of resource updates over HTTP. How to
use atom:id in this scenario is open for discussion. (We put the
associated resource URI there, and mechanically construct a graph URI
from that, used as the context into which we put the RDF from the
content, the alternate links and even the enclosures.)
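To make the "mechanical" step concrete, here is a minimal Python sketch
of deriving a graph URI from the resource URI found in atom:id. The base
URI http://example.org/graphs/ and the percent-encoding rule are
hypothetical, chosen only to show a deterministic and reversible
mapping; they are not the convention used in our project.

```python
from urllib.parse import quote, unquote

# Hypothetical base for context URIs; not the scheme used in the project.
GRAPH_BASE = "http://example.org/graphs/"


def graph_uri_for(resource_uri: str) -> str:
    """Derive a named-graph (context) URI from the resource URI in atom:id."""
    # Percent-encode the whole resource URI (including ":", "/" and "#"),
    # so that e.g. a fragment in it cannot leak into the graph URI.
    return GRAPH_BASE + quote(resource_uri, safe="")


def resource_uri_for(graph_uri: str) -> str:
    """Invert graph_uri_for, recovering the original resource URI."""
    if not graph_uri.startswith(GRAPH_BASE):
        raise ValueError(f"not a derived graph URI: {graph_uri}")
    return unquote(graph_uri[len(GRAPH_BASE):])
```

For example, graph_uri_for("http://purl.org/NET/dust/foaf#me") gives
http://example.org/graphs/http%3A%2F%2Fpurl.org%2FNET%2Fdust%2Ffoaf%23me.
Encoding the whole URI keeps its "#me" from becoming a fragment of the
graph URI.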

Also, entry properties such as atom:title and atom:summary may or may
not be equal to a specific triple about the resource (values could
also come from some excerpt or aggregation of these, or elsewhere
entirely).


2. Atom lacks a formal description model. While there was an initial
intent to "ground" Atom in RDF and merely make the serialization
specific, it has since been used without formal regard for e.g. an
expected description model (in this respect it's "just a level up"
from "only XML").

The nature of atom:id isn't explored semantically in any depth. The
spec is specific insofar as it says that atom:id doesn't identify an
individual *entry instance* (element), that it must be unique, and
that "it is suggested that the atom:id element be stored along with
the associated resource." The formal relation between an entry and
"the associated resource" isn't specified. The same "reduced
semantics" goes for e.g. atom:title (though it seems to be equivalent
to rdfs:label).

I'd say the metadata (and labelling) technique used is sparse in
semantics and simplistic (as in coarse-grained, even vague). I believe
this is intentional, to keep the focus on syndication of resources
over time (in principle quite close to how HTTP is designed).


3. RDF is formal, precise and intentionally horizontal/universal. You
can say just about anything, but (in the long run) you won't get away
with anything less than concrete precision and informed identity
management.

This is the perceived "RDF tax", which I can understand. Think of the
continuing discussions about e.g. "my homepage versus me" identity
issues, the httpRange-14 resolution, equality and granularity of
different predicates and their usage, and so on. I have no problem
with this, as it is crucial to iron out these issues to get a
sustainable data model. But this "tax" may scare off some pragmatists,
who would rather use Atom (with extensions) to add "somewhat" precise
properties and values in entries "associated with a resource".

Of course, in so doing, one is left to either omit or reinvent
vocabularies, datatyped values etc., or to use them in a haphazard
manner. There is great potential for leveraging RDF here.


4. Atom constructs can be interpreted (as almost anything) as RDF. I
claim that this should be done with care, since:

4.1. The Atom format "casually" deals with provenance and temporality.
It is designed around the *use* of these concepts, rather than the
formalized *meaning* of them. Regarding why these concepts can be very
tricky to capture with triples (in a general way), I refer to the
ongoing work around contexts/named graphs in RDF (and the advanced,
not so common, reification technique). Basically (IMHO), describing
and using events in "first order" RDF is easy; describing events
*carrying statements* is not.

4.2. An Atom Entry can be viewed as many things:

4.2.1. A *manifest* to expose one or more representations of a
resource. This is a rather "REST-centric" way of exposing resources,
and is a common take on how to transport RDF as described in [1]
above.

4.2.2. Very interestingly (as e.g. David Booth also speculated in this
thread), it can also represent a *serialized context* (named graph),
complete with atom:updated and provenance (e.g. feed:id or
atom:author). Here the atom:id could be the context identifier.

(As mentioned above, I do this indirectly by mechanically creating a
context URI from the resource URI I've put in atom:id.)

4.2.3. A *direct* resource description, such as a contact post or
calendar event in GData, or GeoRSS. For GData contacts, Google
themselves state that atom:title contains the name of the contact.
Hence you *could* interpret atom:title as foaf:name. As I understand
it, this is the target scenario for the Atom Triples RFC?

These views aren't necessarily incompatible (though [4.2.2] is rather
special), and as I said in the beginning, the "GData service" I
envision upon our "core" resource depot could benefit from the Atom
Triples RFC.
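As a rough illustration of view [4.2.2], the Python sketch below reads
an entry as a named graph: atom:id names the graph, and atom:updated
and atom:author become statements *about* that graph (emitted as
N-Triples lines, meant to be stored outside the graph itself). Using
dcterms:modified and dc:creator as the provenance predicates is an
assumption; nothing in this thread prescribes a vocabulary.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed provenance predicates; the thread doesn't prescribe a vocabulary.
DCT_MODIFIED = "http://purl.org/dc/terms/modified"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def nt_literal(text: str) -> str:
    """Quote a plain string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def context_metadata(entry_xml: str) -> list:
    """Read an Atom entry as a named graph: atom:id names the graph, and
    atom:updated / atom:author become statements about that graph."""
    entry = ET.fromstring(entry_xml)
    graph_id = entry.findtext(ATOM + "id")
    if not graph_id:
        raise ValueError("entry has no atom:id")
    graph = f"<{graph_id.strip()}>"
    lines = []
    updated = entry.findtext(ATOM + "updated")
    if updated:
        lines.append(f"{graph} <{DCT_MODIFIED}> "
                     f"{nt_literal(updated.strip())}^^<{XSD_DATETIME}> .")
    # Each atom:author/atom:name becomes a plain-literal creator statement.
    for name in entry.findall(f"{ATOM}author/{ATOM}name"):
        if name.text:
            lines.append(f"{graph} <{DC_CREATOR}> {nt_literal(name.text.strip())} .")
    return lines
```

The statements *carried* by the entry (e.g. RDF in atom:content) would
then go into the graph named by atom:id; this sketch only covers the
graph-level provenance.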


As you may have noticed, I haven't mentioned AtomOwl yet. Generally I
believe that choosing how to represent Atom as RDF [4.1] is too
difficult to put into practice for the rather idiomatic needs I've
had so far. I do not dismiss AtomOwl at all though; it's a
respectable work from which I've gained a lot of insight, and may
prove useful to me in other scenarios. Especially considering what
Henry Story has said, AtomOwl is probably the formal superset of all
of this. But it may be inaccessible for anyone not deeply familiar
with RDF, who just wants to go "half-way" by using Atom tools for RDF
syndication. (In the extreme, compare to using an RDF description of
the XHTML infoset of a page with embedded RDFa.)

But I believe there's lots of potential *without* formalizing Atom as
RDF. One is by using the technologies orthogonally [1 and 4.2.1].
Another is by letting Atom leverage *parts* of RDF [4.2.3]. That is
useful in itself (at least from an Atom perspective), and of course
also helps in getting RDF out of Atom.



== About the Atom Triples RFC ==

Your draft made me quite happy, since I've been thinking about writing
down an idea called "EUFORIA" - "Embedding Unobtrusive Fragments Of
RDF In Atom". :)

Please take the following as rough food for thought.

I think that "Atom Triples" can be to Atom roughly what RDFa is to
XHTML. So while it is true that GRDDL can be used to get
whatever RDF you want from whatever Atom you have, this RFC can
provide a stable set of assumptions (as Taylor Cowan replied), which
you can then provide a GRDDL profile for, or interpret directly.

(I continue the numbering here from above.)


5. A feed-level element declaring the use of embedded RDF is good
(akin to fh:archive of RFC 5005). Your "at:entrymap" should do for
"direct descriptions" [4.2.3].


6. Regarding choosing the subject (the associated resource): I like
that atom:id is chosen by default, but can be changed. Henry Story's
critique about how to pick it out is a good point though, and should
be addressed. Perhaps along the lines of:

    <at:subject element="atom:id"/>
    <at:subject link-rel="alternate" type="application/rdf+xml" hreflang="" />

I only consider this from the perspective of [4.2.3]. If you're to
interpret the entries as individual graph contexts over time [4.2.2],
perhaps *another* feed-level element is better, say "at:graphentries".
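A minimal Python sketch of how a consumer might resolve the subject
from such a declaration. The at:subject syntax is only the proposal
above, so the attribute names (element, link-rel, type, hreflang) are
hypothetical, and reading hreflang="" as "the link has no hreflang" is
an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"


def select_subject(entry: ET.Element, spec: dict) -> Optional[str]:
    """Pick an entry's subject URI from a spec mirroring the attributes of
    the proposed <at:subject/> element (hypothetical syntax)."""
    if "element" in spec:
        # e.g. element="atom:id": take the text of that Atom child element.
        local = spec["element"].split(":")[-1]
        text = entry.findtext(f"{{{ATOM_NS}}}{local}")
        return text.strip() if text else None
    for link in entry.findall(f"{{{ATOM_NS}}}link"):
        # RFC 4287: an atom:link without @rel is treated as rel="alternate".
        if link.get("rel", "alternate") != spec.get("link-rel"):
            continue
        # Only constrain the attributes the spec names; hreflang="" is taken
        # (as an assumption) to mean "the link has no hreflang".
        if "type" in spec and link.get("type") != spec["type"]:
            continue
        if "hreflang" in spec and link.get("hreflang", "") != spec["hreflang"]:
            continue
        return link.get("href")
    return None
```

With {"element": "atom:id"} this falls back to the draft's default
behaviour; the link-based spec lets a feed point at, say, an RDF/XML
alternate instead.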


7. As e.g. Taylor said, you don't really need the at:md in order to
use RDF vocabularies for extension elements. If you want full RDF/XML,
I also suggest just using atom:content, and I would prefer if RDF/XML
wasn't part of how Atom Triples work.

(I won't argue about the general problems with RDF/XML from an XML
perspective here; the RDF community has discussed these for years, and
proposed lots of remedies, such as "canonical" RDF/XML, TriX etc.)

To avoid embedding full RDF/XML, you could define a simpler mechanism
for how to express statements in entries, such as defining how to
interpret certain extensions directly (those that use RDF vocabulary
URIs as namespaces). As mentioned in this thread, this is already put
into practice (commonly by embedding e.g. DC, FOAF or DOAP elements,
not to mention W3C GeoRSS). This couples well with your mechanism for
capturing the subject.

This practice makes the embedded RDF function as regular extensions
for anyone not aware of, or not caring about, the RDF: to them it is
"just some labelled data".

In theory, it could be enough to recognize namespaces that an
RDF-aware consumer *knows* to be vocabularies (e.g. via rdf:type
owl:Ontology). However, I think being explicit is better and would
eliminate mistakes. I'd suggest something like:

    <at:entrymap rdf-namespace-prefixes="dc foaf doap"/>

Here the attribute holds a whitespace-separated list of namespace
prefixes (similar to @extension-element-prefixes in XSLT).

This would supplement the mapping of non-RDF extensions to statements.
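The following Python sketch shows what recognizing declared namespaces
could look like. Since ElementTree discards prefixes, it first records
the prefix bindings, then turns each extension element in a declared
namespace into a triple whose predicate is the namespace URI plus the
local name (as RDF/XML does). The at: namespace URI
(urn:example:atomtriples) is made up, and xml:lang and datatypes are
ignored for brevity.

```python
import io
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
AT_NS = "urn:example:atomtriples"  # hypothetical namespace for the at: elements


def namespace_map(xml_text: str) -> dict:
    """Collect prefix -> namespace URI bindings, which ElementTree otherwise
    discards (assumes each prefix is bound only once in the document)."""
    return dict(binding for _event, binding in
                ET.iterparse(io.StringIO(xml_text), events=("start-ns",)))


def embedded_literals(feed_xml: str) -> list:
    """Turn extension elements from the namespaces listed in
    at:entrymap/@rdf-namespace-prefixes into (subject, predicate, literal)
    triples, using atom:id as the subject."""
    prefixes = namespace_map(feed_xml)
    feed = ET.fromstring(feed_xml)
    entrymap = feed.find(f"{{{AT_NS}}}entrymap")
    if entrymap is None:
        return []  # no declaration: extensions stay opaque "labelled data"
    declared = entrymap.get("rdf-namespace-prefixes", "").split()
    rdf_namespaces = {prefixes[p] for p in declared if p in prefixes}
    triples = []
    for entry in feed.findall(f"{{{ATOM_NS}}}entry"):
        subject = (entry.findtext(f"{{{ATOM_NS}}}id") or "").strip()
        for child in entry:
            if not child.tag.startswith("{"):
                continue  # unqualified element: not an RDF property
            ns, _, local = child.tag[1:].partition("}")
            text = (child.text or "").strip()
            if ns in rdf_namespaces and text:
                # As in RDF/XML, the property URI is namespace URI + local name.
                triples.append((subject, ns + local, text))
    return triples
```

Elements from undeclared namespaces are simply skipped, which is the
"explicit is better" behaviour argued for above.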


8. It may be enough to use elements only when making literal
statements, and to use atom:link for resource references.

Since atom:link @rel values must be URIs (or one of the predefined
set of names), it seems viable to treat them similarly to RDFa; that
is, interpret @rel as the predicate URI (i.e. not a qname as in RDFa),
and @href as the object URI. Furthermore, one could stipulate that
elements within atom:link elements can be interpreted as statements as
well, with the value of @href as subject.
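A rough Python sketch of this RDFa-like reading of atom:link. Per
RFC 4287, a registered @rel name such as "alternate" is equivalent to
the IRI obtained by appending it to
http://www.iana.org/assignments/relation/, which gives every link a
full predicate URI. The handling of child elements follows the
proposal above rather than any existing spec; xml:base resolution of
relative @href values is left out.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# RFC 4287 (4.2.7.2): a registered rel name such as "alternate" is equivalent
# to the IRI obtained by appending it to this string.
IANA_REL = "http://www.iana.org/assignments/relation/"


def rel_to_predicate(rel) -> str:
    rel = rel or "alternate"  # a missing @rel means "alternate"
    # Registered names cannot contain ":", so anything with one is an IRI.
    return rel if ":" in rel else IANA_REL + rel


def link_statements(entry: ET.Element, subject: str) -> list:
    """Read atom:link as statements: @rel is the predicate, @href the object;
    child elements of the link describe @href (proposal, not standard Atom)."""
    triples = []
    for link in entry.findall(f"{{{ATOM_NS}}}link"):
        href = link.get("href")
        if not href:
            continue
        triples.append((subject, rel_to_predicate(link.get("rel")), href))
        for child in link:
            text = (child.text or "").strip()
            if child.tag.startswith("{") and text:
                ns, _, local = child.tag[1:].partition("}")
                triples.append((href, ns + local, text))
    return triples
```

A fuller version would probably only accept nested elements from the
namespaces declared as RDF vocabularies.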


9. To put this together, consider this rough sketch:

    <feed xmlns="..."
          xmlns:at="..."
          xmlns:dc="..."
          xmlns:foaf="...">

        <at:entrymap rdf-namespace-prefixes="dc foaf">
            <at:subject element="id"/>
            <at:map property="foaf:name" element="title"/>
        </at:entrymap>

        <entry>
            <id>http://purl.org/NET/dust/foaf#me</id>
            <title>Niklas Lindström</title>
            <foaf:givenname>Niklas</foaf:givenname>
            <summary>About Niklas.</summary>
            <content src="http://neverspace.net/me.html" type="text/html"/>
            <link rel="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
                  href="http://xmlns.com/foaf/0.1/Person"/>
            <link rel="http://xmlns.com/foaf/0.1/homepage"
                  href="http://neverspace.net/">
                <dc:title xml:lang="en">Neverspace</dc:title>
            </link>
        </entry>

    </feed>

This yields the following RDF (in Turtle notation):

    <http://purl.org/NET/dust/foaf#me> a foaf:Person;
        foaf:name "Niklas Lindström";
        foaf:givenname "Niklas";
        foaf:homepage <http://neverspace.net/> .

    <http://neverspace.net/> dc:title "Neverspace"@en .
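To check that the pieces fit together, here is a self-contained Python
sketch that interprets a feed like the one above and returns N-Triples
lines (the same statements as the Turtle, just without prefixes). It
assumes the Atom namespace, the FOAF and DC elements
(http://purl.org/dc/elements/1.1/) namespaces, and a made-up
urn:example:atomtriples for at:; the entrymap attributes are the ones
proposed in this message, not necessarily the draft's.

```python
import io
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
AT = "urn:example:atomtriples"  # hypothetical namespace for the at: elements
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
IANA_REL = "http://www.iana.org/assignments/relation/"  # RFC 4287 rel names


def split_tag(tag):
    """'{ns}local' -> (ns, local); unqualified tags give ('', tag)."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag


def literal(elem):
    """N-Triples literal from an element's text, keeping any xml:lang."""
    text = elem.text.strip()
    text = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    lang = elem.get(XML_LANG)
    return f'"{text}"@{lang}' if lang else f'"{text}"'


def atom_triples(feed_xml):
    """Interpret an entrymap-annotated feed and return a set of N-Triples lines."""
    # ElementTree drops prefixes, so record the bindings first
    # (assumes each prefix is bound only once in the document).
    prefixes = dict(b for _, b in
                    ET.iterparse(io.StringIO(feed_xml), events=("start-ns",)))

    def expand(qname):
        # "foaf:name" -> (FOAF namespace, "name"); an unprefixed name such as
        # "title" resolves against the default (Atom) namespace.
        prefix, _, local = qname.rpartition(":")
        return prefixes[prefix], local

    feed = ET.fromstring(feed_xml)
    entrymap = feed.find(f"{{{AT}}}entrymap")
    if entrymap is None:
        return set()
    declared = entrymap.get("rdf-namespace-prefixes", "").split()
    rdf_ns = {prefixes[p] for p in declared if p in prefixes}
    subj = entrymap.find(f"{{{AT}}}subject")
    subject_tag = "{%s}%s" % (expand(subj.get("element")) if subj is not None
                              else (ATOM, "id"))
    maps = [(expand(m.get("property")), "{%s}%s" % expand(m.get("element")))
            for m in entrymap.findall(f"{{{AT}}}map")]

    triples = set()
    for entry in feed.findall(f"{{{ATOM}}}entry"):
        sid = entry.findtext(subject_tag)
        if not sid:
            continue
        s = f"<{sid.strip()}>"
        for (ns, local), tag in maps:  # at:map: Atom element -> RDF property
            elem = entry.find(tag)
            if elem is not None and (elem.text or "").strip():
                triples.add(f"{s} <{ns}{local}> {literal(elem)} .")
        for child in entry:
            ns, local = split_tag(child.tag)
            if ns in rdf_ns and (child.text or "").strip():  # point 7
                triples.add(f"{s} <{ns}{local}> {literal(child)} .")
            elif child.tag == f"{{{ATOM}}}link" and child.get("href"):  # point 8
                rel = child.get("rel") or "alternate"
                pred = rel if ":" in rel else IANA_REL + rel
                href = child.get("href")
                triples.add(f"{s} <{pred}> <{href}> .")
                for sub in child:  # nested elements describe the link target
                    sns, slocal = split_tag(sub.tag)
                    if sns in rdf_ns and (sub.text or "").strip():
                        triples.add(f"<{href}> <{sns}{slocal}> {literal(sub)} .")
    return triples
```

Run against the feed above (with those namespaces filled in), it yields
the foaf:name, foaf:givenname, rdf:type and foaf:homepage statements
about the entry's subject, plus the @en-tagged dc:title of the
homepage; atom:summary and atom:content are left uninterpreted.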


10. For now, I'd ignore the trouble of atom:category (which you haven't
addressed, yet). There is lots of debate about what it means: is
it a tagging (akin to "skos:category") with the object a concatenation
of @scheme and @term; can it determine rdf:type; is it a variant of
"atom:link with custom @rel" where @term *could* be the object URI?
Personally, I think atom:category may mean any of these, and also
represent a resource of its own, a "compressed subject-object" kind of
thing. This is probably the most haphazardly used element in Atom
today. But it works, albeit very differently, for different (vertical)
needs.

(At some point this or another RFC could define this as well, but it
is a lot to handle initially.)

----

In conclusion, I think it is an interesting proposal. As I said
initially, I don't really consider embedding RDF directly in Atom a
generally useful approach from an RDF perspective. But as a bridge
between the state of Atom extension formalisms today and the
usefulness of RDF, it should be explored. It also formalizes how to
interpret atom:id from an RDF perspective (in different ways depending
on needs), which I find very useful.

Best regards,
Niklas Lindström
Received on Friday, 4 July 2008 10:08:53 UTC