- From: Niklas Lindström <lindstream@gmail.com>
- Date: Fri, 4 Jul 2008 12:08:16 +0200
- To: "Dave Beckett" <dave@dajobe.org>, mnot@pobox.com
- Cc: "Booth, David (HP Software - Boston)" <dbooth@hp.com>, "Story Henry" <henry.story@bblfish.net>, "semantic-web@w3.org" <semantic-web@w3.org>
Dave, Mark, all,

this is very interesting! Thank you for initiating this work. I think there is *great* opportunity in formalizing how RDF and Atom can complement each other. There are also a bunch of potential problems in doing so, which I will try to elaborate on in this message.

First, I will introduce my work, then elaborate on some general ideas, and finally give my thoughts about the Atom Triples RFC.

I do think Atom extensions are quite useful, and RDF has a lot to offer them (not least to avoid reinvention, as I mention below). Still, I'm inclined to think of Atom as instrumental only, and not generally suited for direct RDF embedding (in contrast to the insulated carrying in atom:content; again, see below).

(I'd also like to apologize for the length of this message. I just wanted to cover the gamut of my perspective. I've numbered specific points for easier decomposition of my claims. If the discussion goes on further, I suppose it may be off-topic for this list and should continue elsewhere (privately or perhaps in a dedicated atomtriples-list)?)

== Why I'm Interested ==

Before I address the subject, I'd like to explain where I come from with this.

I work on a project for the Swedish government to create a legal document/information system. We use RDF to describe laws, and Atom to transport both the RDF information and the available digital representations. We do this to collect resources from numerous government agencies, and to expose the resulting repository over time (in turn used to e.g. load and update a SPARQL endpoint).

We have gained many ideas and opinions regarding RDF and Atom from this work, and I intend to present the core of that in another message to this list, hopefully "soonish". This may be interesting for the LoD community in general. (In particular, we leverage Atom Archives (RFC 5005), and representations of deleted entries (possibly using the Tombstones RFC draft and/or FeedSync).)
We're also interested in creating a GData-like service (as a kind of "semi-cooked" result data for simple presentation and filtering tools). We may find it useful to embed *parts* of RDF in the entries in that service, for which Atom Triples could be useful.

== Considering RDF and Atom ==

(Note that I always speak from "my humble opinion" and would love both corrections and more discussion. This isn't the output of scientific research, only what I've managed to form opinions about so far.)

1. Atom can be used as an RDF envelope.

This has been put into practice by many (as others in this thread have already mentioned), e.g. Doapspace, and ourselves as stated above. Putting serialized RDF as a "payload" in atom:content is an "insulated" transport. In this scenario Atom is an envelope dealing with the instrumental mechanisms of resource updates over HTTP.

How to use atom:id in this scenario is open for discussion. (We put the associated resource URI there, and mechanically construct a graph URI from that, used as the context in which RDF from content, alternates and even enclosures is placed.) Also, entry properties such as atom:title and atom:summary may or may not be equal to a specific triple about the resource (values could also come from some excerpt or aggregation of these, or from elsewhere entirely).

2. Atom lacks a formal description model.

While there was an initial intent to "ground" Atom in RDF and merely make the serialization specific, it has since been used without formal regard for e.g. an expected description model (it's "just a level up" from "only XML" in this respect).

The nature of atom:id isn't probed in depth semantically. The spec is specific, though, insofar as it says: it doesn't identify an individual *entry instance* (element), it must be unique, and "it is suggested that the atom:id element be stored along with the associated resource." The formal relation between an entry and "the associated resource" isn't specified.
The same "reduced semantics" goes for e.g. atom:title (though it seems to be equivalent to rdfs:label). I'd say the metadata (and labelling) technique used is sparse in semantics and simplistic (as in coarse-grained, even vague). I believe this is intentional, to keep the focus on syndication of resources over time (in principle quite close to how HTTP is designed).

3. RDF is formal, precise and intentionally horizontal/universal.

You can say just about anything, but (in the long run) you won't get away with anything less than concrete precision and informed identity management. This is the perceived "RDF tax", which I can understand. Think of the continuing discussions about e.g. "my homepage versus me" identity issues, the httpRange-14 resolution, equality and granularity of different predicates and their usage, and so on.

I have no problems with this, as it is crucial to iron out these issues to get a sustainable data model. But this "tax" may scare off some pragmatists, who would rather use Atom (with extensions) to add "somewhat" precise properties and values in entries "associated with a resource". Of course, in so doing, one is left to leave out or reinvent vocabularies, datatyped values etc., or to use such in a haphazard manner. There is great potential for leveraging RDF here.

4. Atom constructs can be interpreted (like almost anything) as RDF.

I claim that this should be done with care, since:

4.1. The Atom format "casually" deals with provenance and temporality.

It is designed around the *use* of these concepts, rather than the formalized *meaning* of them. As for why these concepts can be very tricky to capture with triples (in a general way), I refer to the ongoing work around contexts/named graphs in RDF (and the advanced, not so common, reification technique). Basically (IMHO), describing and using events in "first order" RDF is easy; describing events *carrying statements* is not.

4.2. An Atom Entry can be viewed as many things:

4.2.1.
A *manifest* to expose one or more representations of a resource. This is a rather "REST-centric" way of exposing resources, and is a common take on how to transport RDF as described in [1] above.

4.2.2. Very interestingly (as e.g. David Booth also speculated in this thread), it can also represent a *serialized context* (named graph), complete with atom:updated and provenance (e.g. feed:id or atom:author). Here the atom:id could be the context identifier. (As mentioned above, I do this indirectly by mechanically creating a context URI from the resource URI I've put in atom:id.)

4.2.3. A *direct* resource description, such as a contact post or calendar event in GData, or GeoRSS. With GData contacts, Google itself states that atom:title contains the name of the contact. Hence you *could* interpret atom:title as foaf:name. As I understand it, this is the target scenario for the Atom Triples RFC?

These views aren't necessarily incompatible (though [4.2.2] is rather special), and as I said in the beginning, the "GData service" I envision on top of our "core" resource depot could benefit from the Atom Triples RFC.

As you may have noticed, I haven't mentioned AtomOwl yet. Generally I believe that the difficulties in choosing how to represent Atom as RDF [4.1] are too hard to put into practice for the quite idiomatic needs I've had so far.

I do not dismiss AtomOwl at all though; it's a respectable work from which I've gained a lot of insight, and it may prove useful to me in other scenarios. Especially considering what Henry Story has said, AtomOwl is probably the formal superset of all of this. But it may be inaccessible for anyone not deeply familiar with RDF, who just wants to go "half-way" by using Atom tools for RDF syndication. (In the extreme, compare to using an RDF description of the XHTML infoset of a page with embedded RDFa..)

But I believe there's lots of potential *without* formalizing Atom as RDF. One is by using the technologies orthogonally [1 and 4.2.1].
Another is by letting Atom leverage *parts* of RDF [4.2.3]. That is useful in itself (at least from an Atom perspective), and of course it also aids in getting RDF from Atom.

== About the Atom Triples RFC ==

Your draft made me quite happy, since I've been thinking about writing down an idea called "EUFORIA" - "Embedding Unobtrusive Fragments Of RDF In Atom". :)

Please take the following as rough food for thought.

I think that "Atom Triples" can be to Atom roughly what RDFa is to XHTML. So while it is true that GRDDL can be used to get whatever RDF you want from whatever Atom you have, this RFC can provide a stable set of assumptions (as Taylor Cowan replied), which you can then provide a GRDDL profile for, or interpret directly.

(I continue the numbering here from above.)

5. A feed-level element declaring the use of embedded RDF is good (akin to fh:archive of RFC 5005). Your "at:entrymap" should do for "direct descriptions" [4.2.3].

6. Regarding choosing the subject (the associated resource). I like that atom:id is chosen by default, but can be changed. Henry Story's critique about how to catch it is a good point though, and should be addressed. Perhaps along the lines of:

    <at:subject element="atom:id"/>

    <at:subject link-rel="alternate" type="application/rdf+xml" hreflang="" />

I only consider this from the perspective of [4.2.3]. If you're to interpret the entries as individual graph contexts over time [4.2.2], perhaps *another* feed-level element is better, say "at:graphentries".

7. As e.g. Taylor said, you don't really need the at:md in order to use RDF vocabularies for extension elements. If you want full RDF/XML, I also suggest just using atom:content, and I would prefer it if RDF/XML wasn't part of how Atom Triples work.. (I won't argue about the general problems with RDF/XML from an XML perspective here; the RDF community has discussed these for years, and proposed lots of remedies, such as "canonical" RDF/XML, TriX etc.)
To avoid embedding full RDF/XML, you could define a simpler mechanism for how to express statements in entries, such as defining how to interpret certain extensions directly (those that use RDF vocabulary URIs as namespaces). As mentioned in this thread, this is already put into practice (commonly by embedding e.g. DC, FOAF or DOAP elements, not to mention W3C GeoRSS). This couples well with your mechanism for capturing the subject.

This practice makes the embedded RDF function as regular extensions. That is, it still works for anyone who isn't aware of (or doesn't care about) the RDF and just sees "some labelled data".

In theory, it could be enough to recognize namespaces that an RDF-aware consumer *knows* are vocabularies (via rdf:type owl:Ontology). However, I think explicit is better and would eliminate mistakes. I'd suggest something like:

    <at:entrymap rdf-namespace-prefixes="dc foaf doap"/>

, with a whitespace-separated list of namespace prefixes (similar to @extension-element-prefixes in XSLT).

This would supplement how to map non-RDF extensions into statements.

8. It may be enough to use elements only when making literal statements, and to use atom:link for resource references. Since atom:link @rel values must be URIs (or one of the predefined set of names), it seems viable to treat this similarly to RDFa - that is, interpret @rel as the predicate URI (i.e. not a qname as in RDFa), and @href as the object URI.

Furthermore, one could stipulate that elements within atom:link elements can be interpreted as statements as well, with the value of @href as subject.

9. To put this together, consider this rough sketch:

    <feed xmlns="..." xmlns:at="..." xmlns:dc="..." xmlns:foaf="...">
      <at:entrymap rdf-namespace-prefixes="dc foaf">
        <at:subject element="id"/>
        <at:map property="foaf:name" element="title"/>
      </at:entrymap>
      <entry>
        <id>http://purl.org/NET/dust/foaf#me</id>
        <title>Niklas Lindström</title>
        <foaf:givenname>Niklas</foaf:givenname>
        <summary>About Niklas.</summary>
        <content src="http://neverspace.net/me.html" type="text/html"/>
        <link rel="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
              href="http://xmlns.com/foaf/0.1/Person"/>
        <link rel="http://xmlns.com/foaf/0.1/homepage"
              href="http://neverspace.net/">
          <dc:title xml:lang="en">Neverspace</dc:title>
        </link>
      </entry>
    </feed>

, yielding the following RDF (in Turtle notation):

    <http://purl.org/NET/dust/foaf#me> a foaf:Person;
        foaf:name "Niklas Lindström";
        foaf:givenname "Niklas";
        foaf:homepage <http://neverspace.net/> .

    <http://neverspace.net/> dc:title "Neverspace"@en .

10. For now, I'd ignore the trouble of atom:category (and you haven't addressed it (yet)). There is lots of debate about what it means: is it a tagging (akin to "skos:category") with the object a concatenation of @scheme and @term; can it determine rdf:type; is it a variant of "atom:link with custom @rel" where @term *could* be the object URI..

Personally, I think atom:category may mean any of these, and may also represent a resource of its own, a "compressed subject-object" kind of thing. This is probably the most haphazardly used element in Atom today. But it works, albeit very differently, for different (vertical) needs. (At some point this or another RFC could define this as well, but it is a lot to handle initially.)

----

In conclusion, I think it is an interesting proposal. As I initially said, I don't really consider embedding RDF directly in Atom a generally useful way from an RDF perspective. But as a bridge between the state of Atom extension formalisms today and the usefulness of RDF, it should be explored.
It also formalizes how to make assumptions about how to interpret atom:id from an RDF perspective (in different ways depending on needs), which I find very useful.

Best regards,
Niklas Lindström
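
P.S. For anyone who wants to experiment with the sketch in point 9: the interpretation rules from points 6 to 8 could be implemented roughly as below. Take it as a minimal Python sketch under several assumptions, not as a reference implementation of the draft. The at: namespace URI is a made-up placeholder (the sketch in point 9 elides it), the dc and foaf namespace URIs are the commonly published ones, the function names are invented for illustration, and triples are returned as plain tuples instead of going through a real RDF library.

```python
import io
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
AT = "http://example.org/atomtriples#"  # ASSUMPTION: placeholder, not the draft's namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:at="http://example.org/atomtriples#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <at:entrymap rdf-namespace-prefixes="dc foaf">
    <at:subject element="id"/>
    <at:map property="foaf:name" element="title"/>
  </at:entrymap>
  <entry>
    <id>http://purl.org/NET/dust/foaf#me</id>
    <title>Niklas Lindström</title>
    <foaf:givenname>Niklas</foaf:givenname>
    <summary>About Niklas.</summary>
    <content src="http://neverspace.net/me.html" type="text/html"/>
    <link rel="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
          href="http://xmlns.com/foaf/0.1/Person"/>
    <link rel="http://xmlns.com/foaf/0.1/homepage" href="http://neverspace.net/">
      <dc:title xml:lang="en">Neverspace</dc:title>
    </link>
  </entry>
</feed>""".encode("utf-8")


def parse_with_prefixes(xml_bytes):
    """Parse XML and also collect the prefix -> namespace map,
    which ElementTree otherwise discards."""
    prefixes, root = {}, None
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif root is None:
            root = payload  # the first "start" event is the root element
    return root, prefixes


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def literal_statements(subject, parent, vocab):
    """Point 7: child elements in a declared RDF namespace become literals."""
    for child in parent:
        ns, local = split_tag(child.tag)
        if ns in vocab:
            yield (subject, ns + local,
                   ("literal", child.text or "", child.get(XML_LANG)))


def extract_triples(xml_bytes):
    root, prefixes = parse_with_prefixes(xml_bytes)
    entrymap = root.find(f"{{{AT}}}entrymap")
    declared = entrymap.get("rdf-namespace-prefixes", "").split()
    vocab = {prefixes[p] for p in declared}
    subject_elem = entrymap.find(f"{{{AT}}}subject").get("element")
    maps = []
    for m in entrymap.findall(f"{{{AT}}}map"):
        prefix, local = m.get("property").split(":", 1)
        maps.append((prefixes[prefix] + local, m.get("element")))

    triples = []
    for entry in root.findall(f"{{{ATOM}}}entry"):
        # Point 6: the subject comes from the element named by at:subject.
        subject = entry.findtext(f"{{{ATOM}}}{subject_elem}").strip()
        for prop, elem_name in maps:
            value = entry.findtext(f"{{{ATOM}}}{elem_name}")
            if value is not None:
                triples.append((subject, prop, ("literal", value, None)))
        triples.extend(literal_statements(subject, entry, vocab))
        # Point 8: a URI-valued @rel is the predicate, @href the object.
        for link in entry.findall(f"{{{ATOM}}}link"):
            rel, href = link.get("rel", ""), link.get("href")
            if not href:
                continue
            if ":" in rel:  # skip registered names such as "alternate"
                triples.append((subject, rel, ("uri", href)))
            triples.extend(literal_statements(href, link, vocab))
    return triples


for triple in extract_triples(FEED):
    print(triple)
```

Run against the feed from point 9, this should give the same five statements as the Turtle above; atom:summary and atom:content are left alone because nothing in at:entrymap maps them. Reading the prefix declarations via the "start-ns" events is just one way to resolve the prefixes listed in @rdf-namespace-prefixes; a consumer with a namespace-preserving parser would not need that step.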
Received on Friday, 4 July 2008 10:08:53 UTC