- From: Dan Brickley <danbri@w3.org>
- Date: Sat, 7 Sep 2002 15:24:25 -0400 (EDT)
- To: rss-dev <rss-dev@yahoogroups.com>
- cc: <www-archive+rss@w3.org>
Ooops, this turned into a big long msg! Sorry for the verbosity.

--danbri

short version:
 - not using namespaces is very expensive (hence RSS 1.0)
 - inventing our own architecture for combining namespaces is expensive
 - rss's core is not much more interesting than HTML's <ul><li>...
 - except that we can decorate it with other XML/RDF vocabularies
 - RSS benefits from RDF vocabularies designed without RSS in mind, and even without each other in mind
 - descriptive tasks don't map tidily onto descriptive vocabularies
 - without something like RDF for principled combination of independent namespaces, the coordination cost of making sure XML vocabs can work together is higher

I believe the 'in your face' cost of using RDF syntax is massively outweighed by the hidden costs I outline. eg. flying people to boring meetings to debate how different overlapping XML namespaces can be used together is a real, but non-obvious, cost that we risk if we don't adopt some principles for namespace composition and design. RDF is the only set of such principles I've seen proposed for this in the RSS community.

"Just use namespaces" doesn't address the problem of one task, multiple namespaces: people, events, music, documents, concerts, prices, locations... If we're interested in applying a variety of descriptive vocabularies to a single task, we'll need to use vocabularies developed outside of RSS-DEV. RDF apps focus on just this, whereas many XML apps focus on a single monolithic DTD or Schema that captures a specific task. Since we're deploying RSS in a general-purpose, pluralist, wide-area context, I reckon the namespace-mixing style adopted by RDF is well suited to RSS goals...

A rough cut at a motivating scenario is sketched below...

On Fri, 6 Sep 2002, Bill Kearney wrote:

(hmm, lost original attribution; was this David G.?)

> > The problem (perhaps to strong a word) we had at Moreover with implementing

Yup, too strong imho.
I don't want to seem dismissive about the genuine experiences folk have with the spec: the rdf:Seq table of contents does create extra work. But not much. I'd rather create problems for software developers than for consumers of newsfeeds (dealing with such problems is the developers' job!), and the lack of namespaces in pre-1.0 RSS was a big problem: it forced people to overload the modest representational facilities offered by RSS.

RDF was one of the driving forces that motivated the creation of XML Namespaces, and remains to this day (imho :) the best mainstream architecture for deploying, aggregating and merging mixed-namespace XML documents. Sure, you _can_ cut loose from the discipline RDF imposes and deploy a mix of XML namespaces without specifying how they are written and interpreted, but that path leads to TagSoupHell, with each RSS extension vocab created without adopting representational conventions shared with other extensions. That too can lead to big practical problems: the world doesn't carve up nicely into discrete, separable descriptive tasks. Often you'll want to draw upon several specialised representational vocabularies all at once. Such vocabularies can be designed without consideration for mixing them together, and deployed as mixed-namespace XML. Or they can be designed to share a basic common structure whose principles govern their interactions and mixing. The latter flavour of XML is called RDF.

For example, consider an XML document (an RSS feed) combining several namespaces: RSS 1.0 + Dublin Core + Events/iCalendar + Music(Brainz) + FOAF/vCard + Wordnet or TAP KB identifiers + a geographical vocabulary and an ecommerce/pricing vocabulary. This bundle of XML/RDF vocabularies might be used as the namespaces in an RSS feed for (in this example scenario) a music concerts Web site. Folk are doing this (though perhaps not yet in the detail sketched here).
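To make the mixed-namespace idea concrete, here is a minimal sketch (not taken from any real concert feed) that builds a single RSS 1.0 item carrying core RSS, Dublin Core and RSS 1.0 Events module properties side by side, using Python's standard xml.etree.ElementTree. The URLs, titles and dates are hypothetical, and for brevity it attaches the event properties directly to the item rather than modelling the listing document and the concert as separate resources.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: RDF syntax, RSS 1.0, Dublin Core, RSS 1.0 Events module.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ev": "http://purl.org/rss/1.0/modules/event/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Return ElementTree's '{namespace-uri}local' form of prefix:local."""
    return f"{{{NS[prefix]}}}{local}"


def concert_item(url, title, doc_date, start, location):
    """One RSS 1.0 <item> mixing three independently designed vocabularies."""
    item = ET.Element(q("rss", "item"), {q("rdf", "about"): url})
    # Core RSS: the list-of-documents structure.
    ET.SubElement(item, q("rss", "title")).text = title
    ET.SubElement(item, q("rss", "link")).text = url
    # Dublin Core: a property of the listing document.
    ET.SubElement(item, q("dc", "date")).text = doc_date
    # Events module: properties of the concert being listed.
    ET.SubElement(item, q("ev", "startdate")).text = start
    ET.SubElement(item, q("ev", "location")).text = location
    return item


if __name__ == "__main__":
    item = concert_item(
        "http://example.org/concerts/2002-09-21",  # hypothetical URL
        "Some Band live",                          # hypothetical title
        "2002-09-07",
        "2002-09-21T20:00:00-04:00",
        "Example Hall",                            # hypothetical venue
    )
    print(ET.tostring(item, encoding="unicode"))
```

Each vocabulary contributes its own elements under its own namespace; none of them needed to know about the others in advance.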
Each vocabulary I list above provides a piece of the puzzle. The basic document format (a list of descriptive items) comes from RSS itself; in our example these are documents listing concerts. Dublin Core adds properties that describe those documents (title, subject, dates etc). Other folk are working on systems (eg. Redland/WSE) that consume Dublin Core RDF and make it searchable with tools based on document retrieval. Others are working on more specific ways of representing RDF dc:subject using classifications from thesauri or efforts such as DMoz/OpenDirectory.

An events vocabulary (eg. RDF iCalendar, or the RSS Events module, with some tweaks) helps us describe the events that those documents describe: when they happen, timezones etc. And it can do so in a way that (because other folk are building the tools) can be imported into calendars (Mozilla calendar, iPods, Palms etc).

For concerts and music we might want to describe more information about the artists (and perhaps even track listings, for concerts that have happened). So we could make use of a namespace (and database of information) designed for describing artists, tracks etc. Fortunately MusicBrainz.org have done just this. Which is great: it means we don't have to. And because they use not just XML Namespaces, but RDF's conventions for using XML Namespaces, we can plug their work right in alongside the RSS, the Dublin Core, the event vocabs, the subject taxonomies...

We could use the FOAF or vCard RDF/XML vocabs to describe the people mentioned in our document: the performers, the contact info for the concert, the homepage or mugshot or insideLegMeasurement of the lead singer. Our (still fictional) RSS concert listing feed could use still other RDF/XML vocabs: eg.
performer or band or IDs from the TAP Knowledge base (see http://tap.stanford.edu/tapkb/ and http://tap.stanford.edu/cgi-bin/kb.pl), since these provide shareable IDs for many of the things the RSS feed will mention, including places, people, record companies etc. We might also use geographical markup (there are some RDF vocabs in progress for this), or markup to represent ticket prices (@@your namespace here). The list goes on.

So what's my verbose point? There are several.

Cost is a subtle thing. There are costs associated with trying to squeeze a rich description of something (eg. docs and the concerts/events/bands they describe) into a format not designed for the task. Pre-1.0 RSS was great; but then so were HTML bulleted lists. Trying to share structured information by squeezing it into a list of 3-field records is costly. With RSS 1.0, RDF and the Semantic Web, we are trying something tricky: we want to make it easy to do simple things (such as sharing bulleted lists of new documents), and possible to do very challenging things (such as augmenting our descriptions of those docs with increasingly specific information about their content: the dates of concerts, the names of bands, the price of tickets...).

When talking about costs, problems and annoyances with the XML syntax we're using (XML + Namespaces + RDF), we need to think about the money that will be saved and spent in the world through using these data feeds, as well as the money that will be spent by programmers adding another while() loop into their RSS generation code.

There are also costs associated with throwing mixed-namespace XML documents into the Web when the namespaces they draw upon were designed independently, without the expectation of their being (unexpectedly :) combined. Notice how annoying it is that the things we're describing (and the levels of descriptive detail we care about) all overlap: there is no simple mapping from descriptive task to stand-alone XML vocabulary.
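A toy sketch of why a shared data model plus shared identifiers helps here: if two independently produced datasets describe the same artist using the same URI, then merging them in RDF's triple model largely reduces to set union. The music namespace and artist URI below are hypothetical placeholders (not MusicBrainz's actual vocabulary), and the sketch ignores blank nodes and literal datatypes, which a real RDF merge must handle.

```python
# Triples as (subject, predicate, object) tuples. URIs are plain strings and
# literals are not distinguished from URIs (a deliberate simplification).
FOAF = "http://xmlns.com/foaf/0.1/"
MUSIC = "http://example.org/hypothetical-music-vocab#"  # hypothetical vocab
ARTIST = "http://example.org/artists/some-band"         # hypothetical shared ID

# Two descriptions of the same artist, produced independently.
from_people_data = {
    (ARTIST, FOAF + "name", "Some Band"),
    (ARTIST, FOAF + "homepage", "http://example.org/some-band/"),
}
from_music_data = {
    (ARTIST, FOAF + "name", "Some Band"),
    (ARTIST, MUSIC + "track", "http://example.org/tracks/1"),
}


def merge(*graphs):
    """RDF-style merge of ground graphs: the union of their triple sets."""
    return set().union(*graphs)


def describe(graph, subject):
    """All (predicate, object) pairs known about `subject`, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)


if __name__ == "__main__":
    merged = merge(from_people_data, from_music_data)
    for predicate, obj in describe(merged, ARTIST):
        print(predicate, obj)
```

The duplicated foaf:name triple collapses on merge, and an application that only understands FOAF can still use its part of the merged data while ignoring the music properties.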
There are lots of namespaces we might use to describe people: some applicable to all people (FOAF/vCard), some especially good at our chosen problem domain (music/performance and musical content), eg. MusicBrainz. MusicBrainz's RDF vocab also has other useful content: it describes songs as well as artists. And there are other RDF vocabularies (such as TAP) that can be used to pick out precisely which artist we're talking about, since the creators of TAP took the time to do so. TAP also lists a lot of places (many major cities at least) but doesn't go into huge detail about them. Still further RDF datasources and vocabs do a better job at geography.

Descriptive tasks aren't nicely divisible: 'oh, I want to do a concert listing vocab, I'll just use the concert-listings DTD' isn't how XML namespaces will work. There'll always be several relevant vocabularies, and which ones are chosen and combined will take some thought. The scenario above lists some of the raw materials available from RDF vocabularies relevant to the concert-listings scenario. By using them, we could make information available to apps designed for other purposes (calendars, address books, document search tools...). This interop isn't guaranteed by using RDF, but it at least becomes possible. Without a design for mixing namespaces, such re-use of data and tools, to my mind, looks a lot less feasible.

Just thinking about this simple scenario -- concert listings -- throws up all sorts of problems and opportunities. For those who haven't been engrossed in RDF for years, the RDF 'value added' might not be clear. This mail won't make it clear either, but might flag up some of the concerns that were a priority in RDF's design. With the Resource Description Framework, we have (funnily enough) a Framework for Describing Resources.
It is as much a social thing as a technical one: a minimalistic set of conventions for carving up the work of creating XML Namespaces in a way that allows them to be subsequently combined in unexpected ways. The XML Namespaces spec offers no guidance on how each namespace is written. The only spec I know that offers a compelling story about how independent namespaces can be designed for successful combination is RDF. It was built with this goal in mind. That doesn't make it magically effective at solving complex descriptive problems, or make it cheap to produce and consume such data in the Web. But it does help.

It helps in several ways: by providing a layer of tools that take namespace mixing and data merging for granted, as a common task for modern Web tools; and by allowing the task of creating lots of complementary XML/RDF vocabs to be parcelled up, so that the participants in these efforts need only a minimum of coordination. The vocabs I listed above can all be used to good effect in a single RSS RDF document. But they weren't even designed with RSS in mind, let alone with the other vocabs in mind. RDF was built so that these folks could attend fewer coordination meetings, and get on with the interesting work.

If we abandon RDF, and use XML Namespaces with no rules on how our markup is written, we might save some RSS folk some time, but we surely create work elsewhere: the Music Vocab people will have to talk to the Events vocab people, who'll have to talk more often to the Dublin Core people. That takes time and money...

Dan

ps. the rest of this msg I might've sent separately; it responds to the specific claim that rdf:Seq creation imposes a problematic burden w.r.t. server load, cpu usage etc.

> > RSS 1.0 was not to do with outputting the format but the extra step it took
> > to add the RDF bag which produced extra load on dynamically created RSS 1.0
> > from a search return (a trivial load but nonetheless greater than that for
> > RSS 0.9x).
> > Using permalinks as the equivalent of RDF-about attributes deals
> > with syntax at the item level. Those that wish to convert to RDF would have
> > to do the secondary process of adding the list items.
>
> If you want to output in a stream then yes you're iterating over two loops. But
> since RSS is supposed to be a limited number of items (~15) wouldn't it be
> better to build the items in memory and then stream the parts? No extra DB load
> involved. Just build both fragments simultaneously and the build the stream.
>
> But yes, it does involve additional cycles to do this. How much is arguable
> depending on programming style. Do we need a bake-off to compare massively
> large dataset performance issues here?

If the need to cache a list of ~15 URIs when generating RSS 1.0 is really a performance/resource problem (rather than a minor nuisance for coders who could be focussing on charsets, entities and the other gotchas associated with *any* such XML format), this is a cause for celebration. Either because these other challenges of deploying XML aren't proving too painful, or because demand for the generated RSS 1.0 content is so high that such a minor overhead risks unacceptable server load. If the latter is true, and the server is being pestered for RSS (especially custom-case / personalised RSS that isn't usefully cached), then we've really made it to the big time.
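A minimal sketch of the "build both fragments simultaneously" approach quoted above, assuming search results arrive as a one-shot iterator of (url, title) pairs. Using Python's standard xml.etree.ElementTree, each result adds both its rdf:Seq table-of-contents entry and its item to an in-memory tree during a single pass, and the document is serialized once at the end. The function and parameter names are illustrative, not from any existing RSS library.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rss", RSS_NS)


def rdf(local):
    """Qualified name in the RDF namespace, in ElementTree's {uri}local form."""
    return f"{{{RDF_NS}}}{local}"


def rss(local):
    """Qualified name in the RSS 1.0 namespace."""
    return f"{{{RSS_NS}}}{local}"


def build_rss10(channel_url, title, description, results):
    """Build an RSS 1.0 document from (url, title) pairs in a single pass.

    `results` may be a one-shot iterator (e.g. rows from a search query);
    it is consumed exactly once.
    """
    root = ET.Element(rdf("RDF"))
    channel = ET.SubElement(root, rss("channel"), {rdf("about"): channel_url})
    ET.SubElement(channel, rss("title")).text = title
    ET.SubElement(channel, rss("link")).text = channel_url
    ET.SubElement(channel, rss("description")).text = description
    seq = ET.SubElement(ET.SubElement(channel, rss("items")), rdf("Seq"))

    for url, item_title in results:
        # The rdf:Seq entry and the <item> are added to the in-memory tree
        # in the same iteration: no second query, no second loop.
        ET.SubElement(seq, rdf("li"), {rdf("resource"): url})
        item = ET.SubElement(root, rss("item"), {rdf("about"): url})
        ET.SubElement(item, rss("title")).text = item_title
        ET.SubElement(item, rss("link")).text = url

    # Serialize once the whole tree (channel + items) is complete.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    rows = iter([("http://example.org/a", "First"),
                 ("http://example.org/b", "Second")])
    print(build_rss10("http://example.org/feed", "Example",
                      "Search results", rows))
```

Because the channel element is created before any items are appended to the root, the serialized document keeps the order RSS 1.0 expects (channel with its rdf:Seq first, then the items), while the extra cost is just holding ~15 small elements in memory.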
Received on Saturday, 7 September 2002 15:24:26 UTC