- From: Dan Brickley <danbri@danbri.org>
- Date: Thu, 1 Jul 2010 10:46:26 +0200
- To: Pat Hayes <phayes@ihmc.us>
- Cc: Linked Data community <public-lod@w3.org>, Semantic Web <semantic-web@w3.org>
(rejigged subject line)

On Thu, Jul 1, 2010 at 4:35 AM, Pat Hayes <phayes@ihmc.us> wrote:
>> Pat, I wish you had been there. ;)
>
> I have very mixed views on this, I have to say. Part of me wanted badly to
> be present. But after reading the results of the straw poll, part of me
> wants to completely forget about RDF, never think about an ontology or a
> logic ever again, and go off and do something completely different, like art
> or philosophy.

I have mixed feelings about missing the workshop too. Having pushed this wheelbarrow uphill for far too long, I do think it a shame to have missed such an event. On the other hand, it is hard to know what to make of the workshop outcomes, since the participants form an unusually specialist subset of humanity, and the problem of what W3C does next with its RDF standard is such a small part of the larger problem.

It's clear that many workshop participants were aware of the risk of destabilizing the core technologies just as we are gaining some very promising real-world traction. That was a relief to read. For those who have invested time and money in helping us get this far, and who had the resources to participate, this concern was probably enough to motivate participation.

It's clear also that participants were aware of many of the little annoyances that bring friction and frustration to those working with RDF. What I'm less sure of is how to represent the perspective of those who have explored RDF and walked away. Over the years, many bright people have investigated RDF enthusiastically, and left disappointed. Those folk didn't come to the workshop, they didn't write a position paper, and they probably don't particularly care about its outcomes. But they're just the kind of people who will need to enjoy using RDF if we are to succeed.

Is RDF hard to work with? I think the answer remains 'yes', but we lack consensus on why. And it even seems somehow disloyal to admit it.
If I had to list reasons, I'd leave nits like 'subjects as literals' pretty low down. Many of the reasons are, I think, unavoidable, and intrinsic to the kind of technology and problems we're dealing with. But there are also lots of areas for improvement. Most of these have nothing to do with fixups to W3C standards documentation. And finally, we can lessen the perception of pain by improving the other side: getting more decent linked data out there, so the suffering people go through is "worth it".

Some reasons why RDF is annoying and hard (a mildly ordered list):

* RDF data is gappy, chaotic, full of unexpected extensions and omissions: BY DESIGN.
* RDF toolkits each offer different items from a large menu (syntaxes, storage, inference facilities), so even when you're getting a lot, you probably don't appreciate what you're getting, and we have no common checklist to help non-guru developers understand this.
* RDF toolkit / library immaturity. Example 1: I wasted half a weekend recently trying to find a decent JavaScript system. Example 2: I work in Python using the popular rdflib library, whose half-finished SPARQL support was recently removed and put into an 'extras' package; nobody seems quite sure how well it works. The Ruby landscape remains messy, although the public-rdf-ruby list have recently been collaborating actively to improve things. Broken, old and abandoned code litters the Web; good stuff remains on the bleeding edge and unpackaged. Great ideas, code and algorithms remain trapped in a single implementation language rather than being ported to other widely deployed languages. Almost every toolkit's SQL backend is represented differently. Only a few serializers bother to prettify RDF/XML nicely, despite there being open source code out there that could easily be copied.
* RDF is good for aggregation of externally managed data; managing data *as* RDF comes with certain complexities, since edit/delete operations on a connected graph aren't as intuitive as on a closed tree structure. If I delete a certain node from the graph, which others should be cleaned up too? Named graphs help somewhat there, but good habits aren't yet understood, much less documented.
* As a community, we have some standards for documenting the atomic terms in our vocabularies (i.e. RDFS/OWL), but we tend to stop there, and don't document the larger graph patterns that are needed to really communicate using these structures, or the underlying use cases that motivated them in the first place. We also don't do nearly enough analytics and stats over the actual data out there, work that would make the data easier to consume and help publishers gravitate towards existing idioms rather than make up similar-but-different graph patterns that confuse the landscape further.
* Our small community (we are outnumbered by Visual Basic enthusiasts, let alone Javascripters) is fragmented and grumpy. OWL and Linked Data enthusiasts too often talk and think disparagingly about each other's work, or not-so-secretly wish the others would just go away and stop messing things up. And all this foolish posturing despite the fact that Linked Data is a massive deployment of OWL-documented vocabularies, and that the essential but annoying gappy, chaotic nature of RDF can at least partly be patched up by techniques that help us figure out when two different RDF expressions are saying the same thing, aka inference.
* Enthusiasm sometimes borders on a religious zeal that would be better spent on toolkit polish than on overloading mailing lists, or on prolonging petty wars ('x is not semantic enough', 'y isn't really Linked Data'...) with other folk who prefer, for whatever reason, to use different technologies to publish, share and link data.

So, what do we do?
A few years ago, Edd Dumbill turned the XML Europe conference into the XTech conference, transforming it from a nose-too-close-to-the-screen event for markup nerds into an event that brought together browser people, XML markup experts, open data advocates (Creative Commons etc.), and forward-thinking creative technologists of every kind. XTech is no longer with us, although I expect Edd's work at OSCON (http://www.oscon.com/oscon2010, which I'd happily have attended over any RDF/SemWeb event) shows similar insight. XTech was important because it provided a meeting place for technologists with different technical favourites, while also tapping into the larger themes that motivate much of the passion in the first place. It helped people identify themselves with a larger effort, rather than with some specific technology tool.

I think we can learn a lot from XTech. RDF enthusiasts share 99.9% of their geek DNA with the microformats community, with XML experts, with OWL people, ... but time and again end up nitpicking over embarrassing details. Someone "isn't really" publishing Linked Data because their RDF doesn't have enough URIs in it, or they use unfashionable URI schemes. Or their Apache Web server isn't sending 303 redirects. Or they've used a plain XML language or other standard instead. This kind of partisan hectoring can shrink a community passionate about sharing data in the Web, just at a time when this effort should be growing more inclusive and taking a broader view of what we're trying to achieve.

The formats and protocols are a detail. They'll evolve over time. If people do stuff that doesn't work, they'll find out and do other things instead. The thing that keeps me involved is the common passion for sharing information in the Web. If we keep that as an anchor point, rather than some flavour of some version of RDF, I think a lot of the rest falls into place.
I love http://www.w3.org/Illustrations/LetsShare.ai.gif, "Let's Share What We Know": an ancient slogan of the early Web project. If we take "Let's share what we know" as a central anchor, rather than triples, we can evaluate different technical strategies in terms of whether they help by making it easier to "share what we know" using the Web.

Going back to my list, I think the reason to use RDF will simply be that others have also chosen to use it. Nothing more, really; it's about the data, above all. Sure, the reason we can all choose to use it, and gain value from each other's parallel decisions, is the emphasis on linking, sharing, mixing and decentralisation. But when choosing whether to bother with RDF, I think for future decision makers it'll all be about the data, not the implementation techniques. The reason is *not* the tooling, the fabulous parsers, awe-inspiring inference engines, expressive query languages or cleverly designed syntaxes. Those are all means to an end, which is sharing information about the world. Or getting hold of cheap/free and bulky background datasets, if you prefer to couch it in less idealistic terms.

And why would anyone care to get all this semi-related, messy Web data? Because problems don't come nicely scoped and packaged into cleanly distinct domains. Whenever you try to solve one problem, it borders on a dozen others that are a higher priority for people elsewhere. You think you're working with 'events' data but find yourself with information describing musicians; you think you're describing musicians, but find yourself describing digital images; you think you're describing digital images, but find yourself describing geographic locations; you think you're building a database of geographic locations, and find yourself modeling the opening hours of the businesses based at those locations.
To a poet or idealist, these interconnections might be beautiful or inspiring; to a project manager or product manager, they are as likely to be terrifying. Any practical project at some point needs to be able to say "Enough with all this interwingularity! This is our bit of the problem space; forget the rest for now". In those terms, a linked Web of RDF data provides a kind of safety valve. By dropping in identifiers that link to a big pile of other people's data, we can hopefully make it easier to keep projects nicely scoped without needlessly restricting future functionality. An events database can remain an events database, but use identifiers for artists and performers, making it possible to filter events by properties of those participants. A database of places can be only a link or two away from records describing the opening hours or business offerings of the things at those places.

Linked Data (and for that matter FOAF...) is fundamentally a story about information sharing, rather than about triples. Some information is in RDF triples; but lots more is in documents, videos, spreadsheets, custom formats, or [hence FOAF] in people's heads.

Looked at in these terms, my RDF wishlist would be based on looking at things from the consumer side. Publishing RDF is fiddly, but do-able. And it only takes a few lines of [perl|java|ruby|prolog|xslt...] to expose massive amounts of information in the Web. The linked data scene in recent years has started to do just this on an impressive scale. But consuming RDF remains pretty annoying, a hurdle to be crossed to get at the good stuff: the data. Even though RDF remains our single best story for how such a Web of data can be broken down into a largely uncoordinated global division of labour, RDF itself remains ... annoying. So my RDF wishlist would be about making RDF less annoying or risky to consume.
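The events-database point above (using identifiers for performers, so that events can be filtered by properties of those performers) can be sketched in a few lines. This is a minimal illustration in plain Python, treating triples as tuples in a set rather than using any particular RDF toolkit; every URI under http://example.org/ and the `performer`, `genre` and `title` properties are hypothetical placeholders, not real vocabulary terms.

```python
# Hypothetical URIs: none of these refer to real published data or vocabularies.
EX = "http://example.org/"

# Data as published by a (hypothetical) events site: it only knows
# about events, and points at performers by URI.
events_graph = {
    (EX + "event/1", EX + "performer", EX + "artist/a"),
    (EX + "event/1", EX + "title", "Friday night gig"),
    (EX + "event/2", EX + "performer", EX + "artist/b"),
    (EX + "event/2", EX + "title", "Sunday matinee"),
}

# Data as published by someone else entirely, describing the artists.
artists_graph = {
    (EX + "artist/a", EX + "genre", "jazz"),
    (EX + "artist/b", EX + "genre", "folk"),
}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def events_with_performer_genre(graph, genre):
    """Find events whose performer is described with the given genre."""
    matches = set()
    for (s, p, o) in graph:
        if p == EX + "performer" and genre in objects(graph, o, EX + "genre"):
            matches.add(s)
    return matches


# Merging the two sources is just set union, because both use the same URIs.
merged = events_graph | artists_graph
print(sorted(events_with_performer_genre(merged, "jazz")))
# prints ['http://example.org/event/1']
```

The events data never mentions genres; the filter only works because the performer URIs in one dataset match the subjects in someone else's. A real consumer would more likely load both datasets into an RDF store and ask the same question in SPARQL, but the join has the same shape.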
While a lot of that is about tool maturity, there is a lot around data licensing, around dealing with a natural wariness of depending too much on others, and around making sure RDF and the 'linked data' idea are presented in a more inclusive manner that respects the fact that most of the world's information isn't going to gain much from being put into URI-based triples.

The very nature of RDF makes it somewhat annoying to work with. RDF data is always going to be a kind of Frankenstein's data monster, patched together from bits and pieces that can just about be made to fit together. Fortunately, we have at our fingertips a World Wide Web that lets us share an awful lot of these bits; the more we can get re-usable RDF datasets out there, the less people will worry about the pain of using it, and the more likely it'll be that there will be genuinely useful, relevant data on hand when someone goes looking for it.

As long as we run around evangelizing RDF while not admitting that it is also kind of annoying, we raise expectations that will be dashed when people actually try using it. As long as we spend ages writing long emails when we could be fixing and improving RDF software or datasets, we're probably also prolonging the problem.

For my part in that last one, and for this overlong mail, ... sorry :)

cheers,

Dan
Received on Thursday, 1 July 2010 08:47:03 UTC