Reflections from Dan re: reception and message of RDF from Thomas Baker on 2011-03-01 (public-xg-lld@w3.org from March 2011)

From: Thomas Baker <tbaker@tbaker.de>
Date: Mon, 28 Feb 2011 22:19:52 -0500
To: public-xg-lld <public-xg-lld@w3.org>
Message-ID: <20110301031951.GA5180@octavius>
Dear all,

In our report, we should consider Dan's eloquent
reflections on the reception of RDF (below).

For example: "If you're used to XML or SQL schema structures,
the schema designer is typically (not necessarily) in a much
more authoritative role. With RDFS we stripped a lot of power
away from schema designers: they can't tell you what to do
any more! There's no 'a shipping order *must* have an address'
mechanism in RDFS/OWL"...

Tom


On Mon, Feb 28, 2011 at 03:13:16PM -0500, Thomas Baker wrote:
> From:         Thomas Baker <tbaker@TBAKER.DE>
> Subject: DanBri about the RDF "message"
> To:           DC-ARCHITECTURE@JISCMAIL.AC.UK
> 
> Dear all,
> 
> I'd like to share some insightful comments from Dan Brickley
> about what has made the Semantic Web message more difficult
> to convey than some of us had expected.
> 
> As the comments were made on a closed list, I have with Dan's
> permission removed the context from the excerpts below.
> 
> Tom
> 
> 
> Dan was asked why, since 1998, it has taken so long to get the
> world to understand what can be achieved with URIs and 3-tuple
> data representations. Dan's reply:
> 
>     Part of our problem, I fear, is that we have collectively tended to
>     approach the situation with an essentially evangelical style.
> 
>     Time and again, this has got smart people interested and intrigued,
>     and so they go try out some RDF tools.
> 
>     Very often this is a frustrating experience. And there are good
>     technical reasons why working with RDF (* or any other '3-tuple based
>     Structured Data Representation' *) will often be frustrating. The
>     3-tuple approach thrives in chaotic situations where data flows
>     around, with bits missing, bits added, extensions and gaps everywhere.
>     This kind of data is intrinsically rather annoying to deal with. There
>     are workarounds and strategies (details on request :) but that
>     frustration is inevitably core to the experience, because it is a set
>     of problems the RDF data model was designed to engage with.
> 
>     So http://www.w3.org/DesignIssues/LinkedData.html marked a turning
>     point when TimBL took FOAF's RDF linking model, improved it by
>     demanding URIs everywhere (rather than our earlier bNodes and
>     seeAlsos), and inspired mass publication of RDF data. Until we had
>     data, few were RDF-curious. Now we have data, we can disappoint more
>     curious new people per month than ever before. Or on a good day, make
>     them happy.
> 
>     The Semantic Web project has delivered four specific things to
>     the world so far: data, tools, community and standards.
> 
>     Because it grew from a standards organization, the tendency has been
>     to focus on the standards, and what they do to improve the world - the
>     3-tuple model as seen in RDF, and the specs that build on top of it
>     (SPARQL, RDFS/OWL etc.).
> 
>     Now standards are great, but they're pretty distant from solving
>     day-to-day problems. And there are good reasons to believe that
>     3-tuple data structures will typically be annoying to use, as well as
>     useful. They only really shine when multiple parties are using them in
>     complementary ways, so that data can be usefully mixed and merged and
>     extended and overlaid and so forth.
> 
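A minimal sketch of why mixing and merging data depends on shared URIs, written in plain Python with triples as string tuples rather than with any RDF library. The `foaf:` CURIE spellings, the `_:` blank-node convention and the example.org URIs are illustrative assumptions. Following RDF merge semantics, each source's blank nodes are renamed apart before the union, so only resources named by the same URI join up across publishers; this is the difference between URIs everywhere and the earlier bNode style mentioned in the message.

```python
from itertools import count

# A triple is (subject, predicate, object); blank nodes are written "_:label".
Triple = tuple[str, str, str]

_fresh_ids = count()  # global counter so renamed bNodes never collide across sources


def standardize_apart(graph: set[Triple]) -> set[Triple]:
    """Give each blank node in `graph` a fresh label, as RDF graph merge requires."""
    mapping: dict[str, str] = {}

    def rename(term: str) -> str:
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"_:b{next(_fresh_ids)}"
            return mapping[term]
        return term  # URIs and literals are kept unchanged

    return {(rename(s), rename(p), rename(o)) for s, p, o in graph}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Union of the graphs after standardizing apart their blank nodes."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= standardize_apart(graph)
    return merged


def subjects_with(graph: set[Triple], predicate: str) -> set[str]:
    return {s for s, p, _ in graph if p == predicate}


ALICE = "http://example.org/people/alice"  # hypothetical URI

# Two publishers describe the same people independently.
source_a = {(ALICE, "foaf:name", '"Alice"'), ("_:x", "foaf:name", '"Bob"')}
source_b = {(ALICE, "foaf:homepage", "http://alice.example/"),
            ("_:x", "foaf:homepage", "http://bob.example/")}

merged = merge(source_a, source_b)
# Only the URI-named resource ends up with both a name and a homepage;
# Bob's two descriptions remain two unconnected blank nodes.
print(subjects_with(merged, "foaf:name") & subjects_with(merged, "foaf:homepage"))
# → {'http://example.org/people/alice'}
```

The same label `_:x` in both sources is deliberately not treated as the same node: blank-node labels are local to a document, which is why merged bNode data stays fragmented.
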
>     So getting those big public, link-friendly datasets out there was a
>     foundation for RDFy 3-tuple data becoming more useful than it was
>     annoying. But it's still annoying for developers, trust me! Having
>     solid standards with test cases (the RDFCore 2004 revision of RDF) was
>     a good step forward, but still standards alone are not enough. The
>     missing ingredients are tooling and community. Both of which we have,
>     both of which we can always benefit from more/better. So communities
>     like the RDF/SW interest group at W3C, like Lotico, like the LOD group
>     which bridged W3C's scene with the outside world, these help new
>     adopters make the most of the 3-tuple model. I've seen quite a few
>     efforts burned by mis-applying RDF in contexts where it just wasn't
>     important or useful to use it. That's natural with a newish
>     technology. And I've seen smart developers frustrated by the lack of
>     documentation, polish and guidance around our tooling. But the growing
>     suite of RDF-oriented tools can't be ignored, and that's a key part of
>     the technology's appeal.
> 
>     We have data, now, and that's enough to attract people. But as seen in
>     discussions around e.g. data.gov.uk, many mainstream developers see
>     RDF, SPARQL and 3-tuples and associated tools as a hurdle or barrier
>     that stands between them and data. In a way, they're right. We have
>     all these standards and tools as a means to an end (sharing
>     information, the Web's founding slogan
>     http://www.w3.org/Illustrations/LetsShare.ai.gif "Let's share what we
>     know"). RDF is not an end in itself.
> 
>     So imho the message should not be "we've found the best technical
>     model for sharing data on a global scale - URI-linked 3-tuples!", but
>     rather, that we have a global community committed to sharing data,
>     tools, standards and their own experience and time in pursuit of
>     solving problems through information linking. This doesn't mean that
>     all tools need be open source, nor all data public, but that there are
>     common architectural principles giving coherence to all this data, all
>     those tools...
> 
>     As long as we frame this as "RDF is 'easier/better' than
>     [wonder-technology X]" we will lose. It's not. And nor is any vaguer
>     notion of "3-tuples with URI" [...]. What we have here in
>     the Semantic Web effort that is unique is a special combination of
>     data, tooling, standards and community that simply can't be found
>     anywhere else...
> 
> And to a follow-up question on exactly what problems people
> and developers have with 3-tuples, or what they would rather have
> in their place...:
> 
>     I think it's not so much the 'what they get back' (API/format/model),
>     but the whole framework of how we structure our data.
> 
>     If you're used to XML or SQL schema structures, the schema designer is
>     typically (not necessarily) in a much more authoritative role. With
>     RDFS we stripped a lot of power away from schema designers: they can't
>     tell you what to do any more! There's no "a shipping order *must* have
>     an address" mechanism in RDFS/OWL. For example, as editor of the FOAF
>     vocab's RDFS I can never say anything in an imperative style in the
>     schema, all I can do is define the meaning of the classes and
>     properties in the FOAF namespace. Same for the Dublin Core team, for
>     SIOC, etc. This permissiveness encourages re-use in lots of different
>     ways.
> 
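To make the point about the schema designer's lost authority concrete, here is a minimal sketch, assuming plain string triples and hypothetical `ex:` names. In RDFS, `rdfs:domain` is an inference rule (rule rdfs2 in the RDF Semantics specification), not a constraint: a shipping order with no address is not an error, and using the address property on something that is not an order does not get rejected but instead licenses the conclusion that it is an order.

```python
# Terms are plain strings; every ex: name here is hypothetical.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

Triple = tuple[str, str, str]


def rdfs_domain_closure(graph: set[Triple]) -> set[Triple]:
    """Apply the RDFS domain rule (rdfs2) until nothing new is entailed:
    if P rdfs:domain C and S P O, then S rdf:type C."""
    closed = set(graph)
    while True:
        domains = {(s, o) for s, p, o in closed if p == RDFS_DOMAIN}
        entailed = {(s, RDF_TYPE, cls)
                    for s, p, _ in closed
                    for prop, cls in domains if p == prop}
        if entailed <= closed:
            return closed
        closed |= entailed


schema = {("ex:shippingAddress", RDFS_DOMAIN, "ex:ShippingOrder")}
data = {
    ("ex:order1", RDF_TYPE, "ex:ShippingOrder"),          # no address at all
    ("ex:alice", "ex:shippingAddress", "ex:warehouse7"),  # alice is not an order
}

closed = rdfs_domain_closure(schema | data)
# Nothing complains that order1 lacks an address; instead the schema
# lets a reasoner conclude that alice is a ShippingOrder.
print(("ex:alice", RDF_TYPE, "ex:ShippingOrder") in closed)  # True
```

Compare this with an XML Schema `minOccurs="1"` or a SQL `NOT NULL` column, where the same data would simply be refused.
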
>     This is simultaneously critical for scaling to the Web, but also, as I
>     say, annoying to be on the receiving end of. For developers trained in
>     the idea that schemas tell you what is or is not an acceptable
>     instance, RDF is strangely passive. The only formal way of screwing up
>     in RDF is contradicting yourself. Someone could publish a FOAF-based
>     RDF/XML document that was simply a collection of triples using
>     'foaf:homepage'. Even with bNodes on either side of the property. Or
>     someone else might publish a bunch of <foaf:Image about="uri"
>     dc:title="...."/> triples. The FOAF vocabulary facilitates this, and
>     that is useful, but it also means that knowing the vocabulary is not
>     itself enough for interop. You only get interop when a bunch of folk
>     do things in roughly the same way; using the same triple patterns.
>     There's a whole layer to do with characterising more specific triple
>     patterns, 'idioms', that is essentially missing from our collective
>     practice. There have been experiments in various directions towards
>     characterising such patterns (e.g. using SPARQL, see Schemarama...) but
>     as a community we seem to act as if schemas are all that's needed.
> 
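One way to picture "characterising triple patterns" is a SPARQL-style basic graph pattern used as a check. The sketch below is a simplified matcher of my own in plain Python (variables start with `?`; no blank nodes in patterns, no filters); it is not how Schemarama works, and the "person with homepage" idiom is a hypothetical example. Both graphs use FOAF legitimately, but only one follows the idiom.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for pat, term in zip(pattern, triple):
        if is_var(pat):
            if extended.setdefault(pat, term) != term:
                return None  # variable already bound to a different term
        elif pat != term:
            return None
    return extended


def match_bgp(patterns: list[Triple], graph: set[Triple],
              binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield each binding under which every pattern occurs in `graph`,
    roughly what a SPARQL basic graph pattern does."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match_triple(first, triple, binding)
        if extended is not None:
            yield from match_bgp(rest, graph, extended)


# Hypothetical idiom: a typed foaf:Person with a name and a homepage.
PERSON_WITH_HOMEPAGE = [
    ("?p", "rdf:type", "foaf:Person"),
    ("?p", "foaf:name", "?name"),
    ("?p", "foaf:homepage", "?page"),
]

ALICE = "http://example.org/people/alice"  # hypothetical URI
bare = {("_:a", "foaf:homepage", "_:b")}  # valid FOAF, bNodes on both sides
idiomatic = {
    (ALICE, "rdf:type", "foaf:Person"),
    (ALICE, "foaf:name", '"Alice"'),
    (ALICE, "foaf:homepage", "http://alice.example/"),
}

print(len(list(match_bgp(PERSON_WITH_HOMEPAGE, bare))))       # 0
print(len(list(match_bgp(PERSON_WITH_HOMEPAGE, idiomatic))))  # 1
```
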
>     As Edd Dumbill put it (http://times.usefulinc.com/#13:13 via
>     http://danbri.org/words/page/27?sioc_type=user&sioc_id=22 ):
> 
>     "Processing RDF is therefore a matter of poking around in this graph.
>     Once a program has read in some RDF, it has a ball of spaghetti on its
>     hands. You may like to think of RDF in the same way as a hashtable
>     data structure -- you can stick whatever you want in there, in whatever
>     order you want."
> 
>     This loose nature is the key at once to our success and to our
>     problems. The analogy is that developers who are used to nice (if a
>     little brittle/rigid) OO models are not always happy replacing
>     everything with a chaotic hashtable. At least not unless we have a
>     good set of unit tests. And what we're missing, by analogy, is just
>     that. Nobody knows when they've been passed a 'good' RDF graph, versus
>     one so uninformative, or expressed in such alien terminology, that it
>     can't be used for the task at hand. So some of the essential ideas
>     from non-RDF development just don't really make sense when using
>     unconstrained triples. That leads to headaches, frustrations etc.
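
The "unit tests" analogy can be sketched in a few lines. Below, a graph is loaded into Dumbill's hashtable shape (subject to predicate to objects), and a task-specific check reports whether it is good enough for a hypothetical image-gallery application that needs every `foaf:Image` to have a `dc:title` and a `foaf:depicts`. The task, the URIs and the function names are assumptions for illustration; the point is that the requirement lives in the consuming application, not in the FOAF schema.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def index(graph: set[Triple]) -> dict[str, dict[str, set[str]]]:
    """Dumbill's 'hashtable' view of a graph: subject -> predicate -> objects."""
    idx: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for s, p, o in graph:
        idx[s][p].add(o)
    return idx


def gallery_problems(graph: set[Triple]) -> list[str]:
    """Hypothetical task check: a gallery needs every foaf:Image to carry
    a dc:title and a foaf:depicts. Returns human-readable problems."""
    idx = index(graph)
    problems = []
    for subject in sorted(idx):
        props = idx[subject]
        if "foaf:Image" in props.get("rdf:type", set()):
            for required in ("dc:title", "foaf:depicts"):
                if not props.get(required):
                    problems.append(f"{subject} lacks {required}")
    return problems


graph = {
    ("http://example.org/img/1", "rdf:type", "foaf:Image"),
    ("http://example.org/img/1", "dc:title", '"Harbour at dusk"'),
    ("http://example.org/img/2", "rdf:type", "foaf:Image"),
}
# Perfectly legal FOAF, but not good enough for this particular task.
for problem in gallery_problems(graph):
    print(problem)
```

In a real test suite the assertion would be `gallery_problems(graph) == []` for each graph the application accepts.
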

-- 
Thomas Baker <tbaker@tbaker.de>
Received on Tuesday, 1 March 2011 03:20:31 UTC