Re: Reflections from Dan re: reception and message of RDF from Karen Coyle on 2011-03-01 (public-xg-lld@w3.org from March 2011)

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Tue, 01 Mar 2011 07:19:38 -0800
To: Jodi Schneider <jodi.schneider@deri.org>
Cc: public-xg-lld <public-xg-lld@w3.org>
Message-ID: <20110301071938.87502si98s3n5bwq@kcoyle.net>
Good points, Jodi. Another thing this brings up for me is: what do we  
mean by library data?

There's a big difference between the discovery data that libraries  
might/should share and the "other data" that makes a whole host of  
library management activities function. One thing I found missing in  
Dan's piece is the huge gap between the everyday "slog" of keeping an  
institution's management database going (personnel, accounting,  
purchasing, inventory, statistics -- more on that later --) and the  
sharing of some data on the open web. For libraries, the sharable part  
is the user discovery layer. The rest of the library data management  
needs to be in a traditional, silo'd database. Some amount of that  
data is considered confidential (by state law) and therefore has to be  
controlled.

A challenge for us in libraries is to have these two very different  
stores of data without creating more work for the library. The two  
stores of data have to work together because at one point they  
represent the same thing, but must do so in very different ways.

Eric Hellman presented a short poll at Code4Lib showing that there is  
an average of 2 "techies" per library (and many have none), and that  
a majority of their time is taken up taking care of that "slog".

-- statistics: libraries have to show accountability by being able to  
quantify their value. They count users, books on the shelf,  
circulation, and, increasingly, "accesses". One thing not yet  
addressed in our discussions of LLD is how it will help libraries  
*prove* their value.

kc

Quoting Jodi Schneider <jodi.schneider@deri.org>:

> What a wonderful message, so well put!
>
> I see several action items that come from it:
>
> (1) We need to emphasize designing metadata for sharing. There's  
> been a fair amount of research and outreach into shareable metadata  
> (Sarah Shreeves et al at UIUC come to mind). I suggest that our  
> report either point to this work, or, ideally, distill key messages  
> about what does make metadata shareable.
>
> (2) Dan talks about unit tests, to articulate the intended constraints:
>>> Nobody knows when they've been passed a 'good' RDF graph, versus
>>>    one so uninformative, or expressed in such alien terminology, that it
>>>    can't be used for the task at hand.
>
>
> I think that we need to articulate and stress this problem in our  
> report. Dan is pointing to three key unit tests for a graph. That it  
> be:
> -informative
> -compatible with our terminology
> -fit for a purpose
>
> We can, at the very least, express these as the key constraints (any  
> others?) for useful, shareable, and appropriate graphs/metadata.  
> Further, we can point to the technologies that allow us to test  
> compliance with application profiles.
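
A minimal Python sketch of how the three constraints above might be turned into executable checks. It treats a graph as a set of (subject, predicate, object) tuples. The function names, the KNOWN_NAMESPACES list, the thresholds, and the example data are all assumptions made for illustration; none of them come from Dan's message or from an existing tool.

```python
# Hypothetical sketch: a graph is a set of (subject, predicate, object) tuples.
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"

# Assumption: the vocabularies a given community has agreed to understand.
KNOWN_NAMESPACES = (FOAF, DC)


def is_informative(graph, min_triples=1):
    """Crude 'informative' test: the graph holds at least min_triples triples."""
    return len(graph) >= min_triples


def terminology_coverage(graph):
    """Fraction of triples whose predicate comes from a known namespace."""
    if not graph:
        return 0.0
    known = sum(1 for (_, p, _) in graph if p.startswith(KNOWN_NAMESPACES))
    return known / len(graph)


def fit_for_purpose(graph, required_predicates):
    """True if every predicate the task needs appears at least once."""
    used = {p for (_, p, _) in graph}
    return set(required_predicates) <= used


# Illustrative data: two Dublin Core statements and one local extension.
graph = {
    ("http://example.org/book/1", DC + "title", "Moby-Dick"),
    ("http://example.org/book/1", DC + "creator", "Herman Melville"),
    ("http://example.org/book/1", "http://example.org/local#shelf", "PS2384"),
}

print(is_informative(graph))
print(round(terminology_coverage(graph), 2))
print(fit_for_purpose(graph, [DC + "title"]))
```

Real checks would presumably be driven by an application profile rather than hard-coded lists; the point is only that each constraint can be phrased as a test a graph passes or fails.
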
>
> -Jodi
>
> PS-As a side point -- Dan's email reminds me that there is a tension  
> between directing metadata to a community (e.g. peers in other  
> library-related organizations) and directing metadata widely (e.g.  
> distributing it without an audience of "sharers" in mind).
>
> One strength of the RDF approach is that we *can* direct metadata  
> widely as a SIDE EFFECT of directing it to a community -- and that  
> it is very natural to do so. I see this as a fundamental gain, and  
> motivation, for using and distributing Library Linked Data, because  
> sharing our data makes it possible for others to link to it and use  
> it in new ways (without learning ingenious but outdated exchange  
> formats) -- making library metadata more fully a part of the fabric  
> of the Web.
>
> On 1 Mar 2011, at 03:19, Thomas Baker wrote:
>
>> Dear all,
>>
>> In our report, we should consider Dan's eloquent
>> reflections on the reception of RDF (below).
>>
>> For example: "If you're used to XML or SQL schema structures,
>> the schema designer is typically (not necessarily) in a much
>> more authoritative role. With RDFS we stripped a lot of power
>> away from schema designers: they can't tell you what to do
>> any more! There's no "a shipping order *must* have an address"
>> mechanism in RDFS/OWL"...
>>
>> Tom
>>
>>
>> On Mon, Feb 28, 2011 at 03:13:16PM -0500, Thomas Baker wrote:
>>> From:         Thomas Baker <tbaker@TBAKER.DE>
>>> Subject: DanBri about the RDF "message"
>>> To:           DC-ARCHITECTURE@JISCMAIL.AC.UK
>>>
>>> Dear all,
>>>
>>> I'd like to share some insightful comments from Dan Brickley
>>> about what has made the Semantic Web message more difficult
>>> to convey than some of us had expected.
>>>
>>> As the comments were made on a closed list, I have with Dan's
>>> permission removed the context from the excerpts below.
>>>
>>> Tom
>>>
>>>
>>> Dan was asked why it has taken since 1998 to get the world to
>>> understand what can be achieved with URIs and 3-tuple data
>>> representations.  Dan's reply:
>>>
>>>    Part of our problem, I fear, is that we have collectively tended to
>>>    approach the situation with an essentially evangelical style.
>>>
>>>    Time and again, this has got smart people interested and intrigued,
>>>    and so they go try out some RDF tools.
>>>
>>>    Very often this is a frustrating experience. And there are good
>>>    technical reasons why working with RDF (* or any other '3-tuple based
>>>    Structured Data Representation' *) will often be frustrating. The
>>>    3-tuple approach thrives in chaotic situations where data flows
>>>    around, with bits missing, bits added, extensions and gaps everywhere.
>>>    This kind of data is intrinsically rather annoying to deal with. There
>>>    are workarounds and strategies (details on request :) but that
>>>    frustration is inevitably core to the experience, because it is a set
>>>    of problems the RDF data model was designed to engage with.
>>>
>>>    So http://www.w3.org/DesignIssues/LinkedData.html marked a turning
>>>    point when TimBL took FOAF's RDF linking model, improved it by
>>>    demanding URIs everywhere  (rather than our earlier bNodes and
>>>    seeAlsos), and inspired mass publication of RDF data. Until we had
>>>    data, few were RDF-curious. Now we have data, we can disappoint more
>>>    curious new people per month than ever before. Or on a good day, make
>>>    them happy.
>>>
>>>    The Semantic Web project has delivered four specific things to
>>>    the world so far: data, tools, community and standards.
>>>
>>>    Because it grew from a standards organization, the tendency has been
>>>    to focus on the standards, and what they do to improve the world - the
>>>    3-tuple model as seen in RDF, and the specs that build on top of it
>>>    (SPARQL, RDFS/OWL etc.).
>>>
>>>    Now standards are great, but they're pretty distant from solving
>>>    day-to-day problems. And there are good reasons to believe that
>>>    3-tuple data structures will typically be annoying to use, as well as
>>>    useful. They only really shine when multiple parties are using them in
>>>    complementary ways, so that data can be usefully mixed and merged and
>>>    extended and overlaid and so forth.
>>>
>>>    So getting those big public, link-friendly datasets out there was a
>>>    foundation for RDFy 3-tuple data becoming more useful than it was
>>>    annoying. But it's still annoying for developers, trust me! Having
>>>    solid standards with test cases (the RDFCore 2004 revision of RDF) was
>>>    a good step forward, but still standards alone are not enough. The
>>>    missing ingredients are tooling and community. Both of which we have,
>>>    both of which we can always benefit from more/better. So communities
>>>    like the RDF/SW interest group at W3C, like Lotico, like the LOD group
>>>    which bridged W3C's scene with the outside world, these help new
>>>    adopters make the most of the 3-tuple model. I've seen quite a few
>>>    efforts burned by mis-applying RDF in contexts where it just wasn't
>>>    important or useful to use it. That's natural with a newish
>>>    technology. And I've seen smart developers frustrated by the lack of
>>>    documentation, polish and guidance around our tooling. But the growing
>>>    suite of RDF-oriented tools can't be ignored, and that's a key part of
>>>    the technology's appeal.
>>>
>>>    We have data, now, and that's enough to attract people. But as seen in
>>>    discussions around eg. data.gov.uk, many mainstream developers see
>>>    RDF, SPARQL and 3-tuples and associated tools as a hurdle or barrier
>>>    that stands between them and data. In a way, they're right. We have
>>>    all these standards and tools as a means to an end (sharing
>>>    information, the Web's founding slogan
>>>    http://www.w3.org/Illustrations/LetsShare.ai.gif "Let's share what we
>>>    know"). RDF is not an end in itself.
>>>
>>>    So imho the message should not be "we've found the best technical
>>>    model for sharing data on a global scale - URI-linked 3-tuples!", but
>>>    rather, that we have a global community committed to sharing data,
>>>    tools, standards and their own experience and time in pursuit of
>>>    solving problems through information linking. This doesn't mean that
>>>    all tools need be opensource, nor all data public, but that there are
>>>    common architectural principles giving coherence to all this data, all
>>>    those tools...
>>>
>>>    All the time we frame this as "RDF is 'easier/better' than
>>>    [wonder-technology X]" we will lose. It's not. And nor is any vaguer
>>>    notion of "3-tuples with URI" [...].  What we have here in
>>>    the Semantic Web effort that is unique is a special combination of
>>>    data, tooling, standards and community that simply can't be found
>>>    anywhere else...
>>>
>>> And to a follow-up question on exactly what problems people
>>> and developers have with 3-tuples, or what they would rather have
>>> in their place...:
>>>
>>>    I think it's not so much the 'what they get back' (API/format/model),
>>>    but the whole framework of how we structure our data.
>>>
>>>    If you're used to XML or SQL schema structures, the schema designer is
>>>    typically (not necessarily) in a much more authoritative role. With
>>>    RDFS we stripped a lot of power away from schema designers: they can't
>>>    tell you what to do any more! There's no "a shipping order *must* have
>>>    an address" mechanism in RDFS/OWL. For example, as editor of the FOAF
>>>    vocab's RDFS I can never say anything in an imperative style in the
>>>    schema, all I can do is define the meaning of the classes and
>>>    properties in the FOAF namespace. Same for the Dublin Core team, for
>>>    SIOC, etc. This permissiveness encourages re-use in lots of different
>>>    ways.
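
To make the "shipping order *must* have an address" point concrete, here is a small Python sketch of the kind of check that ends up living in the consuming application rather than in the schema. The example.org vocabulary, the data, and the missing_required helper are hypothetical, invented only for this illustration.

```python
# Hypothetical vocabulary and data, for illustration only.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/shop#"

graph = {
    ("http://example.org/order/1", RDF_TYPE, EX + "ShippingOrder"),
    ("http://example.org/order/1", EX + "address", "12 Main St"),
    ("http://example.org/order/2", RDF_TYPE, EX + "ShippingOrder"),
    # order/2 has no address; the graph is still perfectly legal RDF.
}


def missing_required(graph, cls, prop):
    """Return, sorted, the instances of cls that have no value for prop."""
    instances = {s for (s, p, o) in graph if p == RDF_TYPE and o == cls}
    with_prop = {s for (s, p, _) in graph if p == prop}
    return sorted(instances - with_prop)


incomplete = missing_required(graph, EX + "ShippingOrder", EX + "address")
print(incomplete)
```

Nothing in the vocabulary rejects the second order. Only the application-level check notices that it lacks an address.
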
>>>
>>>    This is simultaneously critical for scaling to the Web, but also, as I
>>>    say, annoying to be on the receiving end of. For developers trained in
>>>    the idea that schemas tell you what is or is not an acceptable
>>>    instance, RDF is strangely passive. The only formal way of screwing up
>>>    in RDF is contradicting yourself. Someone could publish a FOAF-based
>>>    RDF/XML document that was simply a collection of triples using
>>>    'foaf:homepage'. Even with bNodes on either side of the property. Or
>>>    someone else might publish a bunch of <foaf:Image about="uri"
>>>    dc:title="...."/> triples. The FOAF vocabulary facilitates this, and
>>>    that is useful, but it also means that knowing the vocabulary is not
>>>    itself enough for interop. You only get interop when a bunch of folk
>>>    do things in roughly the same way; using the same triple patterns.
>>>    There's a whole layer to do with characterising more specific triple
>>>    patterns, 'idioms', that is essentially missing from our collective
>>>    practice. There have been experiments in various directions towards
>>>    characterising such patterns (eg. using SPARQL, see Schemarama...) but
>>>    as a community we seem to act as if schemas are all that's needed.
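
A rough Python sketch of the "same vocabulary, different idioms" problem. Graphs are sets of (subject, predicate, object) tuples and None acts as a wildcard in a pattern. The match helper, the people_with_homepages idiom, and all the example data are assumptions for illustration; this is not how Schemarama or any SPARQL engine is implemented.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def match(graph, pattern):
    """Return the triples matching pattern; None in any position is a wildcard."""
    return [
        t for t in graph
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]


def people_with_homepages(graph):
    """One consumer's expected idiom: a foaf:Person with a name and a homepage."""
    results = []
    for (s, _, _) in match(graph, (None, RDF_TYPE, FOAF + "Person")):
        names = match(graph, (s, FOAF + "name", None))
        pages = match(graph, (s, FOAF + "homepage", None))
        results.extend((s, n[2], h[2]) for n in names for h in pages)
    return sorted(results)


# Document A: only foaf:homepage, with blank nodes on either side.
doc_a = {("_:b1", FOAF + "homepage", "_:b2")}

# Document B: a typed foaf:Image with a dc:title.
doc_b = {
    ("http://example.org/pic.jpg", RDF_TYPE, FOAF + "Image"),
    ("http://example.org/pic.jpg", DC + "title", "A picture"),
}

# Document C: follows the idiom this consumer happens to expect.
doc_c = {
    ("http://example.org/me", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/me", FOAF + "name", "Example Person"),
    ("http://example.org/me", FOAF + "homepage", "http://example.org/"),
}

for doc in (doc_a, doc_b, doc_c):
    print(people_with_homepages(doc))
```

All three documents are valid uses of FOAF, but only the third yields anything for this consumer, which is the interop gap that knowing the vocabulary alone does not close.
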
>>>
>>>    As Edd Dumbill put it (http://times.usefulinc.com/#13:13 via
>>>    http://danbri.org/words/page/27?sioc_type=user&sioc_id=22 )
>>>
>>>    "Processing RDF is therefore a matter of poking around in this graph.
>>>    Once a program has read in some RDF, it has a ball of spaghetti on its
>>>    hands. You may like to think of RDF in the same way as a hashtable
>>>    data structure -- you can stick whatever you want in there, in whatever
>>>    order you want."
>>>
>>>    This loose nature is the key at once to our success and to our
>>>    problems. By analogy, developers who are used to nice (if a
>>>    little brittle/rigid) OO models are not always happy replacing
>>>    everything with a chaotic hashtable. At least not unless we have a
>>>    good set of unit tests. And what we're missing, by analogy, is just
>>>    that. Nobody knows when they've been passed a 'good' RDF graph, versus
>>>    one so uninformative, or expressed in such alien terminology, that it
>>>    can't be used for the task at hand. So some of the essential ideas
>>>    from non-RDF development just don't really make sense when using
>>>    unconstrained triples. That leads to headaches, frustrations etc.
>>
>> --
>> Thomas Baker <tbaker@tbaker.de>
>>
>>
>
>
>



-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Tuesday, 1 March 2011 15:20:18 UTC