Re: reconciliation of disparate models - Jon replies from Thomas Baker on 2011-03-13 (public-lld@w3.org from March 2011)

From: Thomas Baker <tbaker@tbaker.de>
Date: Sun, 13 Mar 2011 19:13:40 -0400
To: public-lld@w3.org
Message-ID: <20110313231340.GA2256@octavius>
Context: after Corey wrote...

    I've been thinking a lot about this question of the pros
    and cons of unconstrained, generalized properties, and
    am increasingly convinced that hard-coding domains and
    ranges into things is a significant barrier to reuse. I
    very much like the superclass / generalized superproperty
    approach used in the rda vocabs and suggested by Jeff &
    others on this list.

    One of the things I like about this approach is that it
    *could* have the potential to allow multiple views of
    the same bibliographic data to co-exist without any of
    the underlying assertions contradicting each other.

Jon Phipps questioned the usefulness of blank nodes for LOD
and explained the rationale for open superproperties:

    Just two comments:

    1. I _hate_ blank nodes in public-facing RDF, especially
    RDF intended to be published and consumed as LOD, largely
    because those nodes only provide a system-local identifier
    for the thing being described. This has no utility beyond
    the specific graph that 'contains' them and obfuscates
    the nature of the thing as well. RDF and RDF-based LOD are
    about knowledge transfer and not just data publishing. A
    blank node says I have data about this thing, I can't
    identify it, and you can't make any inferences about it
    beyond the properties I've provided, and neither you nor
    I know what it is. If you know enough about something to
    give it properties, then you know enough to give it an
    identifier, even (especially) if you add a significant
    amount of data to it later.

    2. The notion that somehow there's a cost to instantiating
    an explicitly inferred superproperty when aggregating
    public LOD flies in the face of much of the purpose of RDF
    and its utility in navigating an open world of data where
    the data model presumes that you don't _ever_ have all
    of the available data and you can expand the data you do
    have and dramatically increase interoperability through
    intelligent inferencing guided by the publisher. The
    _point_ of RDF LOD is publication of domain-specific,
    system-specific knowledge in a way that can be consumed
    and _understood_ in the open world of data, that is, by
    systems that have no other knowledge of the domain
    supplying the data.

    The RDA RDF vocabularies were designed to enable the
    communication of library data to systems that have no or
    limited understanding of 'library' data with as little
    loss of meaning as possible for systems that might have a
    clue. This is an entirely different purpose than simply
    re-serializing MARC data in a different 'format', and
    is the primary reason for the open superproperties. The
    design is deliberately intended to support the kind of
    recombinant metadata that you're suggesting -- there's no
    reason why systems that have a different notion of WEMI
    or WMI or W(EMI) can't describe their metadata properties
    in a way that makes sense for their system and create a
    relationship to the RDA superproperties that will allow
    consuming systems to exploit that relationship to better
    understand the data. It's about embracing the inevitable
    chaos and working with it in creative and constructive
    ways, rather than trying to legislate it out of existence.

    Jeez, that was more rant than comment, eh?

    But still just my $.03 and I hope maybe helpful.

    Jon
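
To make Jon's second point concrete, here is a minimal sketch of how a
consuming system might materialize superproperty statements from a
publisher's rdfs:subPropertyOf declarations (RDFS entailment rule rdfs7,
applied repeatedly). It is only an illustration under stated assumptions:
triples are plain Python tuples, the function name infer_superproperties is
made up, and the local:, ex:, doc: and agent: names are hypothetical
placeholders, not actual RDA vocabulary terms. A real aggregator would more
likely rely on an RDFS reasoner.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
SUBPROP = "rdfs:subPropertyOf"


def infer_superproperties(triples):
    """Return the input triples plus every statement implied by
    rdfs:subPropertyOf declarations (rule rdfs7, run to a fixpoint)."""
    # Map each property to the superproperties declared for it.
    parents = {}
    for s, p, o in triples:
        if p == SUBPROP:
            parents.setdefault(s, set()).add(o)

    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            for sup in parents.get(p, ()):
                candidate = (s, sup, o)
                if candidate not in inferred:
                    inferred.add(candidate)
                    changed = True
    return inferred


# Hypothetical data: a publisher's local property is declared as a
# subproperty of a more general, unconstrained property.
published = [
    ("local:printedBy", SUBPROP, "ex:manufacturer"),
    ("ex:manufacturer", SUBPROP, "ex:relatedAgent"),
    ("doc:1", "local:printedBy", "agent:42"),
]

closure = infer_superproperties(published)

# A consumer that only knows the general property can still find the link.
for triple in sorted(t for t in closure if t[1] == "ex:relatedAgent"):
    print(triple)
```

The consumer never needs to know what local:printedBy means. The
publisher's declarations are enough for it to read the same statement at
whatever level of generality it does understand.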

-- 
Tom Baker <tbaker@tbaker.de>
Received on Sunday, 13 March 2011 23:14:21 UTC