RE: Review of Relevant Technologies section from Young,Jeff (OR) on 2011-06-28 (public-lld@w3.org from June 2011)

From: Young,Jeff (OR) <jyoung@oclc.org>
Date: Tue, 28 Jun 2011 11:38:26 -0400
To: "Jon Phipps" <jonp@jesandco.org>, <tom@tombaker.org>, "public-lld" <public-lld@w3.org>
Cc: "Antoine Isaac" <aisaac@few.vu.nl>, <emmanuelle.bermes@bnf.fr>
Message-ID: <52E301F960B30049ADEFBCCF1CCAEF590CF62296@OAEXCH4SERVER.oa.oclc.org>
Jon,

Sorry for the expansive reply, but you raised some broad issues. Comments are below:

> 1. A great deal of time has been spent gathering and organizing
> relevant Use Cases and it seems to me that this section in particular
> should attempt to define 'relevance' in terms that would show the
> relevance of the available technologies to the specific categories of
> Use Cases, either by organization or by direct reference in the text.

I tried to add some cross-references to specific categories of use cases, but the affinities seem a little vague. Nevertheless, here's the diff:

http://www.w3.org/2005/Incubator/lld/wiki/index.php?title=Draft_Relevant_Technologies&diff=5188&oldid=5177 

My impression of http://www.w3.org/2005/Incubator/lld/wiki/UseCases is that the categories reflect nebulous domain model boundaries. The fact that they roll up category-specific scenarios and vocabularies seems to support this observation. It's true that these categories also roll up relevant technologies, but there hasn't been as much effort to refine them:

http://www.w3.org/2005/Incubator/lld/wiki/Cluster_BibData#Vocabularies_and_Technologies

http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Authority_data#Relevant_technologies

http://www.w3.org/2005/Incubator/lld/wiki/Cluster_VocAlign#Relevant_technologies

http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Archives#Vocabularies_and_Technologies

http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Citations#Vocabularies_and_Technologies

http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Digital_Objects#Relevant_technologies

http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Collections#Vocabularies_and_Technologies

http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Social_Uses#Relevant_technologies


I parsed each section, excluded the things that are vocabularies/domain models, and came up with this list, grouped under my best guess at the Relevant Technologies section where each item would fall. Many of these are quite general, and their appearance in a use case cluster seems to reflect the diversity of contributors more than the technology's particular affinity to the cluster. Suggestions for specific refinements without enumeration would be welcome.

<http://www.w3.org/2005/Incubator/lld/wiki/Draft_Relevant_Technologies#Discrete_and_bulk_access_to_information> 
URIs/URLs (3) 
PURL
DOI
Handle
ARK
RDF or RDF/XML (2)
HTML
JSON
REST
RSS, Atom
OpenURL
COinS
OAI-PMH
OAI-ORE

<http://www.w3.org/2005/Incubator/lld/wiki/Draft_Relevant_Technologies#Linked_Data_front-ends_to_existing_data_stores> 
SPARQL (5)
Semantic Searching
R2R mapping framework

<http://www.w3.org/2005/Incubator/lld/wiki/Draft_Relevant_Technologies#Tools_for_data_designers>
Metadata Registry
FinnONTO, NeON, CATCH, BioPortal
Vocabulary Mapping Framework

<http://www.w3.org/2005/Incubator/lld/wiki/Draft_Relevant_Technologies#SKOS_and_related_tools>
Vocabulary Mapping Framework

<http://www.w3.org/2005/Incubator/lld/wiki/Draft_Relevant_Technologies#Web_Application_Frameworks> 
Frameworks (2) 
Triple Stores (2)
Django

<http://www.w3.org/2005/Incubator/lld/wiki/Draft_Relevant_Technologies#Content_Management_Systems> 
CMS (2)
Semantic wikitext (2)
Fedora

<http://www.w3.org/2005/Incubator/lld/wiki/Draft_Relevant_Technologies#Web_Services_for_Library_Linked_Data>
Google Maps

Miscellaneous, theoretical or unfamiliar:
FRBRization process
SILK
Sameas.org
SPAR (Scalable Preservation and Archiving System)
Preserv2
DataCite
Graph/network visualization tools
OCLC WorldCat
Scotland's Information: Nearest Public Library Service
lifestreaming


> 2. There's considerable confusion regarding RDF and Linked Data, often
> treating the two technologies as synonymous. Although they share some
> features, such as the centrality of URIs to the technology, Linked Data
> doesn't require RDF either as transport, storage or data model. 

I assume the general meaning of "Linked Data" is captured in TimBL's 4 principles: http://www.w3.org/DesignIssues/LinkedData.html. I think LLD XG also accepts the general principles of his 5 star rating. It's probably true that the Relevant Technologies page leans heavily towards 4+ stars, but that is arguably necessary to make library data interoperable in other domains.

> I agree
> that there are distinct advantages to using the RDF data model to
> express LLD in the transport layer and for aggregation, but this
> shouldn't be promoted to the level of a requirement, nor should the
> advantages of RDF be presented as the same advantages of Linked Data.

The Relevant Technologies section specifically doesn't claim or assume that RDF is a requirement. OTOH, I don’t think that closed-vocabulary/closed-syntax/domain-specific 1-2 star solutions should make the cut.
 
Taking a step back, I don't think the "RDF data model" is the real bottleneck. The real bottleneck is the lack of modeling skills and of tools to design and automate those models. UML/ODM is pretty good at approximating/representing domain models visually, but HTTP/RDF/RDFS/OWL (i.e. Linked Data) is where the rubber currently meets the road for global, cross-domain network interoperability (so-called "Web scale").

> This happens throughout the report, not just in this section, and it
> would be worthwhile to review all of the sections for clarity with
> respect to this difference. If it's the recommendation of this group
> that LLD be restricted to RDF (as the definition of LLD in the 'scope'
> section states), then that needs to be an explicit and clearly defined
> prescription and its implications taken into consideration.

I believe you are referring to this specific quote in a different section of the report:

 Library Linked Data. "Library Linked Data" (LLD) is any type 
 of library data that is either natively maintained, or merely 
 exposed, in the form of RDF triples, thus facilitating linking.
 <http://www.w3.org/2005/Incubator/lld/wiki/Benefits#.22Library_Linked_Data.22:_Scope_of_this_report> 

This statement does couple "Library Linked Data" to 4+ stars in TimBL's Linked Data rating system. I wasn't involved with the editing of that section, but I'm inclined to believe that library linked data should aspire to more than publishing MARC as CSV (i.e. a mere 3 stars).
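
To make the star arithmetic concrete, here is a rough sketch of how TimBL's cumulative rating could be computed. The class and field names are my own invention for illustration; they are not part of the report or of any tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (names are assumptions)."""
    on_web_open_licence: bool     # 1 star: on the web under an open licence
    machine_readable: bool        # 2 stars: structured, machine-readable data
    non_proprietary_format: bool  # 3 stars: non-proprietary format such as CSV
    uses_uris_and_rdf: bool       # 4 stars: W3C open standards (URIs, RDF) identify things
    links_to_other_data: bool     # 5 stars: links out to other people's data


def star_rating(d: Dataset) -> int:
    """Count stars cumulatively: a star only counts if all earlier ones are met."""
    criteria = (
        d.on_web_open_licence,
        d.machine_readable,
        d.non_proprietary_format,
        d.uses_uris_and_rdf,
        d.links_to_other_data,
    )
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# MARC records dumped as CSV: open, structured, non-proprietary, but no URIs/RDF.
marc_as_csv = Dataset(True, True, True, False, False)
print(star_rating(marc_as_csv))
```

Under this reading, the quoted definition of LLD ("in the form of RDF triples") only starts to apply from the fourth criterion onward.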

> 3. Related to #2, it would be useful to be explicit about the hierarchy
> of technologies supported by Linked Data as spelled out in TBL's
> 'principles' document and his '5-star-rating' compliance as it directly
> addresses technologic relevance. Some specifics:
> • Discrete and bulk access to information
> o Cool URIs have 2 requirements: 1) be on the web (http) and 2) be
> unambiguous (one URI can't stand for both a document and a real-world
> object).  This is quite different from "raw RDF can be easily and
> automatically negotiated and rendered into an HTML format for human
> (browser) consumption". Linked data's distinction is its narrowing of
> semweb focus to Cool URI requirement #1 and its divergence from some of
> the more stringent Artificial Intelligence-driven requirements of
> semweb.

Your definition of "Cool URIs" seems to be based on http://www.w3.org/Provider/Style/URI, whereas LLD XG generally assumes its more modern form at http://www.w3.org/TR/cooluris/. In the latter document, delivery of RDF is assumed. I promise that nobody here cares about artificial intelligence. What matters is the ability to express and consume data using shared vocabularies and syntax.
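
For what it's worth, the 303-redirect pattern described in the newer Cool URIs document can be sketched roughly as follows. The /id/, /data/ and /page/ URI layout and the function name are assumptions for illustration only, and the Accept handling is deliberately simplified (it ignores q-values).

```python
from typing import Tuple

# Hypothetical URI layout (an assumption, not taken from the LLD XG wiki):
#   /id/<name>    identifies the real-world thing
#   /data/<name>  RDF document describing it
#   /page/<name>  HTML document describing it
RDF_TYPES = ("application/rdf+xml", "text/turtle")


def redirect_for(thing_uri: str, accept_header: str) -> Tuple[int, str]:
    """Return (status, Location) for a request on a thing URI.

    The thing URI never returns a document itself; it answers 303 See Other
    and points at the RDF or HTML document depending on the Accept header.
    """
    accept = accept_header.lower()
    wants_rdf = any(media_type in accept for media_type in RDF_TYPES)
    target = "/data/" if wants_rdf else "/page/"
    return 303, thing_uri.replace("/id/", target, 1)


print(redirect_for("http://example.org/id/alice", "application/rdf+xml"))
print(redirect_for("http://example.org/id/alice", "text/html"))
```

The point of the redirect is that one URI stands for the thing and separate URIs stand for the documents about it, which also addresses the ambiguity requirement you mention.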

> o "... the atomic nature of Linked Data http URIs makes it impractical
> for high volume network access" is arguably as untrue as saying that
> http URIs in general are impractical for high volume network access. It
> also conflicts with the value of http URIs described in the
> introductory paragraph to this section.

I rephrased the point. Here's the diff:

http://www.w3.org/2005/Incubator/lld/wiki/index.php?title=Draft_Relevant_Technologies&diff=5191&oldid=5188


> o This paragraph doesn't effectively describe the technologic relevance
> of either discrete or bulk access to library data.

I reworded "in bulk" to "as RDF dumps" and added a hotlink to the meaning. Here's the diff:

http://www.w3.org/2005/Incubator/lld/wiki/index.php?title=Draft_Relevant_Technologies&diff=5194&oldid=5193


> • Linked Data front-ends
> o "... typical XML documents". In the context of library linked data a
> better comparison might be made to MARC21

But that would undermine the relevant point that typical XML also sucks in terms of interoperable vocabularies and syntax. ;-)

> o What does "mash up" mean?

I added a hotlink to provide some context. Here's the diff:

http://www.w3.org/2005/Incubator/lld/wiki/index.php?title=Draft_Relevant_Technologies&diff=5193&oldid=5191


> o This paragraph is an instance of RDF/Linked Data confusion and seems
> to be substituting Linked Data for RDF

I retitled the section to accommodate both. Here's the diff:

http://www.w3.org/2005/Incubator/lld/wiki/index.php?title=Draft_Relevant_Technologies&diff=5195&oldid=5194


> o There's no mention of the integration of RDF layers on existing data
> stores, like Oracle and DB2 and a simple Google Scholar search
> (http://bit.ly/mjBMQl) invites distillation into a few sentences a
> discussion of RDF and 'traditional' data stores.

This section doesn't explain how the front ends do their mapping from traditional data stores to Linked Data/RDF, but it does provide links to tools that do it. 

I'm not familiar with Oracle and DB2 specifically, but the section does mention D2R Server, which provides this capability for relational databases in general. The W3C's RDB2RDF standardization effort is also mentioned.
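
As a very rough illustration of the general idea behind such front ends (this is not D2R Server's mapping language or the RDB2RDF specification, just a naive sketch with made-up names and example.org URIs), a relational row can be turned into triples like this:

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, Any]


def row_to_triples(base_uri: str, table: str, key_column: str,
                   row: Dict[str, Any]) -> List[Triple]:
    """Naively map one relational row to (subject, predicate, object) triples.

    Assumptions: the primary key mints the subject URI, each remaining column
    becomes a predicate in a per-table vocabulary, and NULL values are skipped.
    """
    subject = f"{base_uri}/{table}/{row[key_column]}"
    return [
        (subject, f"{base_uri}/vocab/{table}#{column}", value)
        for column, value in row.items()
        if column != key_column and value is not None
    ]


# Hypothetical row from a "book" table.
book_row = {"id": 42, "title": "Moby-Dick", "isbn": None}
for triple in row_to_triples("http://example.org", "book", "id", book_row):
    print(triple)
```

Real tools add a great deal on top of this, such as mapping columns to shared vocabularies rather than table-local predicates.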

The link to Google Scholar lists documents from the early to mid 2000s, mostly before TimBL's Linked Data principles and W3C's "Cool URIs for the Semantic Web" put RDF back on the map.

> • Tools for data designers
> o There's no mention of UML and its relationship to data modeling for
> RDF, or the best practices of the Dublin Core Singapore Framework and
> the Dublin Core Abstract Model's integration of RDF-friendly domain
> modeling.

I agree that UML is important and should be included. Here's the diff:

http://www.w3.org/2005/Incubator/lld/wiki/index.php?title=Draft_Relevant_Technologies&diff=5197&oldid=5195


The specification for Dublin Core Application Profiles has been around for a few years, but examples still seem to be in short supply. Regarding DCAM, here's a quote from a recent paper you and others in the group co-authored:

"DCMI hasn't been explicit about associating the DC Abstract Model (DCAM) with OWL, although it is explicit about defining some very similar constraints."
<http://www.dlib.org/dlib/january10/hillmann/01hillmann.html>

If DCAM isn't reconciled with OWL, then in my opinion it will continue to create confusion and eventually become irrelevant.

> o This is the first mention of "domain-specific vocabularies" that
> elsewhere appear to be described as "metadata element sets". This
> should be made consistent or a clearer distinction made.

Agreed. Here's the diff:

http://www.w3.org/2005/Incubator/lld/wiki/index.php?title=Draft_Relevant_Technologies&diff=5198&oldid=5197


> o OWL could as easily be seen as an impediment to the production of
> RDF. Rather than promoting OWL it might be better to provide a more
> neutral and balanced description of the technology, specific to library
> linked data in RDF format.

http://www.w3.org/2005/Incubator/lld/wiki/Draft_Relevant_Technologies#Tools_for_data_designers gives specific reasons why OWL is beneficial. Can you give a specific example of how OWL might be an impediment to the production of RDF? Are you thinking that library linked data should focus on DCAM instead of OWL because it is specific to our domain?

> o This paragraph is also an instance of RDF/Linked Data confusion

It's not clear which paragraph you're referring to, so I won't try to guess.

Thanks for the useful comments. Hopefully the changes and explanations are reasonable. If not, follow-ups are welcome.

Jeff

> At this point I have more than run out of time to continue, I'm already
> very late with this review, and I'm not at all sure that the level of
> detailed review in the sections above is what's required. If you'd like
> me to continue with that level of detail for the rest of the section, I
> may be able to provide that before the next meeting. But I'm attending
> ALA Annual for the next week and my dance card is quite full. Hopefully
> this represents a useful start.
> --
> Jon Phipps
Received on Tuesday, 28 June 2011 15:39:15 UTC