RE: Records in (L)LD? from Ford, Kevin on 2011-01-20 (public-lld@w3.org from January 2011)

From: Ford, Kevin <kefo@loc.gov>
Date: Thu, 20 Jan 2011 15:40:39 -0500
To: Antoine Isaac <aisaac@few.vu.nl>, public-lld <public-lld@w3.org>
Message-ID: <1D525027B29706438707F336D75A279F152C4FD05F@LCXCLMB03.LCDS.LOC.GOV>

"I wonder whether there could be something in the existing library data exchange practices, which could be adapted to the LD world. I'd be interested to hear opinions on this!"

I'd argue that this is one of the primary questions this group should be asking.

I don't know if I'm qualified to answer this specific question in full, but library data exchange practices generally include all the information about a resource that is needed for it to be independently understandable (or "wholeness" / "complete description," as Karen also put it [1]).  In other words, the entire "record" is included.  Now, we could probably debate to the ends of the earth the definition of "independently understandable," but let it be sufficient to point out that a bibliographic record contains the subject headings as strings, not the entire authority record (an identifier for each heading would be nice too, but this is currently not the custom; fwiw, in some ways I'd rather have an identifier than a string, and I'd prefer both).

On a related note, I actually encountered the issue of resource description "completeness" / "wholeness" with the SKOS data at ID.  And, having quickly familiarized myself with Concise Bounded Description (CBD) [2], I appear to have accidentally reached the same or a similar conclusion as the CBD authors.  I also seem to have implemented CBD in practice.

For one of the vocabularies at ID (MARC Languages), we had to create narrow relations to concepts that had to be identified with blank nodes [3].  A SPARQL DESCRIBE query returned a blank node ID, but stopped there.  Clearly the Concept had a narrow relation to a resource, but one could not "see" or otherwise access that resource without bulk downloading the data (a SPARQL interface is not possible at this time).  Unhappy with the results from the SPARQL DESCRIBE query, I crafted a more tailored response.  Now, basically conforming to CBD, the user/machine receives a type of subgraph that includes all the statements about the subject, plus the RDF types, skos:prefLabels, and skos:altLabels of the subject's relations.  "Earth" from the GeographicAreas Concept Scheme at ID is another example [4].
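
To make the shape of that tailored response concrete, here is a minimal sketch in Python.  It is only an illustration under stated assumptions, not the code running at ID: triples are plain tuples, blank nodes are strings starting with "_:", and the example.org data and the function name describe_with_labels are hypothetical.

```python
# Illustrative sketch only: a toy in-memory graph, not the ID service code.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Predicates copied for each related resource (types and labels only).
SUMMARY_PREDICATES = {RDF_TYPE, SKOS + "prefLabel", SKOS + "altLabel"}


def describe_with_labels(triples, subject):
    """Return every statement about `subject`, plus the types and labels
    of the resources that those statements point to."""
    description = {t for t in triples if t[0] == subject}
    # In this toy model literals are strings too; they only matter here
    # if some triple happens to use the same string as its subject.
    related = {obj for (_, _, obj) in description}
    for s, p, o in triples:
        if s in related and p in SUMMARY_PREDICATES:
            description.add((s, p, o))
    return description


# Hypothetical data: a concept with a narrower concept held in a blank node.
CONCEPT = "http://example.org/vocabulary/languages/xx"
OTHER = "http://example.org/vocabulary/languages/yy"
graph = [
    (CONCEPT, RDF_TYPE, SKOS + "Concept"),
    (CONCEPT, SKOS + "prefLabel", "Example language"),
    (CONCEPT, SKOS + "narrower", "_:b1"),
    ("_:b1", RDF_TYPE, SKOS + "Concept"),
    ("_:b1", SKOS + "prefLabel", "Example dialect"),
    ("_:b1", SKOS + "note", "Detail left out of the summary"),
    (OTHER, SKOS + "prefLabel", "Unrelated language"),
]

result = describe_with_labels(graph, CONCEPT)
```

With this data, the result holds the three statements about the concept plus the type and prefLabel of the blank node, so a consumer can see what the narrower relation points to without a second request (which a blank node could not answer anyway).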

The server must work harder to produce this information, but I feel it is far more usable, not only by a human programmer but also by a machine seeking information.  It seemed essential given the blank nodes.

Now, should the merits of (or arguments against) such an approach be formalized, debated, etc.?

Personally, I would like to see these types of communication standards formalized.  This way, when going from LLD site to LLD site, there is a known and predictable exchange protocol.

Nonetheless, I agree with Antoine and I believe him to be technically correct.  For LD, "it would be sufficient to stop at the first de-referenceable identifier you bump into."  But, I'm finding that more information per resource is required for it to be truly useable.  For blank nodes, I've found that providing more information is imperative.

Warmly,

Kevin

p.s.  FWIW, Tom actually touched on this notion of bounded records and subgraphs briefly [5].

[1] http://lists.w3.org/Archives/Public/public-xg-lld/2011Jan/0084.html
[2] http://www.w3.org/Submission/CBD/
[3] http://id.loc.gov/vocabulary/languages/ita.rdf
[4] http://id.loc.gov/vocabulary/geographicAreas/x.rdf
[5] http://lists.w3.org/Archives/Public/public-xg-lld/2011Jan/0085.html


________________________________________
From: public-lld-request@w3.org [public-lld-request@w3.org] On Behalf Of Antoine Isaac [aisaac@few.vu.nl]
Sent: Thursday, January 20, 2011 13:26
To: public-lld
Subject: Records in (L)LD?

Hi Gordon,

Re. your questioning on the granularity of library metadata [1], and the notion of record, here's a link to one approach for data packaging in the LD realm: Concise Bounded Description (CBD, [2]).

The aim is to determine what data to send back for a given LD identifier, as "a general and broadly optimal unit of specific knowledge about that resource to be utilized by, and/or interchanged between, semantic web agents".
I trust this somehow matches the concerns that guide the creation of records in libraries. I'm thinking, for example, of when you decide how much of a concept's description you should include in the description of a book that has this concept as its subject. Very often you would have both the identifier and the label for that concept in the book record, I think, while in principle only the identifier would be needed.

In theory, in the LD context, it would be sufficient to stop at the first de-referenceable identifier you bump into when you navigate the graph of subject-predicate-object statements that "starts" at the resource of interest. Then, you can just fire a query to get the description of this new identifier. This is the idea of CBD.
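
As a rough illustration of that traversal, here is a minimal sketch in Python. It assumes a toy in-memory list of triples, a hypothetical BNode marker class for blank nodes, and made-up example.org data; it also leaves out the reification handling that the CBD submission describes.

```python
from typing import NamedTuple


class BNode(NamedTuple):
    """Hypothetical marker for a blank node (no de-referenceable identifier)."""
    label: str


def concise_bounded_description(triples, start):
    """Collect the statements about `start`, following objects only when
    they are blank nodes and stopping at anything else."""
    description = set()
    pending = [start]
    seen = set()
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                description.add((s, p, o))
                # A blank node cannot be looked up later, so expand it now.
                if isinstance(o, BNode):
                    pending.append(o)
    return description


# Hypothetical data: a book with an IRI-identified subject and a creator
# that is only available as a blank node.
BOOK = "http://example.org/book/1"
TOPIC = "http://example.org/concept/1"
creator = BNode("b1")
graph = [
    (BOOK, "http://purl.org/dc/terms/subject", TOPIC),
    (BOOK, "http://purl.org/dc/terms/creator", creator),
    (creator, "http://xmlns.com/foaf/0.1/name", "Example Author"),
    (TOPIC, "http://www.w3.org/2004/02/skos/core#prefLabel", "Example topic"),
]

cbd = concise_bounded_description(graph, BOOK)
```

Here the description of the book includes the creator's name, because the creator is a blank node, but not the label of the subject concept, which has its own identifier and can be fetched separately.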

Yet CBD is only one of the algorithms that can be used to establish the limit of data to send for an LD identifier. It's mentioned as one solution for SPARQL DESCRIBE query results [3], but that spec is not trying to enforce any option, truly:
[
This data [...] is determined by the SPARQL query processor. The DESCRIBE form takes each of the resources identified in a solution, together with any resources directly named by IRI, and assembles a single RDF graph by taking a "description" which can come from any information available including the target RDF Dataset.
]

One explanation for this is that it leaves the door open to optimizing data communication. For example, you may know that a data-consuming application interested in one resource (e.g., a book) will almost always be interested in details of its related resources (e.g., its subject). In such cases, it helps if the packages of data shipped by the LD service have a slightly wider scope.

In fact the SPARQL doc even continues with a library example :-)
[
It may include information about other resources: for example, the RDF data for a book may also include details about the author.
]

I wonder whether there could be something in the existing library data exchange practices, which could be adapted to the LD world. I'd be interested to hear opinions on this!

Best,

Antoine

[1] http://www.w3.org/2005/Incubator/lld/wiki/Granularity_of_library_metadata
[2] http://www.w3.org/Submission/CBD/
[3] http://www.w3.org/TR/rdf-sparql-query/#describe
Received on Thursday, 20 January 2011 20:39:52 UTC