Handling remaining comments on the former "Available Vocabularies and Datasets" section from Antoine Isaac on 2011-08-25 (public-xg-lld@w3.org from August 2011)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Thu, 25 Aug 2011 15:22:22 +0200
To: public-xg-lld <public-xg-lld@w3.org>
Message-ID: <4E564C8E.3050001@few.vu.nl>

Hi,

We have tried to address remaining comments from http://www.w3.org/2005/Incubator/lld/wiki/DraftReportReviews#Draft_Vocabularies_Datasets_Section , as detailed below.

Raw diffs are
http://www.w3.org/2005/Incubator/lld/wiki/index.php?title=Draft_Vocabularies_Datasets_Section2&diff=5882&oldid=5848
http://www.w3.org/2005/Incubator/lld/wiki/index.php?title=Draft_Vocabularies_Datasets_As_Current_Situation&diff=5884&oldid=5836

Cheers,

Antoine

1. Comment from Teague Allen on what is now the "available data" appendix
[
Instances of these categories are listed in the side-deliverable along with a brief description, links to their locations and to the [http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport use cases] that our group has gathered from the community. Two visualizations (@@TODO: maybe just one!) are also presented to help reveal the inter-relations of metadata element sets and the relationships between datasets and value vocabularies.
]
->
[
Specific (e.g., collection-level) datasets re-use elements from value vocabularies, and are structured according to the specifications of metadata element sets. For example, the British National Bibliography dataset re-uses concepts from the Library of Congress Subject Headings vocabulary, and is structured by properties from the Dublin Core element set. Instances of these categories are listed in the side-deliverable along with a brief description, links to their locations and to the [http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport use cases] that our group has gathered from the community. A visualization is also presented to help reveal the relationships between datasets and value vocabularies.
]

2. Comment from Jodi (separate email) on the "linking issue" section. I know the fate of the "linking issue" is not decided yet, but still...
[
We also observe that links are being built between library-originated resources and resources originating in other organizations or domains, DBpedia being an obvious case.
]
->
[
We also observe that links are being built between library-originated resources and resources originating in other organizations or domains, DBpedia (a linked data version of [http://wikipedia.org Wikipedia]) being an obvious case.
]

3. Remaining TODO on more detail on alignment, answering comments from Jennifer Bowen and Ed Chamberlain, for the "linking issue" section. Here I propose changes over two paragraphs:
[
At the level of datasets, one may observe the same phenomenon as for the previous categories. For example, Open Library has started attaching OCLC numbers to its manifestations. One may argue that description of individual books and other library-related resources is slightly less important than metadata element sets and value vocabularies, as far as re-use is concerned. Tools like union catalogues already realize a significant level of exchange of book-level data. Yet it is crucial -- and it is truly one of the expected benefits of linked data applied in our domain -- that library-related datasets get published and interconnected, rather than continuing to exist in their own silos. We note however that efforts are being undertaken, and that the community is already well aware of challenges such as the [http://www.w3.org/2005/Incubator/lld/wiki/DraftReportWithTransclusion#Consider_migration_strategies "de-duplication"] one. We also observe that links are being built between library-originated resources and resources originating in other organizations or domains, DBpedia (a linked data version of [http://wikipedia.org Wikipedia]) being an obvious case. Again, VIAF provides an example by taking the merged authority records and linking them to DBpedia whenever possible. This illustrates one of the expected benefits of linked data, where data can be easily networked, irrespective of its origins. The library domain can thus benefit from re-using data from other fields, while library data can itself contribute to initiatives that do not strictly fall into the library scope.

A crucial issue here is the availability of appropriate linking tools. A lot of effort has been put into computer science research areas such as [http://ontologymatching.org Ontology Matching], leading to implementations based, e.g., on string matching and statistical techniques. However, many focus rather on metadata element sets, or are not ready as generalized applications that could be applied to the (often huge) datasets and value vocabularies from the library domain. Here, LLD efforts could benefit from the availability of recent generic tools for linking data such as [http://www4.wiwiss.fu-berlin.de/bizer/silk/ Silk - Link Discovery Framework], [http://code.google.com/p/google-refine/ Google Refine], or [http://code.google.com/p/google-refine/wiki/ReconciliationServiceApi Google Refine Reconciliation Service API]. Nonetheless, the community still needs to gain experience using them, sharing linking results, and possibly building more tools that are better suited to the LLD environment.
]

4. Comments by Catherine Jones on not limiting the scope to bibliographic data and other "library-held resources", and by Jennifer Bowen and Ed Chamberlain on "This would be an appropriate place to mention the need for software tools that help libraries to convert their bibliographic datasets to linked data.", for the "Bibliographic datasets have received less attention" sub-section of "current situation". The wording for one paragraph would now be:
[
However, there are not yet many bibliographic datasets available in the linked data space, let alone data on other types of resource (metadata for journal articles, citation-level data, circulation information, etc.), which can be relevant in an environment where all this data can be (re-)used seamlessly across contexts. Examples such as the release of the [http://www.slideshare.net/nw13/establishing-the-connection-creating-a-linked-data-version-of-the-bnb British National Bibliography] show that there is considerable work involved in tackling challenges such as licensing, data modeling, handling legacy data and collaboration with (multiple) user communities. But they also point to the considerable benefit involved in releasing bibliographic databases as linked data. As the community's experience increases, the number of datasets released as linked data keeps increasing at a fast pace.
]

Received on Thursday, 25 August 2011 13:20:43 UTC