Review of Vocabularies and Datasets Section

I took an action [1] to do a quick-ish review of the wiki drafts of the
Vocabularies and Datasets section [2] bound for the final report, as well
as the separate Vocabulary and Dataset deliverable [3].

In general I think these two documents are really excellent, and are
ready to be circulated more widely for comments. Indeed, if you are
reading this and have some comments on the documents I think now would
be a good time. The comprehensive overview of the vocabularies that
draws on our case studies is very impressive--a lot of work must have
gone into it. The way you wrapped it up with the Observations section
is very well done too.

While I understand the distinction between Element Set, Value
Vocabulary and Dataset, I was a bit confused because both the Value
Vocabulary and Dataset examples use authors:

"""
VIAF defines authorities
"""

and:

"""
the same dataset may contain records for authors as first-class
entities that are linked from their book, described with elements like
"name" from FOAF
"""

Is it the case that something like VIAF is both a value vocabulary and
a dataset? Is it worth adding a sentence about how the categories are
not mutually exclusive? Or perhaps we should not talk about Datasets
at all? Also, did we decide not to ground our definition in terms of
TBOX and ABOX?

In the Linking section, does it make sense to mention VIAF as a good
example of a library project that creates links between library
resources? I think the cultural heritage sector needs to be encouraged
to share more information (in the form of articles, blog posts, etc)
about linking strategies, such as what OCLC have used to link VIAF
resources to Wikipedia, or Open Library's efforts to link to
worldcat.org. Also I think this section would be a good place to
highlight services such as Google Refine's Reconciliation Service [4]
and LOD2's Silk Framework [5]. It would be good if the section
emphasized the need for our community to gain experience using them,
sharing linking results, and building more tools that are suited to
our environment.

I also have a few comments about the separate Vocabulary and Dataset
deliverable:

I see CrossRef's DOI mentioned in the auto-generated graph, but should
we mention CrossRef's DOI service explicitly [5]? It is a big
development for linked data for scholarly research.
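
For anyone who hasn't tried it, here is a minimal sketch of the kind of
request the CrossRef post [5] is about: asking a DOI resolver for RDF
via content negotiation. The resolver URL, the DOI (a placeholder, not
a real article) and the helper name are my assumptions for
illustration. The snippet only builds the request, it does not send it.

```python
import urllib.request

# Assumed resolver base URL; the DOI passed in below is a placeholder.
RESOLVER = "https://doi.org/"


def build_rdf_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a request asking for RDF/XML for a DOI."""
    # The Accept header is what drives content negotiation: the same URL
    # can answer with HTML for browsers and RDF for linked data clients.
    return urllib.request.Request(
        RESOLVER + doi,
        headers={"Accept": "application/rdf+xml"},
    )


req = build_rdf_request("10.1000/example")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen() with a real DOI would
actually fetch the metadata, assuming the resolver supports that media
type for the DOI in question.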

Another recent development is that the Archipel project (mentioned in
the report) have published a PREMIS vocabulary [6] which is
significant for the digital preservation community. I don't know if
this will lead to something more formal from the PREMIS folks
themselves, but it is a good sign of things to come.

Should we include the LOCAH project's RDF vocabulary for archival
information [7]? I know that LOCAH are mentioned in the EAD section,
but Pete Johnston (one of the key folks behind Dublin Core) & co have
spent a bit of time thinking about how to model archival data in RDF.
Also, Aaron Rubinstein has a lightweight vocabulary for expressing
archival information which he calls Arch [8].

Really nice work!
//Ed

[1] http://www.w3.org/2005/Incubator/lld/minutes/2011/05/19-lld-minutes.html#action08
[2] http://www.w3.org/2005/Incubator/lld/wiki/Draft_Vocabularies_Datasets_Section
[3] http://www.w3.org/2005/Incubator/lld/wiki/Vocabulary_and_Dataset
[4] http://code.google.com/p/google-refine/wiki/ReconciliationServiceApi
[5] http://lod2.eu/Project/Silk.html (Silk Framework)
[5] http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html (CrossRef DOI)
[6] http://multimedialab.elis.ugent.be/ontologies/PREMIS2.0/v1.0/premis.owl
[7] http://data.archiveshub.ac.uk/def/
[8] http://purl.org/archival/vocab/arch

Received on Friday, 27 May 2011 18:42:02 UTC