Re: Identifiers for digital content from Graham Klyne on 2012-07-12 (semantic-web@w3.org from July 2012)

From: Graham Klyne <GK-lists@ninebynine.org>
Date: Thu, 12 Jul 2012 18:21:57 +0100
To: Mo McRoberts <mo.mcroberts@bbc.co.uk>
CC: semantic-web@w3.org
Message-ID: <4FFF07B5.1010305@ninebynine.org>
Hi Mo,

A project I'm involved with is working towards similar goals, for research data
and scientific workflows, but the techniques are very general.
  Roughly: an RDF manifest using the ORE and AO vocabularies.

   http://wf4ever.github.com/ro/

There's a very simple example here:

   https://github.com/wf4ever/ro-catalogue/tree/master/v0.1/trivial
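
For anyone unfamiliar with ORE, here's a rough sketch of what the core of such a
manifest boils down to: an ore:Aggregation that ore:aggregates its parts. This is
only an approximation under my own assumptions. It uses just the ORE terms, leaves
out the RO and AO structure the real wf4ever manifests carry, and the example.org
URIs are made up.

```python
# Sketch only: emits a minimal ORE aggregation in Turtle.
# Assumption: only ore:Aggregation / ore:aggregates are used; the real wf4ever
# research object model adds more (RO and AO annotation terms), omitted here.

ORE = "http://www.openarchives.org/ore/terms/"
_BAD_IRI_CHARS = set('<>"{}|^`\\ \t\n')


def _iri(uri: str) -> str:
    """Wrap a URI as a Turtle IRIREF, rejecting characters that would break it."""
    if any(c in _BAD_IRI_CHARS for c in uri):
        raise ValueError(f"unsafe character in URI: {uri!r}")
    return f"<{uri}>"


def manifest_turtle(aggregation_uri: str, resource_uris: list[str]) -> str:
    """Return Turtle stating that aggregation_uri aggregates each resource URI."""
    lines = [f"@prefix ore: <{ORE}> .", "", f"{_iri(aggregation_uri)} a ore:Aggregation"]
    for uri in resource_uris:
        # Each part becomes another predicate-object pair on the same subject.
        lines.append(f"    ; ore:aggregates {_iri(uri)}")
    lines.append("    .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    print(manifest_turtle(
        "http://example.org/ro/trivial/",
        ["http://example.org/ro/trivial/data.csv",
         "http://example.org/ro/trivial/README.txt"],
    ))
```

The point is that the manifest only says "these things belong together"; what
each part *is* gets described by further statements about the same URIs.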

And we're also working on a web API for accessing them which should be pretty 
much independent of the detailed organization of the parts:

   http://www.wf4ever-project.org/wiki/display/docs/RO+SRS+interface+6

The whole assemblage can be shipped around as a ZIP file.
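
Packaging is nothing exotic. A sketch of bundling a manifest plus its parts into a
ZIP with the Python standard library might look like the following; the file name
"manifest.ttl" is my own placeholder, not the layout wf4ever actually uses.

```python
import io
import zipfile


def pack_research_object(files: dict[str, bytes]) -> bytes:
    """Bundle a manifest and its resources (archive path -> content) into an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, data in files.items():
            zf.writestr(path, data)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical layout: a manifest at the root alongside one data file.
    blob = pack_research_object({
        "manifest.ttl": b"@prefix ore: <http://www.openarchives.org/ore/terms/> .\n",
        "data.csv": b"x,y\n1,2\n",
    })
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        print(zf.namelist())
```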

So: I think all this is a realization of what you are suggesting.  And BTW, we
are working on including provenance information.  The system is ripe for mixing
in (say) licensing information using your favourite RDF vocabulary.
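
As an illustration of how little that takes: a licence is just one more
statement about a resource's URI. The sketch below happens to use Dublin Core
Terms' license property; that vocabulary choice and the example URIs are my
assumptions, and any other licensing vocabulary would slot in the same way.

```python
# Sketch only: one N-Triples statement attaching a licence to a resource.
# Assumption: Dublin Core Terms is the chosen vocabulary; URIs are illustrative.

DCTERMS_LICENSE = "http://purl.org/dc/terms/license"


def license_triple(resource_uri: str, license_uri: str) -> str:
    """Return an N-Triples line: <resource> dcterms:license <license> ."""
    for uri in (resource_uri, license_uri):
        # Reject characters that would make the IRI syntactically invalid.
        if any(c in '<>" \t\n' for c in uri):
            raise ValueError(f"unsafe character in URI: {uri!r}")
    return f"<{resource_uri}> <{DCTERMS_LICENSE}> <{license_uri}> ."


if __name__ == "__main__":
    print(license_triple(
        "http://example.org/ro/trivial/data.csv",
        "http://creativecommons.org/licenses/by/3.0/",
    ))
```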

There's Python and Java code for much of this (unfortunately, the Python code
isn't yet open, because it's entangled with someone's as-yet unpublished PhD
work, but we are hoping it can be opened up real soon now).

BTW, DOIs might be an option for the identifiers, as they can be published as
http://dx.doi.org/... for dereferenceability.  There's a growing resolution
infrastructure, not unlike PURL but somewhat more formalized, if people feel
that's what is required.
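
To make the DOI route concrete: turning a DOI into a dereferenceable URI is just
a matter of prefixing the resolver, and a client that wants metadata rather than
a landing page can ask for RDF via content negotiation. Whether a particular
DOI's registration agency will actually return RDF is an assumption here, and the
sketch below only builds the request; it does not send it.

```python
import urllib.parse
import urllib.request

DOI_RESOLVER = "http://dx.doi.org/"


def doi_to_uri(doi: str) -> str:
    """Turn a DOI such as '10.1000/182' (optionally prefixed 'doi:') into an HTTP URI."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URI path, keeping '/'.
    return DOI_RESOLVER + urllib.parse.quote(doi, safe="/")


def rdf_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a request asking the resolver for Turtle via the Accept header."""
    return urllib.request.Request(doi_to_uri(doi), headers={"Accept": "text/turtle"})


if __name__ == "__main__":
    req = rdf_request("doi:10.1000/182")
    print(req.full_url, req.get_header("Accept"))
```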

#g
--

On 12/07/2012 16:12, Mo McRoberts wrote:
> Hello!
>
> I've been recently looking into identifiers and metadata for digital content — in the first instance, TV programmes, but also radio shows, films, stills, etc., etc.
>
> As the broadcasting world slowly inches towards a tapeless workflow where things are exchanged as files which can be sent over the Internet (or similar), we're steadily moving away from a world where labels are stuck to tapes containing the relevant identifying information and instead towards one where identifiers can be embedded within the media item itself (or into a metadata 'sidecar').
>
> As these files are… well, files, duplicating and archiving them is a very different process from doing the same with tapes. At this point, having an identifier which is only meaningful within the context of a particular organisation (or worse: within a particular production!) becomes a serious headache.
>
> Ultimately, the point of having these identifiers is to answer two questions:
>
> * What is it?
>
> * What can I do with it [and consequently, how much will it cost?]
>
> And so the thought occurs to me that one way to do this would be through the use of linked data: the identifiers which get embedded and passed around with the media are expressed as URIs, and in particular URIs which can be dereferenced in order to obtain RDF which expresses descriptive metadata, including provenance and licensing information.
>
> I know there are standard schemes for some kinds of media (e.g., ISBNs for books), and vying-to-be-standard schemes for other kinds (e.g., EIDR for movies and TV) — but it strikes me that linked data isn't incompatible with any of these; and that if you're going to specify means of identifying various kinds of content, URIs provide a means to take a uniform approach with everything — and the distinction comes in how you embed that identifier and the nature of the metadata published about the item.
>
> I'd be very interested in people's thoughts on this...
>
> M.
>
Received on Thursday, 12 July 2012 17:25:08 UTC