RE: imdb as linked open data? from Chris Sizemore on 2008-04-03 (public-lod@w3.org from April 2008)

From: Chris Sizemore <Chris.Sizemore@bbc.co.uk>
Date: Thu, 3 Apr 2008 12:30:04 +0100
To: "Richard Cyganiak" <richard@cyganiak.de>
Cc: <public-lod@w3.org>, "Silver Oliver" <Silver.Oliver@bbc.co.uk>, "Michael Smethurst" <Michael.Smethurst@bbc.co.uk>
Message-ID: <22E75701DF55CB459F5EC560C366846704C1594A@bbcxue219.national.core.bbc.co.uk>
'I wouldn't call it "using IMDB URIs as identifiers", but rather
"annotating IMDB pages with links into the Semantic Web". '

that really makes sense to me, and just the steer I was looking for!
it's always struck me as awkward, this separation of Doc Web and Sem
Web... for instance, I find the difference between dbPedia URIs and
Wikipedia URLs for identifying things to be over-egging it...

but your approach gives me a way to reconcile this in my own mind, which
is where this has to start, I think...

much thanks!


best--

--cs

-----Original Message-----
From: Richard Cyganiak [mailto:richard@cyganiak.de] 
Sent: 03 April 2008 11:01
To: Chris Sizemore
Cc: public-lod@w3.org; Silver Oliver; Michael Smethurst
Subject: Re: imdb as linked open data?


On 2 Apr 2008, at 23:18, Chris Sizemore wrote:
> 1) the licensing seems too restrictive for the purposes of this 
> community, but has anyone taken the downloadable imdb data and tried 
> to RDF-ize it? thoughts?
>
> http://www.imdb.com/interfaces
> http://uk.imdb.com/help/show_leaf?usedatasoftware
>
> http://glinden.blogspot.com/2008/03/using-imdb-data-for-netflix-prize.html
>
> http://radar.oreilly.com/archives/2006/05/imdb-api.html
>

I did a partial RDFization of the IMDB data together with Bastian
Quilitz for a distributed querying demo (we meshed IMDB data with local
movie showtimes). Might dig out the code if someone needs it.

> 2) switching focus a bit, could we/should we be using imdb URIs as 
> identifiers for Movies, TV Programmes, and TV Programme Episodes, and 
> (certain) people? i think we should, so, from the best LOD practice 
> (given that imdb haven't yet pulled a dbpedia and provided 
> concept/data URIs in addition to their document URLs), shouldn't i
> use:
>
> http://www.imdb.com/title/tt0088846/#thing (to represent the gilliam 
> film Brazil in BBC RDF...)
>

From a SemWeb POV this is pretty useless since the URI doesn't  
resolve to RDF data. Identifiers on the Web are only as good as the  
data they point to. IMDB URIs point to high-quality web pages, but not  
to data.
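
A minimal sketch of how one might check whether a URI points to data rather than only a web page: ask for RDF via the HTTP Accept header and look at the Content-Type that comes back. The function names and the list of media types are illustrative assumptions, not an established API, and the list is not exhaustive.

```python
from urllib.request import Request

# Media types commonly used for RDF serializations (assumed, not exhaustive).
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "text/n3",
    "application/n-triples",
}


def build_rdf_request(uri):
    """Build a GET request that asks the server for RDF rather than HTML."""
    accept = "application/rdf+xml, text/turtle;q=0.9, text/n3;q=0.8"
    return Request(uri, headers={"Accept": accept})


def is_rdf_media_type(content_type):
    """Return True if a Content-Type header value names an RDF serialization."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in RDF_MEDIA_TYPES
```

To run the check against a live server, the request could be passed to urllib.request.urlopen and the response's Content-Type header handed to is_rdf_media_type. A URI that only ever answers with text/html is a document URI in the sense described above.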

Also, don't squat other people's URI space. IMDB hasn't endorsed the  
#thing URIs, and simply creating new URIs inside someone else's URI  
space is considered a violation of the Web's social contract.

> 3) what if i published a site that publicly made available RDF such  
> as:
>
> http://www.imdb.com/name/nm0000187/#thing  owl:sameAs
> http://musicbrainz.org/artist/79239441-bfd5-4981-a70c-55c3f15c1287.html#thing
>
> or
>
> http://www.imdb.com/name/nm0000187/#thing  owl:sameAs
> http://zitgist.org/79239441-bfd5-4981-a70c-55c3f15c1287
>  (or whatever it is)
>

Better make your own identifiers. Implementation-wise it might make  
sense (or not) to re-use their internal IDs (0088846) in your own  
URIs, so you could have http://yourdomain/movies/0088846#thing .
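
As a rough sketch of that idea, the internal ID can be pulled out of an IMDB page URL and dropped into a URI in your own namespace. The base "http://yourdomain" and the "movies" and "people" paths follow the examples in this thread; the function name, the URL pattern, and the title/name mapping are assumptions made for illustration.

```python
import re

# Maps the IMDB path segment to a path in our own URI space (assumed mapping).
KIND_TO_PATH = {"title": "movies", "name": "people"}

# Matches page URLs such as http://www.imdb.com/title/tt0088846/
IMDB_PAGE = re.compile(
    r"^https?://(?:[a-z]+\.)?imdb\.com/(title|name)/(?:tt|nm)(\d+)/?$"
)


def mint_uri(imdb_page_url, base="http://yourdomain"):
    """Mint our own identifier, reusing the numeric ID from an IMDB page URL."""
    match = IMDB_PAGE.match(imdb_page_url)
    if match is None:
        raise ValueError(f"not a recognised IMDB page URL: {imdb_page_url}")
    kind, internal_id = match.groups()
    return f"{base}/{KIND_TO_PATH[kind]}/{internal_id}#thing"
```

The minted URI lives in a namespace you control, so you decide what it resolves to; the IMDB URL is only read, never extended.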

It's still a good idea to include a link to the IMDB page in your  
data, e.g. using the foaf:page or foaf:isPrimaryTopicOf property,  
which can be used to link together things (e.g. movies, people) and  
web pages about them.

So you could have (in N3 syntax):

<http://yourdomain/people/0000187#thing>
     a foaf:Person;
     owl:sameAs <http://zitgist.org/79239441-bfd5-4981-a70c-55c3f15c1287>;
     owl:sameAs <http://dbpedia.org/resource/Madonna_%28entertainer%29>;
     foaf:isPrimaryTopicOf <http://www.imdb.com/name/nm0000187/>;
     .
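
If such descriptions are generated in bulk, something along these lines could emit them. This is only a sketch: the function name is made up, the prefix declarations are the standard FOAF and OWL namespaces, and it assumes the URIs passed in are already valid and need no escaping.

```python
PREFIXES = (
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
)


def describe_person(own_uri, same_as, imdb_page):
    """Return a Turtle/N3 description of a person identified by own_uri.

    same_as is a list of URIs for the same person in other datasets;
    imdb_page is the IMDB web page about the person.
    """
    lines = [f"<{own_uri}>", "    a foaf:Person ;"]
    for other in same_as:
        lines.append(f"    owl:sameAs <{other}> ;")
    # The IMDB URL is linked as a page about the person, not as the person.
    lines.append(f"    foaf:isPrimaryTopicOf <{imdb_page}> .")
    return PREFIXES + "\n" + "\n".join(lines) + "\n"
```

For example, describe_person("http://yourdomain/people/0000187#thing", ["http://dbpedia.org/resource/Madonna_%28entertainer%29"], "http://www.imdb.com/name/nm0000187/") produces a description equivalent to the one above, with the prefixes spelled out.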

> in other words, a set of RDF making equivalency statements about  
> people from imdb across to other datasets like musicbrainz?
>

The problem is that people in IMDB don't have URIs, and only IMDB is  
in a position to create them. IMDB only has URIs for web pages, so the  
best you can do is say something about the IMDB pages, e.g. what their  
topic is.
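
A small sketch of what "saying something about the IMDB page" could look like in practice: one N-Triples statement whose subject is the page and whose object is your own identifier, using foaf:primaryTopic (the inverse of foaf:isPrimaryTopicOf). The function name and the example URIs in the usage lines are assumptions for illustration.

```python
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"


def annotate_page(imdb_page, topic_uri):
    """Return one N-Triples line stating that the page's primary topic is topic_uri."""
    return f"<{imdb_page}> <{FOAF_PRIMARY_TOPIC}> <{topic_uri}> ."


if __name__ == "__main__":
    # The page URL is IMDB's; the topic URI is minted in our own namespace.
    print(annotate_page(
        "http://www.imdb.com/title/tt0088846/",
        "http://yourdomain/movies/0088846#thing",
    ))
```

The statement is about the document, so nothing is asserted with a URI that IMDB has not itself created.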

> would this community find that useful?
>
I think it would be useful.

> in other words, given the imdb licensing realities, are imdb URIs  
> useful as identifiers even if we can't use the related data?
>

IMDB URIs are useful because they resolve to high-quality  
human-readable web pages. This is valuable, because it's a very good way of  
making clear what our own LOD URIs identify.

The way I describe it above, I wouldn't call it "using IMDB URIs as  
identifiers", but rather "annotating IMDB pages with links into the  
Semantic Web".

> are URIs useful in LOD on their own?
>

As I said, a URI is only as good as what it resolves to. IMDB URIs are  
part of the old document Web, and only IMDB themselves can upgrade  
them to the Semantic Web, because they control what comes back when  
you request the URI.

Best,
Richard


>
>
>
> sorry for the ramble, but had a lot of imdb on my mind...
>
>
>
> all the best--
>
> --chris sizemore
>
>
>
>
> http://www.bbc.co.uk
> This e-mail (and any attachments) is confidential and may contain  
> personal views which are not the views of the BBC unless  
> specifically stated.
> If you have received it in error, please delete it from your system.
> Do not use, copy or disclose the information in any way nor act in  
> reliance on it and notify the sender immediately.
> Please note that the BBC monitors e-mails sent or received.
> Further communication will signify your consent to this.

Received on Thursday, 3 April 2008 11:30:40 UTC