Re: data integration, data sets, flamingo

On Fri, 7 Nov 2003, Butler, Mark wrote:

> Interestingly [flamingo has] some data sets, not directly related to
> SIMILE, but it might be interesting to make them available as RDF ...
> Specifically it would be interesting to investigate how hard it is to
> merge the two movie related databases.

I took a look at the Flamingo datasets.  The imdb "dataset" is just a flat
file of about 54,000 names in plain text, one per line.  There appear to be
duplicate entries in the file, but they are of basically no use because the
duplicates aren't labelled.  How can you tell how well you are doing if
you don't have a gold standard to match against?
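
To make the gold-standard point concrete, here is a minimal sketch of how
a duplicate detector would be scored if labelled pairs were available.
The function name and the sample names are hypothetical, invented for
illustration; nothing here comes from the Flamingo files themselves.

```python
def pairwise_scores(predicted, gold):
    """Precision and recall of predicted duplicate pairs against labelled ones."""
    # Normalise each pair so (a, b) and (b, a) count as the same pair.
    pred = {frozenset(p) for p in predicted}
    true = {frozenset(p) for p in gold}
    hits = len(pred & true)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall


# Hypothetical labelled duplicates (the gold standard) and detector output.
gold = [("Hanks, Tom", "Tom Hanks")]
predicted = [("Tom Hanks", "Hanks, Tom"), ("Tom Hanks", "Tom Hardy")]

print(pairwise_scores(predicted, gold))  # (0.5, 1.0)
```

Without the `gold` list, neither number can be computed, which is the
problem with an unlabelled flat file.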

The other movie dataset is more interesting because it has a relational
structure, but I'm not really sure what is in it because it appears to be
a Microsoft Access database.

Anyway, thanks for the pointer to that project, and the other references
you sent a while back.  They are very helpful to me.

Nick Matsakis

Received on Tuesday, 18 November 2003 15:12:03 UTC