Re: imdb as linked open data? from Richard Cyganiak on 2008-04-04 (public-lod@w3.org from April 2008)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Fri, 4 Apr 2008 15:07:54 +0100
To: "Chris Sizemore" <Chris.Sizemore@bbc.co.uk>
Cc: <public-lod@w3.org>, "Michael Smethurst" <Michael.Smethurst@bbc.co.uk>, "Silver Oliver" <Silver.Oliver@bbc.co.uk>, <pepper@ontopia.net>
Message-Id: <27BE88D8-863C-42C2-AB03-A1A3B723E688@cyganiak.de>

On 4 Apr 2008, at 13:38, Chris Sizemore wrote:
> at this point in my indoctrination to LOD (i'm a long time semweb  
> fanboy, tho), i guess i disagree with: "From a SemWeb POV this
> [http://www.imdb.com/title/tt0088846/#thing] is pretty useless since
> the URI doesn't resolve to RDF data.  
> Identifiers on the Web are only as good as the data they point to.  
> IMDB URIs point to high-quality web pages, but not to data." --  
> clearly i understand the difference between "data" and "web page"  
> here, but i don't agree that it's so black and white. i'd suggest:  
> "Identifiers on the Web are only as good as the clarity of what they  
> point to..." i don't think there has to be RDF at the other end to  
> make a URI useful, in many cases...

The general point is that you need some sort of canonical lookup  
facility where the intended users of your identifiers can find out  
what a particular identifier refers to.

Well, if you want the URI to be useful to an RDF-aware agent, then you  
have to put RDF at the other end; HTML won't be very useful.

If you are fine with just being useful to HTML-aware agents (web  
browsers, Googlebot), then indeed just putting HTML there is all you need.

Machines are not very good at finding out what a page of text refers  
to (modulo NLP, screen scraping etc).
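
A minimal sketch of what "useful to an RDF-aware agent" could mean in practice, assuming the agent asks for RDF via the Accept header and then inspects the Content-Type it gets back. The set of media types and the helper names are assumptions made for illustration; nothing is sent over the network here.

```python
from urllib.request import Request

# Media types counted as RDF in this sketch (an assumption, not an exhaustive list).
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
    "text/n3",
}


def build_rdf_request(uri):
    """Build (but do not send) a request that prefers RDF over HTML."""
    accept = "application/rdf+xml, text/turtle;q=0.9, text/html;q=0.1"
    return Request(uri, headers={"Accept": accept})


def is_rdf_response(content_type):
    """Return True if a Content-Type header value names an RDF media type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in RDF_MEDIA_TYPES
```

An agent that only ever gets `text/html` back has a page to read, but no statements it can merge with other data.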

> at this point, for example at the BBC, my view is that identifiers  
> and equivalency relationships are more important than RDF... just  
> barely more important, granted... having a common set of  
> identifiers, like navigable stars in the sky over an ocean, is what  
> we need most now, in order to help us aggregate content across the  
> org, and also link it up to useful stuff outside our walled garden.

I think this is true as long as you stay within a single organization.  
Within an org, having an agreed-upon set of terms is very useful and  
goes a long way. You usually have the organizational leverage to make  
people use these terms, and you can build a canonical database where  
people can look up the terms and tell everyone to use that database to  
learn about new identifiers.

Outside on the Web, getting agreement is much harder, and if you  
encounter a term that you don't know yet then it's not so clear where  
you look it up. Hence the stronger insistence on a few basic rules and  
conventions (one URI identifies exactly one thing; if it returns HTTP  
200, it's a document), and the insistence on using the Web itself as  
the lookup service where you find out what a URI means.
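
The "HTTP 200 means document" convention can be sketched as a small decision function. The 200 rule is the one stated above; the handling of hash URIs and 303 redirects follows the usual linked data conventions as I understand them, and the function name and return labels are made up for this example.

```python
from urllib.parse import urldefrag


def what_can_it_identify(uri, status):
    """Classify a URI from the HTTP status seen when looking it up.

    Returns "document", "anything" or "unknown".
    """
    _, fragment = urldefrag(uri)
    if fragment:
        # The fragment is never sent to the server, so the status only
        # describes the document without the "#..." part. The hash URI
        # itself is free to identify anything.
        return "anything"
    if 200 <= status < 300:
        # A direct success response: the URI identifies a document.
        return "document"
    if status == 303:
        # "See Other": the URI may identify anything, and the redirect
        # target is where a description of it lives.
        return "anything"
    return "unknown"
```

Under these rules `http://www.imdb.com/title/tt0088846/` answering 200 is a document, while the `#thing` variant could name the film itself; whether an agent can find out what it names still depends on what the lookup returns.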

> but now, to stir things up a bit...
> given the above, thus:
>
> http://en.wikipedia.org/wiki/Madonna_(entertainer) owl:sameAs <http://www.imdb.com/name/nm0000187/>
>

I don't believe that the people who created
http://en.wikipedia.org/wiki/Madonna_(entertainer) would agree that
it's the same thing as http://www.imdb.com/name/nm0000187/.

Do you believe that any two web pages about Madonna are the same thing?

Do you believe that any two blog posts about Madonna are the same thing?

Do you believe that any two newspaper articles about Madonna are the  
same thing?

Doesn't make any sense to me.
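
A rough sketch of why the sameAs statement goes wrong, assuming both URIs identify web pages. owl:sameAs says the two names denote one and the same thing, so every statement about one holds for the other. The `ex:publishedBy` property and its values are hypothetical, and the function does a single pass over subjects only, not full OWL reasoning.

```python
WIKI = "http://en.wikipedia.org/wiki/Madonna_(entertainer)"
IMDB = "http://www.imdb.com/name/nm0000187/"


def apply_same_as(triples):
    """Copy statements between subjects linked by owl:sameAs (one pass)."""
    triples = list(triples)
    result = set(triples)
    pairs = [(s, o) for s, p, o in triples if p == "owl:sameAs"]
    for a, b in pairs:
        for s, p, o in triples:
            if p == "owl:sameAs":
                continue
            if s == a:
                result.add((b, p, o))
            if s == b:
                result.add((a, p, o))
    return result


# Hypothetical statements about the two pages, plus the proposed link.
statements = [
    (WIKI, "ex:publishedBy", "Wikipedia"),
    (IMDB, "ex:publishedBy", "IMDb"),
    (WIKI, "owl:sameAs", IMDB),
]
inferred = apply_same_as(statements)
```

After the pass, `inferred` says the Wikipedia page is published by IMDb and the IMDb page by Wikipedia, which is the kind of conclusion you get when two different documents are declared identical.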

Richard

>
> right? right?  ;-)

Received on Friday, 4 April 2008 14:08:43 UTC