RE: imdb as linked open data? from Chris Sizemore on 2008-04-04 (public-lod@w3.org from April 2008)

From: Chris Sizemore <Chris.Sizemore@bbc.co.uk>
Date: Fri, 4 Apr 2008 16:07:00 +0100
To: "Richard Cyganiak" <richard@cyganiak.de>
Cc: <public-lod@w3.org>, "Michael Smethurst" <Michael.Smethurst@bbc.co.uk>, "Silver Oliver" <Silver.Oliver@bbc.co.uk>, <pepper@ontopia.net>
Message-ID: <22E75701DF55CB459F5EC560C366846704C15971@bbcxue219.national.core.bbc.co.uk>
very useful richard... much thanks

"Do you believe that any two web pages about Madonna are the same
thing?"  -- it's not about what things *are*, it's about how we choose
to use things in the practice of communication? imdb URLs are an
opportunity to identify people and movies on a large scale, and it's
just sitting there on the (Doc) Web...

I don't care much about "webpages" actually (what are those? ;-0  )... I
don't believe that "webpages" and their URLs are the same thing... (you
can probably tell I'm no big fan of the distinction between "information
resource" and "resource"... no diff to me)

what I believe is that the following 2 URLs can be used, in practice, in
certain contexts (perhaps NOT LOD?), to identify an equivalent
concept... 

http://en.wikipedia.org/wiki/Madonna_(entertainer)
http://www.imdb.com/name/nm0000187/
 

when you say "I think this is true as long as you stay within a single
organization." -- you are on to something there, though at the BBC we
are beginning to use reference data lists/controlled vocabularies from
*outside* the BBC to annotate content inside... though you might prefer
we be using Dbpedia URIs, it's easier for our editors to tag our content
using Wikipedia URLs, because they can confirm and choose using the
Wikipedia website itself -- and if needed we can convert to dbpedia when
we publish LOD externally...
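
A minimal sketch of the Wikipedia-to-DBpedia conversion mentioned above. It assumes the common convention that an English Wikipedia article title maps to a DBpedia resource URI under http://dbpedia.org/resource/ with the same title. The function name and constants are hypothetical, and real titles may need extra encoding or redirect handling that this ignores.

```python
from urllib.parse import urlsplit

# Assumed convention: DBpedia reuses the English Wikipedia article title.
WIKIPEDIA_HOST = "en.wikipedia.org"
WIKI_PATH_PREFIX = "/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url: str) -> str:
    """Map an English Wikipedia article URL to the matching DBpedia URI.

    Hypothetical helper: no redirect resolution, no re-encoding of the title.
    """
    parts = urlsplit(url)
    if parts.netloc != WIKIPEDIA_HOST or not parts.path.startswith(WIKI_PATH_PREFIX):
        raise ValueError(f"not an English Wikipedia article URL: {url}")
    title = parts.path[len(WIKI_PATH_PREFIX):]
    if not title:
        raise ValueError(f"no article title in URL: {url}")
    return DBPEDIA_PREFIX + title


if __name__ == "__main__":
    print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Madonna_(entertainer)"))
```

Editors would keep tagging with the Wikipedia URL they can check in a browser; the mapping would only run at publication time.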

"Do you believe that any two newspaper articles about Madonna are the
same thing?" -- actually, many times they are, because of Reuters and AP
syndication... 

again, I'm expressly not trying to wind people up, just expressing my
personal take on all this, and trying to learn from what you all are
saying... much obliged for the useful debate...


best--

--cs



-----Original Message-----
From: Richard Cyganiak [mailto:richard@cyganiak.de] 
Sent: 04 April 2008 15:08
To: Chris Sizemore
Cc: public-lod@w3.org; Michael Smethurst; Silver Oliver;
pepper@ontopia.net
Subject: Re: imdb as linked open data?


On 4 Apr 2008, at 13:38, Chris Sizemore wrote:
> at this point in my indoctrination to LOD (i'm a long time semweb 
> fanboy, tho), i guess i disagree with: "From a SemWeb POV this 
> [http://www.imdb.com/title/tt0088846/#thing] is pretty useless since
> the URI doesn't resolve to RDF data.  
> Identifiers on the Web are only as good as the data they point to.  
> IMDB URIs point to high-quality web pages, but not to data." -- 
> clearly i understand the difference between "data" and "web page"
> here, but i don't agree that it's so black and white. i'd suggest:  
> "Identifiers on the Web are only as good as the clarity of what they 
> point to..." i don't think there has to be RDF at the other end to 
> make a URI useful, in many cases...

The general point is that you need some sort of canonical lookup
facility where the intended users of your identifiers can find out what
a particular identifier refers to.

Well, if you want the URI to be useful to an RDF-aware agent, then you
have to put RDF at the other end, HTML won't be very useful.

If you are fine with just being useful to HTML-aware agents (web
browsers, Googlebot), then indeed just putting HTML is all you need.

Machines are not very good at finding out what a page of text refers to
(modulo NLP, screen scraping etc).

> at this point, for example at the BBC, my view is that identifiers and
> equivalency relationships are more important than RDF... just barely 
> more important, granted... having a common set of identifiers, like 
> navigable stars in the sky over an ocean, is what we need most now, in
> order to help us aggregate content across the org, and also link it up
> to useful stuff outside our walled garden.

I think this is true as long as you stay within a single organization.  
Within an org, having an agreed-upon set of terms is very useful and
goes a long way. You usually have the organizational leverage to make
people use these terms, and you can build a canonical database where
people can look up the terms and tell everyone to use that database to
learn about new identifiers.

Outside on the Web, getting agreement is much harder, and if you
encounter a term that you don't know yet then it's not so clear where
you look it up. Hence the stronger insistence on a few basic rules and
conventions (one URI identifies exactly one thing; if it returns HTTP
200, it's a document), and the insistence on using the Web itself as the
lookup service where you find out what a URI means.
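
The "HTTP 200 means document" convention can be sketched as a small lookup. The 200 branch follows the rule stated above; the 303 branch is an assumption taken from the W3C TAG httpRange-14 resolution, not from this message. The function name and return labels are hypothetical.

```python
def interpret_status(status: int) -> str:
    """Say what an HTTP status code suggests about the thing a URI identifies.

    Illustrative only; the labels are made up for this sketch.
    """
    if status == 200:
        # Rule from the message: a URI that answers 200 identifies a document.
        return "document"
    if status == 303:
        # Assumption (httpRange-14): the URI may identify anything, and the
        # redirect target is where a description of it can be found.
        return "redirect-to-description"
    # Any other status tells us nothing about the nature of the thing.
    return "undetermined"


if __name__ == "__main__":
    for code in (200, 303, 404):
        print(code, interpret_status(code))
```

Under this reading, a URI that answers 200 with an HTML page identifies that page, so it cannot also identify the person the page is about.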

> but now, to stir things up a bit...
> given the above, thus:
>
> http://en.wikipedia.org/wiki/Madonna_(entertainer) owl:sameAs
> <http://www.imdb.com/name/nm0000187/>
>

I don't believe that the people who created
<http://en.wikipedia.org/wiki/Madonna_(entertainer)> would agree that
it's the same thing as <http://www.imdb.com/name/nm0000187/>.

Do you believe that any two web pages about Madonna are the same thing?

Do you believe that any two blog posts about Madonna are the same thing?

Do you believe that any two newspaper articles about Madonna are the
same thing?

Doesn't make any sense to me.

Richard



>
> right? right?  ;-)


Received on Friday, 4 April 2008 15:07:43 UTC