RE: imdb as linked open data?

yep, i agree with you, peter...
 
"What is there to gain by referring to wikipedia:Madonna_(entertainer)
as opposed to dbpedia:Madonna_(entertainer)? "
 
i agree -- probably nothing to be gained in the wikipedia case, because
dbpedia DOES exist and wikipedia data is open by nature...
 
but in the case of IMDB, i think we might try my idea as proof of
concept, UNTIL such time as we can convince imdb to join us
"officially"?
 
in any case, the BBC film reviews RDF and HTML representations should be
linking to/pointing at imdb URLs.. perhaps i just need to choose the
right verbs...foaf:isPrimaryTopicOf seems useful...
 
BTW, i don't hear too much about Dublin Core in these circles, is that
deemed non-LOD?
 
 
best--
 
--cs

________________________________

From: major.error@gmail.com [mailto:major.error@gmail.com] On Behalf Of
Peter Coetzee
Sent: 04 April 2008 15:20
To: Chris Sizemore
Cc: public-lod@w3.org; Michael Smethurst; Silver Oliver;
pepper@ontopia.net
Subject: Re: imdb as linked open data?


Hi Chris,


On Fri, Apr 4, 2008 at 1:38 PM, Chris Sizemore
<Chris.Sizemore@bbc.co.uk> wrote:


	all--
	 
	so, i was correct in thinking that imdb is interesting to the
LOD community.


Definitely :)



	 
	i agree that offering "what's a/the Sem Web business model?" is
pretty important in order to get buy in... does anyone have any contacts
in and around imdb?
	 
	 
	 
	***************** forgive the following if it's controversial --
i'm honestly just trying to understand better ***********
	 
	however, on a more philosophical note, i DON'T think imdb
necessarily needs to explicitly opt into the Web of Data in order for
the world at large to find Sem Web value in that data... i suppose it
would be very desirable for imdb to officially provide Open Data/rdf of
their content, but i don't think that's the only way for the Sem Web to
gain value from imdb...


This is a great concept, and one which (if it's resolvable) could make
adoption of linked data a path of much less resistance than it currently
finds...like the original web, get the data and use cases out there
first and the business models will quickly catch up!



	 
	basically, my premise is this: imdb is on the Web of Docs, and
that's good enough for the purpose of answering the question to be posed
here -- http://www.okkam.org/IRSW2008/ (the problem of identity and
reference on the Semantic Web is perhaps the single most important issue
for reaching a global scale. Initiatives like LinkedData, OntoWorld and
the large number of proposals aiming at using popular URLs (e.g.
Wikipedia's) as "canonical" URIs (especially for non informational
resources) show that a solution to this issue is very urgent and very
relevant.)
	 
	at this point in my indoctrination to LOD (i'm a long time
semweb fanboy, tho), i guess i disagree with: "From a SemWeb POV this
[http://www.imdb.com/title/tt0088846/#thing] is pretty useless since
the URI doesn't resolve to RDF data. Identifiers on the Web are only as
good as the data they point to. IMDB URIs point to high-quality web
pages, but not to data." -- clearly i understand the difference between
"data" and "web page" here, but i don't agree that it's so black and
white. i'd suggest: "Identifiers on the Web are only as good as the
clarity of what they point to..." i don't think there has to be RDF at
the other end to make a URI useful, in many cases...


Agreed (within a few constraints) ;) 


	
	 
	at this point, for example at the BBC, my view is that
identifiers and equivalency relationships are more important than RDF...
just barely more important, granted... having a common set of
identifiers, like navigable stars in the sky over an ocean, is what we
need most now, in order to help us aggregate content across the org, and
also link it up to useful stuff outside our walled garden.
	 
	so, i'm one of those who feel that websites like imdb,
wikipedia, and musicbrainz provide great identifiers for non-information
resources even in their Web of Docs form. i know that most of you here
will feel that this is lazy, too informal, and naive of me. but my
argument is that, for sites like those i mention (not all websites, by
any means) we may as well, for the purposes of our day to day use cases,
use their URLs as if they were Sem Web URIs. on these sites, the
distinction between resource and representation (concept and doc about
concept) is not what's pertinent.
	 
	i'm aware that most on this list will make a religious
distinction between:
	 
	http://dbpedia.org/resource/Madonna_%28entertainer%29
	 
	and 
	 
	http://en.wikipedia.org/wiki/Madonna_(entertainer)
	 
	but i think that, by convention, and in the contexts they'd
actually be used, we should treat them both as identifiers for the same
concept, and that they are essentially sameAs's *in common practice*...


By this logic, is http://en.wikipedia.org/wiki/Madonna_(entertainer)
sameAs http://www.madonna.com/home/ sameAs
http://www.myspace.com/madonna? They
certainly don't give the same page, and only subsets of the data given
by each page will be the same. They refer to the same person, true -
surely then it's more useful to be able to make the isPrimaryTopicOf
statement you suggest below; using
http://en.wikipedia.org/wiki/Madonna_(entertainer) as a URI to represent
"Madonna-the-non-information-resource" (or, to a human,
"Madonna-the-person"!) precludes anyone from making statements about
"Madonna-the-Wikipedia-entry" (e.g. who wrote it, when it was last
updated, etc).
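
To make the point above concrete, here is a rough Python sketch with triples modelled as plain tuples rather than through any RDF library. The dcterms:modified value is a made-up placeholder, and describe and conflate are hypothetical helper names, not part of any vocabulary or toolkit.

```python
# Triples as plain (subject, predicate, object) tuples; no RDF library assumed.
PERSON = "http://dbpedia.org/resource/Madonna_%28entertainer%29"
PAGE = "http://en.wikipedia.org/wiki/Madonna_(entertainer)"

triples = {
    (PERSON, "rdf:type", "foaf:Person"),
    (PERSON, "foaf:isPrimaryTopicOf", PAGE),
    (PAGE, "rdf:type", "foaf:Document"),
    # Placeholder value for illustration only, not a real edit date.
    (PAGE, "dcterms:modified", "2008-04-01"),
}


def describe(subject, graph):
    """Return {predicate: sorted list of objects} for one subject."""
    out = {}
    for s, p, o in graph:
        if s == subject:
            out.setdefault(p, []).append(o)
    return {p: sorted(values) for p, values in out.items()}


def conflate(graph, keep, drop):
    """Rewrite every `drop` term to `keep`, as if one URI named both things."""
    def swap(term):
        return keep if term == drop else term
    return {(swap(s), p, swap(o)) for s, p, o in graph}


# With two URIs, the person has no modification date; only the page does.
separate = describe(PERSON, triples)

# With one URI for both, the same subject is a Person and a Document,
# and the page's modification date now reads as a fact about Madonna.
merged = describe(PAGE, conflate(triples, keep=PAGE, drop=PERSON))
```

Once the two are merged there is no longer any way to say "who wrote it" or "when it was last updated" about the entry without also saying it about the person.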



	
	 
	in other words, as much as i love dbPedia and think it's a
brilliant step forward, i personally was fine with Wikipedia URLs as
identifiers. the incredible thing about dbpedia is the data mining to
extract RDF, not the URIs or content negotiation.
	 
	i KNOW that, technically, what i'm saying breaks all our rules
-- and i followed
http://www.w3.org/2001/tag/doc/httpRange-14/2007-05-31/HttpRange-14.html
closely -- but philosophically i think there's something to what i'm
saying... if the Web is easy and the Sem Web hard, must we insist on
perfection? must we insist that imdb agree with us and explicitly opt
in?
	 
	practically, tho, in an "official" LOD grammar sense, this works
just fine for me: 

	<http://dbpedia.org/resource/Madonna_%28entertainer%29>
	foaf:isPrimaryTopicOf <http://www.imdb.com/name/nm0000187/>

	<http://dbpedia.org/resource/Madonna_%28entertainer%29>
	foaf:isPrimaryTopicOf <http://en.wikipedia.org/wiki/Madonna_(entertainer)>

	that seems useful and easy. to me, that's allowing a
"sameAs"-like relationship between Web of Docs URLs and SemWeb URIs... i
could really really run with that approach...
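
A small Python sketch of how that "sameAs"-like relationship might be used in practice, again with triples as plain tuples and no RDF library. It assumes each page has a single primary topic (FOAF describes foaf:isPrimaryTopicOf as inverse functional, which is what makes the page-to-thing lookup safe). topic_index and same_topic are hypothetical names chosen for the example.

```python
IS_PRIMARY_TOPIC_OF = "foaf:isPrimaryTopicOf"

MADONNA = "http://dbpedia.org/resource/Madonna_%28entertainer%29"
IMDB_PAGE = "http://www.imdb.com/name/nm0000187/"
WIKI_PAGE = "http://en.wikipedia.org/wiki/Madonna_(entertainer)"

triples = [
    (MADONNA, IS_PRIMARY_TOPIC_OF, IMDB_PAGE),
    (MADONNA, IS_PRIMARY_TOPIC_OF, WIKI_PAGE),
]


def topic_index(graph):
    """Map each document URL to the thing it is primarily about."""
    return {o: s for s, p, o in graph if p == IS_PRIMARY_TOPIC_OF}


def same_topic(url_a, url_b, index):
    """True when both pages are known and share one primary topic."""
    topic = index.get(url_a)
    return topic is not None and topic == index.get(url_b)


index = topic_index(triples)

# The two pages are linked through their shared topic, not through sameAs.
pages_agree = same_topic(IMDB_PAGE, WIKI_PAGE, index)

# A page nobody has described yet is simply unknown, not equivalent.
unknown_page = same_topic(IMDB_PAGE, "http://www.madonna.com/home/", index)
```

The Web of Docs URLs stay usable as lookup keys for aggregation, while the pages themselves remain distinct resources that can carry their own metadata.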

	 

	but now, to stir things up a bit...

	given the above, thus:

	<http://en.wikipedia.org/wiki/Madonna_(entertainer)> owl:sameAs
	<http://www.imdb.com/name/nm0000187/>

I'm probably taking the bait far too easily here :) I like what you're
suggesting in principle - anything which simplifies things has *got* to
be a good thing for us all...but I'm not sure this would add much value.
If you *do* make statements like the above, a semweb client won't have a
clue where to go to look for data about Madonna-the-person; you loosen
the semantics (and, indeed, usefulness) of sameAs somewhat. Where's the
added value in doing the above? What is there to gain by referring to
wikipedia:Madonna_(entertainer) as opposed to
dbpedia:Madonna_(entertainer)? 
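
As a hedged illustration of how sameAs loosens once it is applied to pages, here is a minimal Python sketch that treats sameAs as symmetric and transitive and groups URIs with a small union-find. The dcterms:publisher values are invented placeholders, and same_as_classes and merged_values are hypothetical helpers.

```python
WIKI = "http://en.wikipedia.org/wiki/Madonna_(entertainer)"
IMDB = "http://www.imdb.com/name/nm0000187/"
HOME = "http://www.madonna.com/home/"
MYSPACE = "http://www.myspace.com/madonna"

same_as = [(WIKI, IMDB), (WIKI, HOME), (HOME, MYSPACE)]

# Placeholder facts about two of the pages, for illustration only.
facts = [
    (WIKI, "dcterms:publisher", "publisher-of-the-wikipedia-page"),
    (IMDB, "dcterms:publisher", "publisher-of-the-imdb-page"),
]


def same_as_classes(pairs):
    """Group URIs into equivalence classes under sameAs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    classes = {}
    for uri in list(parent):
        classes.setdefault(find(uri), set()).add(uri)
    return list(classes.values())


def merged_values(predicate, statements, cls):
    """Collect every value of `predicate` asserted on any member of `cls`."""
    return {o for s, p, o in statements if p == predicate and s in cls}


classes = same_as_classes(same_as)

# All four pages collapse into one resource, which now has two publishers.
publishers = merged_values("dcterms:publisher", facts, classes[0])
```

A client following these links ends up with one merged resource carrying every page's metadata at once, and no hint of where data about Madonna-the-person actually lives.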


	

	

	 
	right? right?  ;-)
	 
	 
	 
	best--
	 
	--cs


Cheers,
Peter

 


	
	 

________________________________

	From: public-lod-request@w3.org
[mailto:public-lod-request@w3.org] On Behalf Of Sergey Chernyshev
	Sent: 03 April 2008 17:47
	To: public-lod@w3.org 

	Subject: Re: imdb as linked open data?
	

	Yes, it's exactly the thing I was thinking about - what is the
business model (or at least approach that can bring money) for content
providers to
	

	1.	create data
	2.	release it under open (or not so open) license so other parties can freely use it
	3.	and spend money on RDFizing it

	I think, until this is resolved, Semantic Web is not going to
blossom and go far beyond open data.
	
	Publishers are fighting for attention because current business
model is based on advertising (other models like micropayments, payment
propagation from ISPs to content providers and so on didn't work out).
That's why they are happy to give money and optimize their content to
Google standards for SEO purposes, but what will make them RDFize their
data?
	
	But in reality it's not all that bad - RSS showed that people
are interested in opening their content and adding structure to it if
users come back to their site to enjoy full experience. It's just a
question of what level of open data will those big (or not so big)
publishers open to public and at which point will users need to go back
to their site to see the ads. Or maybe see the ads within the consuming
application?
	
	In any case, I think it's a big question worth discussing,
unfortunately I didn't see any business-related sessions on LinkedData
Planet.
	
	          Sergey
	
	
	
	On Thu, Apr 3, 2008 at 10:48 AM, Hugh Glaser
<hg@ecs.soton.ac.uk> wrote:
	


		On 03/04/2008 12:41, "Kingsley Idehen"
<kidehen@openlinksw.com> wrote:
		
		> Hugh Glaser wrote:
		...
		
		>> Hugh
		>>
		>>
		>>
		>>
		> Hugh,
		>
		> This is an example of many to come, where LOD needs to
		> pitch the value of Linked Data to Information Publishers :-) I think
		> they will ultimately publish and host their own RDF Linked Data once
		> the intrinsic value is clear to them.
		
		And when there is also actual extrinsic value? :-)
		But yes, and making it easy for them, possibly by actually doing it for
		them, is part of the bootstrap process.
		The thing I am trying to work out is exactly how to make the pitch that fits
		with their business model, and where their profit line might come from.
		This requires a serious understanding of the detailed business model for the
		company in question (which is not necessarily a skill that an academic SW
		researcher has!).
		
		We also have similar LOD installations for CORDIS (the EU funding agencies'
		DB), NSF (a US funding agency), EPSRC (a UK funding agency), and ACM, among
		others. We have now engineered them so that they can be moved to the
		Information Publisher if desired. Such organisations sometimes have it as
		part of their remit to publicise the results, so they should be easier to
		deal with, in theory.
		If anyone has a ready conduit to the appropriate place in such
		organisations, we would be delighted to talk with them, showing them what
		might be done.
		>
		> --
		
		>
		>
		> Regards,
		>
		> Kingsley Idehen       Weblog:
http://www.openlinksw.com/blog/~kidehen
		> President & CEO
		> OpenLink Software     Web: http://www.openlinksw.com
		>
		>
		>
		>
		>
		
		
		




	-- 
	Sergey Chernyshev
	http://www.sergeychernyshev.com/ 
	
	http://www.bbc.co.uk
	This e-mail (and any attachments) is confidential and may
contain personal views which are not the views of the BBC unless
specifically stated.
	If you have received it in error, please delete it from your
system.
	Do not use, copy or disclose the information in any way nor act
in reliance on it and notify the sender immediately.
	Please note that the BBC monitors e-mails sent or received.
	Further communication will signify your consent to this. 



					

Received on Friday, 4 April 2008 15:12:42 UTC