
Re: Several questions about Linked Data

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Wed, 23 Mar 2011 11:40:10 -0400
Message-ID: <4D8A145A.30108@openlinksw.com>
To: wenlei zhou <wenlei.zhouwl@gmail.com>
CC: public-lod@w3.org
On 3/23/11 10:06 AM, wenlei zhou wrote:
> Thank you very much!
> Today, I have read the book published at 
> http://linkeddatabook.com/editions/1.0/#note119. In section 4.5.2, I 
> found several tips.
> When linking a dataset to other datasets, there are two different situations:
>
> For a small dataset, the publisher can use several different types 
> of predicates to connect to the entities which are declared in other 
> datasets.
> For a large dataset, record linkage techniques can be used to 
> find owl:sameAs-equivalent entities in other datasets automatically.
>
> Regards
>
>
>
>
> On 23 March 2011 21:50, Bill Roberts <bill@swirrl.com 
> <mailto:bill@swirrl.com>> wrote:
>
>     Although this has led to a discussion, no-one as far as I can tell
>     has actually tried to answer Wenlei Zhou's question.
>
>     Hope this helps:
>
>     I can create a link from my dataset to yours just by including a
>     triple in my dataset with one of 'your' URIs as object.
>
>     <http://mysite.com/id/1> <http://example.com/some-predicate>
>     <http://yoursite.com/id/ABC>
>
>     You can also declare information about links between datasets in
>     metadata about your dataset, using the voiD ontology - see
>     http://vocab.deri.ie/void
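Such voiD metadata could look like the following sketch, assembled here as a Turtle string. The dataset URIs and triple count are hypothetical placeholders, while `void:Linkset`, `void:subjectsTarget`, `void:objectsTarget`, `void:linkPredicate`, and `void:triples` are actual voiD terms:

```python
# A minimal voiD Linkset description (Turtle), built as a string; the
# dataset URIs and the triple count are hypothetical placeholders.
linkset = """\
@prefix void: <http://rdfs.org/ns/void#> .

<http://mysite.com/void#mine-to-yours>
    a void:Linkset ;
    void:subjectsTarget <http://mysite.com/void#my-dataset> ;
    void:objectsTarget  <http://yoursite.com/void#your-dataset> ;
    void:linkPredicate  <http://example.com/some-predicate> ;
    void:triples 57 .
"""
print(linkset)
```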
>
>     If you are thinking about the Linked Data Cloud diagram (see
>     http://richard.cyganiak.de/2007/10/lod/) then that is generated
>     every few months by Richard Cyganiak and Anja Jentzsch, using some
>     criteria listed on that web page - the dataset must be at least
>     1000 triples and contain at least 50 links to other datasets
>     already in the diagram - and you have to make sure Richard or
>     Anja knows about it.
>
>     Regards
>
>     Bill
>
>
>     On 22 Mar 2011, at 14:58, wenlei zhou wrote:
>
>     > Hi, everyone,
>     > I'm a novice to Linked Data. While learning about it, I have run
>     > into several questions.
>     >
>     > 1. When publishing a new data set, how does it connect to other
>     > data sets in the Linked Data Cloud? Do we just use record linkage
>     > techniques to connect two identifiers that actually refer to the
>     > same entity? Or is this method only applied when publishing a
>     > large new data set, not a small one?
>     > 2. Where is the RDF data that connects two data sets stored?
>     > Is it just stored by a third party?
>     >
>     > Can anyone tell me the answers?
>     > Thank you very much!!
>     >
>     > regards,
>     > Zhou Wenlei
>
>
Zhou,

Example.

Given an Entity: 'Nigeria' (type: Country), described across a number of 
Linked Datasets, how would you obtain a holistic view of said Entity?

Links:

1. 
http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Faims.fao.org%2Faos%2Fgeopolitical.owl%23Nigeria&sas=yes 
-- description of 'Nigeria' with owl:sameAs inference context enabled

2. 
http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Faims.fao.org%2Faos%2Fgeopolitical.owl%23Nigeria&tp=2&sas=yes 
-- leveraging host DBMS metadata to determine the source Named Graphs 
used in the description

3. 
http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Faims.fao.org%2Faos%2Fgeopolitical.owl%23Nigeria&tp=3&sas=yes 
-- co-reference list, where clicking on each URI (while inference 
context is enabled) results in the same description (a union of all 
facts associated with every one of the listed co-referent URIs)

4. 
http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Faims.fao.org%2Faos%2Fgeopolitical.owl%23Nigeria&tp=3 
-- same as above, but without inference context, so clicking on each URI 
presents a description scoped to that URI.
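The difference between links 3 and 4 above can be sketched as an owl:sameAs closure over a toy triple set. The URIs and facts below are hypothetical stand-ins for what the LOD Cloud Cache actually stores:

```python
# Toy triple store: (subject, predicate, object).
triples = [
    ("http://aims.fao.org/aos/geopolitical.owl#Nigeria", "sameAs",
     "http://dbpedia.org/resource/Nigeria"),
    ("http://aims.fao.org/aos/geopolitical.owl#Nigeria", "population", "158000000"),
    ("http://dbpedia.org/resource/Nigeria", "capital", "Abuja"),
]

def coreferents(uri, triples):
    """Transitive, symmetric closure over the sameAs links."""
    group, frontier = {uri}, [uri]
    while frontier:
        u = frontier.pop()
        for s, p, o in triples:
            if p != "sameAs":
                continue
            if s == u and o not in group:
                group.add(o)
                frontier.append(o)
            elif o == u and s not in group:
                group.add(s)
                frontier.append(s)
    return group

def describe(uri, triples, inference=True):
    """Union of facts about a URI, optionally across its sameAs co-referents."""
    scope = coreferents(uri, triples) if inference else {uri}
    return {(p, o) for s, p, o in triples if s in scope and p != "sameAs"}
```

With inference enabled, `describe` returns the same merged description for every co-referent URI; without it, each URI yields only its own locally asserted facts.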


The trouble is, you can only go so far with datasets alone; you need a 
space where the data is managed in such a way that it can deliver the 
solution you seek via queries. In the example above I use the LOD Cloud 
Cache instance, which holds 15 billion+ triples. I could also have 
derived the same answers, with some overhead of course, by having the 
SPARQL query execute a follow-your-nose style crawl across all the 
original source data spaces.
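The follow-your-nose alternative mentioned above can be sketched as a bounded dereference loop. The `fetch` argument is a stand-in for an HTTP GET that content-negotiates for RDF; the toy web below is hypothetical:

```python
def follow_your_nose(start_uri, fetch, max_hops=10):
    """Crawl descriptions by dereferencing each newly seen URI once."""
    seen, queue, facts = set(), [start_uri], []
    while queue and len(seen) < max_hops:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in fetch(uri):
            facts.append((s, p, o))
            # Follow object URIs onward, breadth-first.
            if o.startswith("http://") and o not in seen:
                queue.append(o)
    return facts

# Stub 'web': each URI dereferences to a few triples about itself.
toy_web = {
    "http://mysite.com/id/1": [
        ("http://mysite.com/id/1", "linksTo", "http://yoursite.com/id/ABC")],
    "http://yoursite.com/id/ABC": [
        ("http://yoursite.com/id/ABC", "label", "ABC")],
}
facts = follow_your_nose("http://mysite.com/id/1", lambda u: toy_web.get(u, []))
```

The overhead Kingsley notes comes from performing one network round trip per URI at query time, instead of answering from data already loaded into one store.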

This is ultimately a DBMS-style problem. The DBMS has to maintain 
metadata about the data under its management, then leverage said 
metadata when answering such queries.
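A sketch of that metadata idea: if the store keeps quads (named graph plus triple), then "which source graphs contribute to this description?" (link 2 above) becomes a direct lookup. The graph names and triples here are hypothetical:

```python
# Quads: (named_graph, subject, predicate, object).
quads = [
    ("http://dbpedia.org",
     "http://dbpedia.org/resource/Nigeria", "capital", "Abuja"),
    ("http://aims.fao.org",
     "http://aims.fao.org/aos/geopolitical.owl#Nigeria",
     "nameOfficial", "Federal Republic of Nigeria"),
]

def source_graphs(uri, quads):
    """Named graphs that mention the URI as subject or object."""
    return {g for g, s, p, o in quads if uri in (s, o)}
```

In SPARQL terms this corresponds to querying with `GRAPH ?g { ... }` to recover provenance alongside the facts themselves.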


-- 

Regards,

Kingsley Idehen	
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Wednesday, 23 March 2011 15:40:38 UTC
