
Re: How many instances of foaf:Person are there in the LOD Cloud?

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 14 Apr 2011 07:30:53 -0400
Message-ID: <4DA6DAED.2040801@openlinksw.com>
To: Marco Fossati <fossati@fbk.eu>
CC: Bernard Vatant <bernard.vatant@mondeca.com>, Linking Open Data <public-lod@w3.org>
On 4/14/11 6:10 AM, Marco Fossati wrote:
> Kingsley,
>
> As I want to extract literals containing full names of persons from 
> the LOD Cloud cache, I tried to count them with the following query:
>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> SELECT (COUNT(DISTINCT ?pers) AS ?person) (COUNT(DISTINCT ?name) AS ?foaf)
>        (COUNT(DISTINCT ?label) AS ?rdfs)
> WHERE { ?pers a foaf:Person ; foaf:name ?name ; rdfs:label ?label . }
>
> Strangely, the count of foaf:Person instances differs from the 
> one you mentioned: 2,828,451 (see http://bit.ly/eBzdjh).
> Do you know why? Could it be due to a timeout?

No timeouts for these kinds of queries :-)

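Your query only matches persons that have both a foaf:name and an 
rdfs:label. To count every foaf:Person, make those two patterns optional: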

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT (COUNT(DISTINCT ?pers) AS ?person) (COUNT(DISTINCT ?name) AS ?foaf)
       (COUNT(DISTINCT ?label) AS ?rdfs)
WHERE {
  ?pers a foaf:Person .
  OPTIONAL { ?pers foaf:name ?name }
  OPTIONAL { ?pers rdfs:label ?label }
}
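To illustrate why the first query returns fewer persons, here is a small Python sketch (not part of the original thread) that mimics both WHERE clauses over hypothetical in-memory data; the ex: resources, names, and labels are invented. Required triple patterns behave like an inner join, so a person lacking either foaf:name or rdfs:label drops out, while each OPTIONAL behaves like a left join that leaves the variable unbound (None here).

```python
# Hypothetical data: three persons with inconsistent name/label coverage.
PERSONS = {"ex:alice", "ex:bob", "ex:carol"}
NAMES = {("ex:alice", "Alice"), ("ex:bob", "Bob")}          # foaf:name
LABELS = {("ex:alice", "Alice A."), ("ex:carol", "Carol")}  # rdfs:label


def values_for(subject, pairs):
    """All objects in `pairs` whose subject matches."""
    return [obj for (subj, obj) in pairs if subj == subject]


def required_rows():
    """{ ?pers a foaf:Person ; foaf:name ?name ; rdfs:label ?label }: inner join."""
    return [
        (p, n, lbl)
        for p in sorted(PERSONS)
        for n in values_for(p, NAMES)
        for lbl in values_for(p, LABELS)
    ]


def optional_rows():
    """Same patterns, but name and label in OPTIONAL: left join, unbound -> None."""
    return [
        (p, n, lbl)
        for p in sorted(PERSONS)
        for n in (values_for(p, NAMES) or [None])
        for lbl in (values_for(p, LABELS) or [None])
    ]


def count_distinct(rows, column):
    """Like COUNT(DISTINCT ?var): unbound values are not counted."""
    return len({row[column] for row in rows if row[column] is not None})


for kind, rows in (("required", required_rows()), ("optional", optional_rows())):
    print(kind, count_distinct(rows, 0), count_distinct(rows, 1), count_distinct(rows, 2))
```

With only required patterns, just ex:alice survives (1 person, 1 name, 1 label); with OPTIONAL all three persons are counted, alongside 2 distinct names and 2 distinct labels.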

Remember: the data has issues regarding consistent use of properties and the 
existence of property values. This is also (orthogonally) an issue that 
sheds some light on what we've had to do to automatically replace URIs 
with labels in our various Linked Data browser pages. Basically, we use 
optionals plus reasoning at massive scale, so that users don't experience 
the complexity of meeting Linked Data UI aesthetics, which boils down to 
taking URIs out of sight without losing their data conduction might.


Kingsley
>
> Cheers,
>
> Marco
> FBK Web of Data unit
> http://fbk.eu
> http://wed.fbk.eu/
>
>
> On 4/13/11 2:37 PM, Kingsley Idehen wrote:
>> On 4/13/11 4:15 AM, Bernard Vatant wrote:
>>> Hello all
>>>
>>> Just trying to figure out the size of the personal information 
>>> available as LOD vs. the billions of person profiles stored by Google, 
>>> Amazon, Facebook, LinkedIn, unameit ... in proprietary formats.
>>>
>>> Any hint of the proportion of "living" people vs historical 
>>> characters is also welcome.
>>>
>>> Any idea?
>>>
>>> Bernard
>>>
>>>
>>> -- 
>>> Bernard Vatant
>>> Senior Consultant
>>> Vocabulary & Data Integration
>>> Tel:       +33 (0) 971 488 459
>>> Mail: bernard.vatant@mondeca.com
>>> ----------------------------------------------------
>>> Mondeca
>>> 3, cité Nollez 75018 Paris France
>>> Web: http://www.mondeca.com
>>> Blog: http://mondeca.wordpress.com
>>> ----------------------------------------------------
>> Bernard,
>>
>> The LOD Cloud cache has 3,321,094 foaf:Person entities [1]; the distinct 
>> count is 3,319,862 [2].
>> URIBurner has 4,564,981 foaf:Person entities [3]; the distinct count is 
>> 4,555,697 [4].
>>
>> In both cases, the counts come from SPARQL aggregate queries against the 
>> respective endpoints. Note that no inference context was applied; there 
>> are a variety of rules across OpenCyc, UMBEL, Yago, and DBpedia that 
>> would alter these counts.
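As a rough illustration of how an inference context can change such counts, the Python sketch below applies only rdfs:subClassOf entailment (RDFS rules rdfs9 and rdfs11) to a hypothetical type hierarchy. The class and resource names are invented, and the actual OpenCyc, UMBEL, Yago, and DBpedia rules are not modeled.

```python
# Hypothetical schema and instance data (not taken from any real dataset).
SUBCLASS_OF = {
    ("ex:Physicist", "ex:Scientist"),
    ("ex:Scientist", "foaf:Person"),
}
TYPES = {
    ("ex:alice", "foaf:Person"),
    ("ex:bob", "ex:Physicist"),
    ("ex:carol", "ex:Scientist"),
    ("ex:doc1", "foaf:Document"),
}


def superclasses(cls):
    """All classes reachable from cls via rdfs:subClassOf (transitive closure)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in SUBCLASS_OF:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def instances(cls, infer):
    """Subjects typed as cls; with infer=True, also those typed via a subclass."""
    return {
        subj
        for subj, typ in TYPES
        if typ == cls or (infer and cls in superclasses(typ))
    }


print(len(instances("foaf:Person", infer=False)), len(instances("foaf:Person", infer=True)))
```

Without inference only ex:alice is a foaf:Person; with subclass entailment ex:bob and ex:carol are counted too (1 vs. 3).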
>>
>> Tip regarding the URLs below: when seeking similar counts from other 
>> Virtuoso instances, simply change the "authority" part of the URL. With 
>> some luck, this could also apply to other SPARQL endpoints in general, 
>> subject to what those endpoints support and permit.
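A minimal Python sketch of that tip, assuming the endpoints accept the SPARQL Protocol's GET form (query text in the query parameter) at the conventional Virtuoso path /sparql. It works on a full query URL rather than the short /c/ links listed below; the host names come from this message, but the path and the helper names query_url and with_authority are illustrative assumptions.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Count query from this thread, with an explicit PREFIX so it does not rely
# on the endpoint predefining "foaf:".
COUNT_QUERY = (
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
    "SELECT (COUNT(DISTINCT ?s) AS ?count) WHERE { ?s a foaf:Person }"
)


def query_url(authority, query, path="/sparql"):
    """Build a SPARQL Protocol GET URL (assumed endpoint path: /sparql)."""
    return urlunsplit(("http", authority, path, urlencode({"query": query}), ""))


def with_authority(url, authority):
    """Return url with only its authority (host[:port]) replaced."""
    return urlunsplit(urlsplit(url)._replace(netloc=authority))


lod_url = query_url("lod.openlinksw.com", COUNT_QUERY)
burner_url = with_authority(lod_url, "uriburner.com")
print(burner_url)
```

Only the authority changes; the scheme, path, and encoded query are carried over unchanged, which is what makes the same link reusable against another instance that exposes the same endpoint path.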
>>
>> SPARQL queries used across each endpoint:
>>
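>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>> SELECT (COUNT(?s) AS ?count) WHERE { ?s a foaf:Person }
>>
>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>> SELECT (COUNT(DISTINCT ?s) AS ?count) WHERE { ?s a foaf:Person }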
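The two count queries differ only in DISTINCT. One plausible source of the small gap between [1] and [2] (an assumption, not stated in this thread) is a quad store where the same rdf:type triple is stored in more than one named graph and the query runs over all graphs: each copy yields its own solution, so COUNT(?s) sees that subject twice while COUNT(DISTINCT ?s) sees it once. The Python sketch below shows the effect on invented quads.

```python
# Hypothetical quads: (graph, subject, predicate, object).
RDF_TYPE = "rdf:type"
QUADS = [
    ("ex:g1", "ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:g2", "ex:alice", RDF_TYPE, "foaf:Person"),  # same triple, second graph
    ("ex:g1", "ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:g1", "ex:doc1", RDF_TYPE, "foaf:Document"),
]


def match_persons(quads):
    """Bindings of ?s for { ?s a foaf:Person } over all graphs (duplicates kept)."""
    return [s for (_graph, s, p, o) in quads if p == RDF_TYPE and o == "foaf:Person"]


solutions = match_persons(QUADS)
count_all = len(solutions)            # analogous to COUNT(?s)
count_distinct = len(set(solutions))  # analogous to COUNT(DISTINCT ?s)
print(count_all, count_distinct)
```

Here the plain count is 3 and the distinct count is 2, because ex:alice is matched once per graph.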
>>
>> Links:
>>
>> 1. http://lod.openlinksw.com/c/CYIZZL4 -- LOD Cloud Cache
>> 2. http://lod.openlinksw.com/c/COXER7C -- LOD Cloud Cache Distinct Count
>> 3. http://uriburner.com/c/DYVU7N -- URIBurner
>> 4. http://uriburner.com/c/DV6VPQ -- URIBurner Distinct Count
>>
>>
>> -- 
>>
>> Regards,
>>
>> Kingsley Idehen	
>> President & CEO
>> OpenLink Software
>> Web: http://www.openlinksw.com
>> Weblog: http://www.openlinksw.com/blog/~kidehen
>> Twitter/Identi.ca: kidehen


Received on Thursday, 14 April 2011 11:31:19 UTC
