Re: LOD Cloud Cache Stats

On 4 Apr 2011, at 15:16, Kingsley Idehen wrote:

> On 4/4/11 10:06 AM, Nathan wrote:
>> Kingsley Idehen wrote:
>>> On 4/3/11 11:41 PM, Nathan wrote:
>>>> Hi Kingsley, All,
>>>> 
>>>> Incoming open request, could anybody provide similar statistics for the usage of each datatype in the wild (e.g. the xsd types, xmlliteral and rdf plain literal)?
>>>> 
>>>> Ideally, Kingsley, could you provide a breakdown from the LOD cloud cache? It would be very useful to know.
>>>> 
>>>> Best & TIA,
>>>> 
>>>> Nathan
>>>> 
>>>> Kingsley Idehen wrote:
>>>>> I've knocked up a Google spreadsheet that contains stats about our 21 Billion Triples+ LOD cloud cache.
>>>> ...
>>>>> https://spreadsheets.google.com/ccc?key=0AihbIyhlsQSxdHViMFdIYWZxWE85enNkRHJwZXV4cXc&hl=en -- LOD Cloud Cache SPARQL stats queries and results
>>>> 
>>> 
>>> Nathan,
>>> 
>>> The typed literals used in > 10k triples:
>>> 
>>> count      datatype IRI
>>> 11308      xsd:anyURI
>>> 12553      http://dbpedia.org/datatype/day
>>> 12788      http://dbpedia.org/ontology/day
>>> 15875      http://dbpedia.org/ontology/usDollar
>>> 18228      http://dbpedia.org/datatype/usDollar
>>> 20828      http://europeanaconnect.eu/voc/fondazione/sgti#fondazioneNot
>>> 22934      http://statistics.data.gov.uk/def/administrative-geography/StandardCode
>>> 23368      http://www.w3.org/2001/XMLSchema#date
>>> 30695      http://dbpedia.org/datatype/inhabitantsPerSquareKilometre
>>> 31662      http://dbpedia.org/datatype/second
>>> 35506      http://dbpedia.org/datatype/kilometre
>>> 57409      http://www.w3.org/2001/XMLSchema#int
>>> 160117     http://stitch.cs.vu.nl/vocabularies/rameau/RecordNumber
>>> 632256     http://www.w3.org/2001/XMLSchema#anyURI
>>> 1175435    xsd:string
>>> 1696035    http://data.ordnancesurvey.co.uk/ontology/postcode/Postcode
>>> 70194534   http://www.openlinksw.com/schemas/virtrdf#Geometry
>>> 120147725  http://www.w3.org/2001/XMLSchema#string
>>> 
>>> Spreadsheet will be updated too.
>>> 
>> 
>> Thanks Kingsley, very much appreciated! :)
>> 
>> I have to admit I'm surprised by the lack of xsd:double and xsd:decimal in the two stats sets, and also the inclusion of some datatypes I'd never even heard of!
>> 
>> Are there any Virtuoso-specific nuances which do some conversion, or are all of these as found in the serialized RDF?
>> 
>> Also, is xsd:string automatically set for all plain literals (with / without langs)?
>> 
>> Cheers,
>> 
>> Nathan
>> 
>> 
> 
> Data comes from an internal table in Virtuoso. Note, a threshold has been set, so what you are seeing is a picture relative to the total amount of data (21 Billion+ triples).
Hi Kingsley.
Thanks.
So these numbers are absolute numbers of some fraction of the dataset?
It would be good if that could be made clear - I certainly read your first message as being over the whole set, as I think did Dave and Nathan.
Perhaps it would be clearer to present as a percentage?
Also, if that is the case, is it a random sample, or might there be some artefacts in the system that skew towards some graphs or datasets?
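To make the percentage suggestion concrete, here is a minimal sketch in Python. The counts are copied from the table quoted above; the names COUNTS and shares are only illustrative. One assumption to flag loudly: the percentages are relative to the sum of the listed counts, not to the 21 Billion+ triples of the whole cache, because the exact total and the effect of the threshold are not stated in this thread.

```python
# Counts copied from the quoted table (typed literals used in > 10k triples).
# Keys are kept exactly as listed, so "xsd:anyURI" and the full
# XMLSchema#anyURI IRI stay separate rows, as in the original table.
COUNTS = {
    "xsd:anyURI": 11308,
    "http://dbpedia.org/datatype/day": 12553,
    "http://dbpedia.org/ontology/day": 12788,
    "http://dbpedia.org/ontology/usDollar": 15875,
    "http://dbpedia.org/datatype/usDollar": 18228,
    "http://europeanaconnect.eu/voc/fondazione/sgti#fondazioneNot": 20828,
    "http://statistics.data.gov.uk/def/administrative-geography/StandardCode": 22934,
    "http://www.w3.org/2001/XMLSchema#date": 23368,
    "http://dbpedia.org/datatype/inhabitantsPerSquareKilometre": 30695,
    "http://dbpedia.org/datatype/second": 31662,
    "http://dbpedia.org/datatype/kilometre": 35506,
    "http://www.w3.org/2001/XMLSchema#int": 57409,
    "http://stitch.cs.vu.nl/vocabularies/rameau/RecordNumber": 160117,
    "http://www.w3.org/2001/XMLSchema#anyURI": 632256,
    "xsd:string": 1175435,
    "http://data.ordnancesurvey.co.uk/ontology/postcode/Postcode": 1696035,
    "http://www.openlinksw.com/schemas/virtrdf#Geometry": 70194534,
    "http://www.w3.org/2001/XMLSchema#string": 120147725,
}


def shares(counts):
    """Return (datatype, count, percent) rows, largest count first.

    ASSUMPTION: percent is relative to the sum of the listed counts only,
    not to the total number of triples in the cache.
    """
    total = sum(counts.values())
    rows = [(dt, n, 100.0 * n / total) for dt, n in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for dt, n, pct in shares(COUNTS):
        print(f"{pct:7.3f}%  {n:>11}  {dt}")
```

If the true denominator (all typed literals, or all triples) were known, it could replace the sum used here without changing anything else.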
Best
Hugh
> 
> 
> -- 
> 
> Regards,
> 
> Kingsley Idehen
> President & CEO
> OpenLink Software
> Web: http://www.openlinksw.com
> Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca: kidehen
> 

-- 
Hugh Glaser,  
              Intelligence, Agents, Multimedia
              School of Electronics and Computer Science,
              University of Southampton,
              Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 78 9422 3822, Home: +44 23 8061 5652
http://www.ecs.soton.ac.uk/~hg/

Received on Wednesday, 6 April 2011 12:31:05 UTC