- From: Hugh Glaser <hg@ecs.soton.ac.uk>
- Date: Wed, 6 Apr 2011 12:30:28 +0000
- To: Kingsley Idehen <kidehen@openlinksw.com>
- CC: "<nathan@webr3.org>" <nathan@webr3.org>, "public-lod@w3.org" <public-lod@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>
On 4 Apr 2011, at 15:16, Kingsley Idehen wrote:
> On 4/4/11 10:06 AM, Nathan wrote:
>> Kingsley Idehen wrote:
>>> On 4/3/11 11:41 PM, Nathan wrote:
>>>> Hi Kingsley, All,
>>>>
>>>> Incoming open request, could anybody provide similar statistics for the usage of each datatype in the wild (e.g. the xsd types, xmlliteral and rdf plain literal)?
>>>>
>>>> Ideally Kingsley, could you provide a breakdown from the lod cloud cache? would be very very useful to know.
>>>>
>>>> Best & TIA,
>>>>
>>>> Nathan
>>>>
>>>> Kingsley Idehen wrote:
>>>>> I've knocked up a Google spreadsheet that contains stats about our 21 Billion Triples+ LOD cloud cache.
>>>> ...
>>>>> https://spreadsheets.google.com/ccc?key=0AihbIyhlsQSxdHViMFdIYWZxWE85enNkRHJwZXV4cXc&hl=en -- LOD Cloud Cache SPARQL stats queries and results
>>>>
>>>
>>> Nathan,
>>>
>>> The typed literals used in > 10k triples:
>>>
>>> count      datatype IRI
>>> 11308      xsd:anyURI
>>> 12553      http://dbpedia.org/datatype/day
>>> 12788      http://dbpedia.org/ontology/day
>>> 15875      http://dbpedia.org/ontology/usDollar
>>> 18228      http://dbpedia.org/datatype/usDollar
>>> 20828      http://europeanaconnect.eu/voc/fondazione/sgti#fondazioneNot
>>> 22934      http://statistics.data.gov.uk/def/administrative-geography/StandardCode
>>> 23368      http://www.w3.org/2001/XMLSchema#date
>>> 30695      http://dbpedia.org/datatype/inhabitantsPerSquareKilometre
>>> 31662      http://dbpedia.org/datatype/second
>>> 35506      http://dbpedia.org/datatype/kilometre
>>> 57409      http://www.w3.org/2001/XMLSchema#int
>>> 160117     http://stitch.cs.vu.nl/vocabularies/rameau/RecordNumber
>>> 632256     http://www.w3.org/2001/XMLSchema#anyURI
>>> 1175435    xsd:string
>>> 1696035    http://data.ordnancesurvey.co.uk/ontology/postcode/Postcode
>>> 70194534   http://www.openlinksw.com/schemas/virtrdf#Geometry
>>> 120147725  http://www.w3.org/2001/XMLSchema#string
>>>
>>> Spreadsheet will be updated too.
>>>
>>
>> Thanks Kingsley, very much appreciated! :)
>>
>> I have to admit I'm surprised by the lack of xsd:double and xsd:decimal in the two stats sets, and also the inclusion of some datatypes I'd never even heard of!
>>
>> Are there any Virtuoso-specific nuances which do some conversion, or are all of these as found in the serialized RDF?
>>
>> Also, is xsd:string automatically set for all plain literals (with or without language tags)?
>>
>> Cheers,
>>
>> Nathan
>>
>>
>
> The data comes from an internal table in Virtuoso. Note that a threshold has been set, so what you are seeing is a picture relative to the total amount of data (21 Billion+ triples).
Hi Kingsley.
Thanks.
So these are absolute counts taken over some fraction of the dataset?
It would be good if that could be made clear - I certainly read your first message as being over the whole set, as I think did Dave and Nathan.
Perhaps it would be clearer to present them as percentages?
Also, if that is the case, is it a random sample, or might there be some artefacts in the system that skew towards some graphs or datasets?
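To illustrate the percentage idea, here is a minimal Python sketch. It takes a few of the counts from the quoted table and expresses each as a share of a total. The total of 21,000,000,000 is only an assumption based on the "21 Billion+" figure, since the exact size (and whether the counts cover the whole cache or a sample) is not stated; the names `counts`, `TOTAL_TRIPLES` and `as_percentages` are made up for this example.

```python
# A few counts copied from the quoted table (not the full list).
counts = {
    "http://www.w3.org/2001/XMLSchema#string": 120147725,
    "http://www.openlinksw.com/schemas/virtrdf#Geometry": 70194534,
    "http://data.ordnancesurvey.co.uk/ontology/postcode/Postcode": 1696035,
    "http://www.w3.org/2001/XMLSchema#anyURI": 632256,
    "http://www.w3.org/2001/XMLSchema#int": 57409,
}

# Assumption: nominal size of the cache; the thread only says "21 Billion+".
TOTAL_TRIPLES = 21_000_000_000


def as_percentages(counts, total):
    """Return a dict mapping each datatype IRI to its percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {iri: 100.0 * n / total for iri, n in counts.items()}


shares = as_percentages(counts, TOTAL_TRIPLES)

# Print the largest share first.
for iri, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pct:8.4f}%  {iri}")
```

If the counts turn out to come from a sample rather than the whole cache, the same function applies with the sample size passed as `total`.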
Best
Hugh
>
>
> --
>
> Regards,
>
> Kingsley Idehen
> President & CEO
> OpenLink Software
> Web: http://www.openlinksw.com
> Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca: kidehen
>
--
Hugh Glaser,
Intelligence, Agents, Multimedia
School of Electronics and Computer Science,
University of Southampton,
Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 78 9422 3822, Home: +44 23 8061 5652
http://www.ecs.soton.ac.uk/~hg/
Received on Wednesday, 6 April 2011 12:33:45 UTC