- From: Yves Raimond <yves.raimond@gmail.com>
- Date: Tue, 5 Aug 2008 12:17:41 +0100
- To: "Richard Cyganiak" <richard@cyganiak.de>
- Cc: "Kingsley Idehen" <kidehen@openlinksw.com>, public-lod@w3.org
Hello!

>
> Is this really a problem? Why not just keep in mind that triple numbers are
> a purely mechanical measure and are no indication of quality or usefulness?
>
> A raw triple count is just that, a raw triple count. It doesn't mean
> anything else. And it is useful for anyone who wants to
> store/index/postprocess a dataset/linkset, because for storage and querying
> the number of triples matters.
>

I wasn't actually talking about triple counts (as you said, a triple count is just that), but about quantifying the number of interlinks in a way that is consistent across datasets (just that, no notion of usefulness :-) ), e.g. in a way that lets you say "oh, this dataset indeed has more interlinks than that one".

I was arguing that the current way of doing it (counting the triples that mention an 'external' resource) is not consistent, as you can easily make that number higher or lower by applying simple transformations to your data.

I think a consistent way of measuring the interlinking is to just count the number of distinct 'external' resources in the dataset (this equals the lowest number you can reach by applying such transformations). See for example http://dbtune.org/musicbrainz/, http://dbtune.org/bbc/peel/ or http://dbtune.org/jamendo/

Cheers!
y

> I don't know of a good way to measure the quality or usefulness of a
> dataset, and would like to simply claim that it cannot be easily expressed
> in a number.
>
> Best,
> Richard
>
>
>
> On 2 Aug 2008, at 16:23, Yves Raimond wrote:
>>>
>>> The same applies for geographic locations, for example. Some datasets
>>> use foaf:based_near to link to Geonames, some others create their own
>>> identifiers, and then link to the corresponding Geonames locations
>>> through owl:sameAs. For the same dataset, these two methodologies will
>>> lead to completely different numbers.
>>
>> Here is a small toy example of that. Consider the following dataset:
>>
>> @prefix : <http://my-dataset/>.
>> @prefix geo: <http://geographic-dataset/>.
>> @prefix foaf: <http://xmlns.com/foaf/0.1/>.
>> @prefix owl: <http://www.w3.org/2002/07/owl#>.
>> :item1 foaf:based_near geo:location1.
>> :item2 foaf:based_near geo:location1.
>>
>> 100% of the triples in this dataset are links to another dataset.
>>
>> Now, if I consider the following, which is equivalent to the previous
>> dataset:
>>
>> :item1 foaf:based_near :location1.
>> :item2 foaf:based_near :location1.
>> :location1 owl:sameAs geo:location1.
>>
>> this number drops to 33% (1 triple out of 3).
>>
>> Cheers!
>> y
>>
>
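To make the two measures discussed in this thread concrete, here is a minimal Python sketch run on the toy datasets above. It is an illustration under stated assumptions, not a tool used by the thread's participants: triples are plain (subject, predicate, object) tuples of full URIs, a resource counts as 'external' when it does not start with the local namespace http://my-dataset/, only subject and object positions are inspected (so vocabulary terms like foaf:based_near and owl:sameAs are never counted as links), and literals are not modelled. The helper names are hypothetical.

```python
# Sketch: compare "share of triples mentioning an external resource" with
# "number of distinct external resources" on the two toy datasets.
# Assumptions: full-URI tuples, no literals, only subject/object inspected.

LOCAL_NS = "http://my-dataset/"
GEO = "http://geographic-dataset/"
FOAF_BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def is_external(uri: str) -> bool:
    """A resource is external if it lies outside the local namespace."""
    return not uri.startswith(LOCAL_NS)


def linking_triple_ratio(triples) -> float:
    """Fraction of triples whose subject or object is external."""
    linking = [t for t in triples if is_external(t[0]) or is_external(t[2])]
    return len(linking) / len(triples)


def distinct_external_resources(triples) -> int:
    """Count distinct external resources in subject or object position."""
    return len({n for s, _, o in triples for n in (s, o) if is_external(n)})


# Dataset 1: items link directly to the external location.
direct = [
    (LOCAL_NS + "item1", FOAF_BASED_NEAR, GEO + "location1"),
    (LOCAL_NS + "item2", FOAF_BASED_NEAR, GEO + "location1"),
]

# Dataset 2: a local location identifier linked via owl:sameAs.
via_same_as = [
    (LOCAL_NS + "item1", FOAF_BASED_NEAR, LOCAL_NS + "location1"),
    (LOCAL_NS + "item2", FOAF_BASED_NEAR, LOCAL_NS + "location1"),
    (LOCAL_NS + "location1", OWL_SAME_AS, GEO + "location1"),
]

for name, data in [("direct", direct), ("via owl:sameAs", via_same_as)]:
    print(f"{name}: {linking_triple_ratio(data):.0%} linking triples, "
          f"{distinct_external_resources(data)} distinct external resource(s)")
```

Running it prints 100% versus 33% for the triple-based ratio, while the distinct-external-resource count is 1 in both cases, which is the invariance argued for above.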
Received on Tuesday, 5 August 2008 11:18:18 UTC