Re: Visualizing LOD Linkage from Yves Raimond on 2008-08-05 (public-lod@w3.org from August 2008)

From: Yves Raimond <yves.raimond@gmail.com>
Date: Tue, 5 Aug 2008 12:17:41 +0100
To: "Richard Cyganiak" <richard@cyganiak.de>
Cc: "Kingsley Idehen" <kidehen@openlinksw.com>, public-lod@w3.org
Message-ID: <82593ac00808050417s39a5cf48s8beaf11867e395a2@mail.gmail.com>

Hello!

>
> Is this really a problem? Why not just keep in mind that triple numbers are
> a purely mechanical measure and are no indication of quality or usefulness?
>
> A raw triple count is just that, a raw triple count. It doesn't mean
> anything else. And it is useful for anyone who wants to
> store/index/postprocess a dataset/linkset, because for storage and querying
> the number of triples matters.
>

I wasn't actually talking about triple counts (as you said, a triple
count is just that), but about quantifying the number of interlinks in
a way that is consistent across datasets (just that, no notion of
usefulness :-) ), e.g. in a way where you can say "oh, this dataset
indeed has more interlinks than that one". I was arguing that the
current way of doing it (counting triples that mention an `external'
resource) is not consistent, as you can easily make that number higher
or lower by applying simple transformations to your data.

I think a consistent way of measuring the interlinking is to just
count the number of distinct `external' resources in the dataset
(which gives the lowest number you can get by applying such
transformations).

See for example http://dbtune.org/musicbrainz/,
http://dbtune.org/bbc/peel/ or http://dbtune.org/jamendo/
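
A minimal sketch of the two measures in Python, applied to the toy
example quoted further down. The function names, the plain-tuple
representation of triples, and the prefix test for what counts as
`external' are assumptions made for illustration; predicates are
deliberately ignored, and literals are not handled.

```python
# Toy sketch: triples are (subject, predicate, object) tuples of IRI strings.
# All names below are illustrative assumptions, not an existing API.

MY = "http://my-dataset/"
GEO = "http://geographic-dataset/"
BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def is_external(term, local_prefix):
    """Assumption: a resource is `external' if its IRI is outside local_prefix."""
    return not term.startswith(local_prefix)


def interlink_triple_count(triples, local_prefix):
    """Current measure: number of triples whose subject or object is external."""
    return sum(
        1
        for s, _p, o in triples
        if is_external(s, local_prefix) or is_external(o, local_prefix)
    )


def distinct_external_resources(triples, local_prefix):
    """Proposed measure: the set of distinct external subjects and objects."""
    return {
        term
        for s, _p, o in triples
        for term in (s, o)
        if is_external(term, local_prefix)
    }


# First form: link straight to the geographic dataset.
direct = [
    (MY + "item1", BASED_NEAR, GEO + "location1"),
    (MY + "item2", BASED_NEAR, GEO + "location1"),
]

# Second form: local identifier plus one owl:sameAs link.
via_same_as = [
    (MY + "item1", BASED_NEAR, MY + "location1"),
    (MY + "item2", BASED_NEAR, MY + "location1"),
    (MY + "location1", SAME_AS, GEO + "location1"),
]

for name, triples in (("direct", direct), ("via sameAs", via_same_as)):
    linking = interlink_triple_count(triples, MY)
    distinct = len(distinct_external_resources(triples, MY))
    print(f"{name}: {linking}/{len(triples)} linking triples, "
          f"{distinct} distinct external resource(s)")
```

With these two toy datasets the triple-based figure goes from 2 of 2
to 1 of 3, while the distinct-resource count stays at 1 in both cases.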

Cheers!
y

> I don't know of a good way to measure the quality or usefulness of a
> dataset, and would like to simply claim that it cannot be easily expressed
> in a number.
>
> Best,
> Richard
>
>
>
> On 2 Aug 2008, at 16:23, Yves Raimond wrote:
>>>
>>> The same applies for geographic locations, for example. Some datasets
>>> use foaf:based_near to link to Geonames, some others create their own
>>> identifiers, and then link to the corresponding Geonames locations
>>> through owl:sameAs. For the same dataset, these two methodologies will
>>> lead to completely different numbers.
>>
>> Just a small toy example of that - if I consider the following dataset:
>>
>> @prefix : <http://my-dataset/>.
>> @prefix geo: <http://geographic-dataset/>.
>> @prefix foaf: <http://xmlns.com/foaf/0.1/>.
>> @prefix owl: <http://www.w3.org/2002/07/owl#>.
>> :item1 foaf:based_near geo:location1.
>> :item2 foaf:based_near geo:location1.
>>
>> 100% of the dataset corresponds to links to another dataset.
>>
>> Now, if I consider
>>
>> :item1 foaf:based_near :location1.
>> :item2 foaf:based_near :location1.
>> :location1 owl:sameAs geo:location1.
>>
>> , which is equivalent to the previous dataset, this number drops to 33%
>>
>> Cheers!
>> y
>>
>

Received on Tuesday, 5 August 2008 11:18:18 UTC