- From: Peter Ansell <ansell.peter@gmail.com>
- Date: Wed, 6 Aug 2008 19:51:40 +1000 (EST)
- To: Yves Raimond <yves.raimond@gmail.com>
- Cc: Kingsley Idehen <kidehen@openlinksw.com>, public-lod@w3.org, Richard Cyganiak <richard@cyganiak.de>
----- "Yves Raimond" <yves.raimond@gmail.com> wrote:

> Hello!
>
> >> Hello!
> >>
> >> > It depends on whether you know that the external references are
> >> > distinct just based on the URI string. If someone links out to
> >> > multiple formats using external resource links then they would
> >> > have to be counted as multiple links as you have no way of
> >> > knowing that they are different, except in the case where you
> >> > reserve these types of links as RDF literals.
> >>
> >> I am not sure if I interpret it correctly - do you mean that you
> >> could link to two URIs which are in fact sameAs in the target
> >> dataset? Indeed, in that case, the measure would be slightly
> >> higher than what it should be. However, I would think that it is
> >> rarely (if not never) the case.
> >
> > I personally don't put sameAs on URIs which relate to the same
> > thing but are really just different representations, i.e., the
> > HTML version doesn't get sameAs the RDF version.
>
> Oh god, I wasn't suggesting that *at all*!!! Replace "outbound links"
> with "outbound links to *things* in other LOD datasets" in what I
> said before, if that makes it clearer.

It's all good! I understand what you mean now. :) "I am not sure" is my only answer to that one.

> >> > I think a more effective way is to measure both the number of
> >> > outbound and inbound links, something which is only possible if
> >> > you have a way of determining in the rest of the web who links
> >> > back to a particular resource in your own scheme. If you count
> >> > inbound links, like Google do with PageRank, then you get a
> >> > better idea of who is actually being used as opposed to who is
> >> > just linking into all the others it can find. See [1] for a
> >> > description of it in the Bio2RDF project.
> >>
> >> I think this measure of inbound links is more related to a
> >> "usefulness" metric, as highlighted by Richard. Again, I do not
> >> attempt to propose such a measure.
> >> I just want the raw number of "outbound links to other datasets"
> >> to be *comparable* amongst datasets.
> >
> > I am not completely sure what the value in just measuring outbound
> > links is myself. As you say it is easy to manipulate, and depends
> > heavily on the particular domain that the database is being used
> > from. To get the equaliser I think you need to include inbound
> > links somehow, or possibly scale the number of outbound links
> > against the size of the database to get an equaliser that way.
>
> Yes, I would personally go for the latter. But still, you need
> something to normalise that is consistently measured across datasets.

Maybe you could rank based on the number of other databases that are linked to per record... I am only trying to come up with examples based on the way that I have seen it functioning in Bio2RDF here, which is a slightly different case to the general case.

> > Either way, you have to interactively discover these things, just
> > talking about them could lead to us missing essential aspects that
> > come up frequently and can be discounted if that is the aim of the
> > ranking system. Has anyone done a primitive ranking based solely on
> > external outbound link frequency across a few linked databases
> > (other than the Bio2RDF data that is detailed on the link below
> > which comes out with reasonable results for a combined ranking
> > based on a combined value for inbound, outbound, and database
> > overall size)?
>
> I wouldn't use this raw measure for a ranking system. I am just
> saying that the statistics for outbound links published by the
> different LOD datasets so far are not consistent, that's all.

Where are these statistics published so far btw? I haven't come across them before.

Cheers,

Peter
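
For illustration only, here is a rough sketch of the two normalisations discussed above: outbound links scaled against the size of the dataset, and the number of other datasets linked to. It is not the Bio2RDF ranking or anything published by a LOD dataset. The function names are hypothetical, and it assumes that a "dataset" can be identified by the host part of a URI and that a "record" is any distinct subject in the home dataset. It makes no attempt to tell links to *things* apart from links to different representations (HTML vs. RDF), which, as noted in the thread, cannot be decided from the URI string alone.

```python
from urllib.parse import urlparse


def dataset_of(uri):
    """Assumption: the host part of a URI identifies its dataset."""
    return urlparse(uri).netloc


def outbound_link_stats(triples, home):
    """Count outbound links from the `home` dataset.

    triples: iterable of (subject, predicate, object) strings.
    home:    host name of the dataset being measured (assumed identifier).
    """
    records = set()    # distinct subjects in the home dataset
    links = set()      # distinct (subject, external object) pairs
    datasets = set()   # distinct external hosts linked to
    for s, _p, o in triples:
        if dataset_of(s) != home:
            continue
        records.add(s)
        host = dataset_of(o)
        # Literals have no host and are skipped; so are internal links.
        if host and host != home:
            links.add((s, o))
            datasets.add(host)
    n = len(records)
    return {
        "outbound_links": len(links),
        "links_per_record": len(links) / n if n else 0.0,
        "linked_datasets": len(datasets),
    }
```

Comparing `links_per_record` rather than the raw `outbound_links` count is one way to make the figure less dependent on dataset size, but it only helps if every dataset counts records and links in the same way.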
Received on Wednesday, 6 August 2008 09:52:23 UTC