Re: Visualizing LOD Linkage

----- "Yves Raimond" <yves.raimond@gmail.com> wrote:

> Hello!
> 
> >> Hello!
> >>
> >> >
> >> > It depends on whether you know that the external references are
> >> > distinct just based on the URI string. If someone links out to
> >> > multiple formats using external resource links then they would have to
> >> > be counted as multiple links as you have no way of knowing that they
> >> > are different, except in the case where you reserve these types of
> >> > links as RDF literals.
> >> >
> >>
> >> I am not sure if I interpret it correctly - do you mean that you could
> >> link to two URIs which are in fact sameAs in the target dataset?
> >> Indeed, in that case, the measure would be slightly higher than what
> >> it should be. However, I would think that it is rarely (if not never)
> >> the case.
> >
> > I personally don't put sameAs on URIs which relate to the same
> > thing but are really just different representations, i.e., the HTML
> > version doesn't get sameAs the RDF version.
> 
> Oh god, I wasn't suggesting that *at all*!!! Replace "outbound links"
> with "outbound links to *things* in other LOD datasets" in what I said
> before, if that makes it clearer.

It's all good! I understand what you mean now. :)

My only answer to that one is that I am not sure.

> >
> >> > I think a more effective way is to measure both the number of
> >> > outbound and inbound links, something which is only possible if you
> >> > have a way of determining in the rest of the web who links back to a
> >> > particular resource in your own scheme. If you count inbound links,
> >> > like Google do with PageRank, then you get a better idea of who is
> >> > actually being used as opposed to who is just linking into all the
> >> > others it can find. See [1] for a description of it in the Bio2RDF
> >> > project.
> >> >
> >>
> >> I think this measure of inbound links is more related to a
> >> "usefulness" metric, as highlighted by Richard. Again, I do not
> >> attempt to propose such a measure.
> >> I just want the raw number of "outbound links to other datasets" to be
> >> *comparable* amongst datasets.
> >>
> >
> > I am not completely sure what the value in just measuring outbound
> > links is myself. As you say it is easy to manipulate, and depends
> > heavily on the particular domain that the database is being used from.
> > To get the equaliser I think you need to include inbound links
> > somehow, or possibly scale the number of outbound links against the
> > size of the database to get an equaliser that way.
> >
> 
> Yes, I would personally go for the latter. But still, you need
> something to normalise that is consistently measured across datasets.

Maybe you could rank based on the number of other databases that are linked to per record... I am only trying to come up with examples based on the way that I have seen it functioning in Bio2RDF here, which is a slightly different case to the general case.
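
To make that concrete, here is a rough sketch of the two numbers I have in mind: outbound links scaled by the number of records, and the average number of distinct other databases linked to per record. It is only an illustration and not what Bio2RDF actually computes. The function name link_stats, the use of the URI host to decide which dataset a URI belongs to, and the example.org hosts are all my own assumptions.

```python
from collections import defaultdict
from urllib.parse import urlparse


def link_stats(triples, home_host):
    """Summarise outbound linkage for one dataset.

    triples   -- iterable of (subject, predicate, object) strings
    home_host -- host name assumed to identify the dataset's own URIs
    """
    records = set()                    # subjects minted by this dataset
    outbound = set()                   # distinct (subject, object) pairs leaving it
    targets = defaultdict(set)         # subject -> external hosts it links to

    for s, _p, o in triples:
        if urlparse(s).netloc != home_host:
            continue                   # not one of our own records
        records.add(s)
        host = urlparse(o).netloc      # empty for plain literals
        if host and host != home_host:
            outbound.add((s, o))
            targets[s].add(host)

    n = len(records)
    total_datasets = sum(len(hosts) for hosts in targets.values())
    return {
        "records": n,
        "outbound_links": len(outbound),
        "outbound_per_record": len(outbound) / n if n else 0.0,
        "datasets_per_record": total_datasets / n if n else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical data: two records in a.example.org linking out to b and c.
    sample = [
        ("http://a.example.org/r1", "p", "http://b.example.org/x"),
        ("http://a.example.org/r1", "p", "http://b.example.org/y"),
        ("http://a.example.org/r1", "p", "http://c.example.org/z"),
        ("http://a.example.org/r2", "p", "http://a.example.org/r1"),
        ("http://a.example.org/r2", "label", "some literal"),
    ]
    print(link_stats(sample, "a.example.org"))
```

Identifying a dataset by host is the weakest assumption here, since one host can serve several datasets. It does not address the sameAs double counting discussed above either.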

> > Either way, you have to interactively discover these things, just
> > talking about them could lead to us missing essential aspects that
> > come up frequently and can be discounted if that is the aim of the
> > ranking system. Has anyone done a primitive ranking based solely on
> > external outbound link frequency across a few linked databases (other
> > than the Bio2RDF data that is detailed on the link below which comes
> > out with reasonable results for a combined ranking based on a combined
> > value for inbound, outbound, and database overall size)?
> >
> 
> I wouldn't use this raw measure for a ranking system. I am just saying
> that the statistics for outbound links published by the different LOD
> datasets so far are not consistent, that's all.

Where are these statistics published so far btw? I haven't come across them before.

Cheers,

Peter

Received on Wednesday, 6 August 2008 09:52:23 UTC