Re: Propagation of bad sameAs statements

Hugh,

Great to understand how this all works. I'm now expecting somebody to take
all these sameAs links and run some type of page rank algorithm and rank
what actually is sameAs.

Cheers

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Thu, Sep 9, 2010 at 8:23 AM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:

> Hi,
> Thank you for your interest.
> Here are some sort of answers to this and other questions.
> In fact, this has become something of a dialogue with myself :-)
>
> sameas.org does not itself do any interesting inference, other than
> A sameas B & B sameas C => A sameas C when asked about A.
> It aims to gather equivalence information from existing sources and service
> the results in a convenient (single) place.
> (It also aims to address the problem of owl:sameas being a pairwise
> statement, which gives an unpleasant explosion (n**2) of statements for
> groups of equivalences, which can be quite hard to handle.)
>
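
A minimal sketch of the closure rule quoted above (A sameas B & B sameas C => A sameas C), assuming each group of equivalent URIs is kept as one bundle rather than as pairwise statements. The class name, method names and the ex: URIs are hypothetical, chosen for illustration; nothing here is taken from the sameas.org code.

```python
class Bundles:
    """Union-find over URIs; each disjoint set is one equivalence bundle."""

    def __init__(self):
        self.parent = {}

    def find(self, uri):
        # An unseen URI starts as its own singleton bundle.
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[uri] != root:
            self.parent[uri], uri = root, self.parent[uri]
        return root

    def add(self, a, b):
        """Record one asserted pair and merge the two bundles."""
        self.parent[self.find(a)] = self.find(b)

    def equivalents(self, uri):
        """Every URI in the same bundle as uri, including uri itself."""
        root = self.find(uri)
        return sorted(u for u in list(self.parent) if self.find(u) == root)


def pairwise_statements(n):
    """Directed pairwise statements needed to spell out a bundle of n URIs."""
    return n * (n - 1)


bundles = Bundles()
bundles.add("ex:A", "ex:B")
bundles.add("ex:B", "ex:C")
print(bundles.equivalents("ex:A"))  # ['ex:A', 'ex:B', 'ex:C']
print(pairwise_statements(3))       # 6
```

Storing a bundle costs one entry per URI, whereas spelling the same group out pairwise grows with n**2, which is the explosion mentioned above.
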
> Who chooses what data is acceptable?
> Er, me.
> I look at it and decide.
>
> Is it a spider (people sometimes ask this)?
> No - when I am bored with the other things I am doing I add more to it, by
> downloading dumps or querying SPARQL endpoints, often as a result of
> messages on this and other lists.
>
> Is owl:sameAs the only predicate recognised?
> As you have worked out, no.
> It is a service giving equivalent URIs, and one of the formats you can get
> back is owl:sameAs. But you can get other formats if you want. So the inputs
> include things like skos:exactMatch and skos:closeMatch (as I recall).
> And we could output other formats such as these if asked.
> At the moment we only do rdf+xml, text/n3, application/json, text/plain; see
> http://www.sameas.org/about.php.
> What has now been noticed is that I decided that dbpedia redirects should be
> treated as equivalent.
> The reason I did this is that it meant that a lot of expected URIs now worked.
> Eg http://dbpedia.org/resource/UN/LOCODE:GBLON and
> even http://dbpedia.org/resource/Capital_of_the_UK get to
> http://data.ordnancesurvey.co.uk/id/7000000000041428 and
> http://statistics.data.gov.uk/id/eer/07.
> The downside is that there is quite a lot of cruft in the redirects, and so
> some strange things happen (as has been observed).
>
> Do I know about errors in sameas.org?
> Yes.
> I like the Iron Maiden one to opencyc, for example.
> But I don't aim to correct these, any more than Google aims to correct
> things it links to.
>
> Why such a liberal attitude to equivalence?
> I eventually worked out that sameas.org was a discovery service.
> We have other sameas services, called crs services, on our systems (eg
> http://opencyc.rkbexplorer.com/crs/ is an external one) which are
> definitional (I hesitate to use a word like authoritative, with all its
> other connotations).
> And so in that vein, I have cast the net wider for sameas.org.
> This was the case early in its life, as the wordnet equivalence to dbpedia
> is in fact the equivalence of the word to the thing, which is wrong at
> some/any level.
> But I have taken the view that people/agents that come to sameas.org are
> looking for things, and might not care about such subtleties, not least
> because they may not have understood them when they constructed their RDF.
>
> If I had the time/funding, I would provide other services that took
> different views of equivalence, in terms of discovery/definitional or
> liberal/conservative (precision/recall is another way of saying that).
>
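
A rough sketch of what liberal and conservative views might look like, assuming each input assertion is kept together with the predicate it came from. The split of predicates between the two views, the "redirect" label and the ex: URIs are assumptions made for illustration, not a description of how sameas.org works.

```python
# Assumed split: the conservative view trusts only the strongest predicates.
CONSERVATIVE = {"owl:sameAs", "skos:exactMatch"}
# The liberal view also admits weaker evidence ("redirect" is a made-up label).
LIBERAL = CONSERVATIVE | {"skos:closeMatch", "redirect"}

# Hypothetical input assertions: (subject, predicate, object).
ASSERTIONS = [
    ("ex:London", "owl:sameAs", "ex:os/London"),
    ("ex:London", "skos:closeMatch", "ex:GreaterLondon"),
    ("ex:Capital_of_the_UK", "redirect", "ex:London"),
]


def select(assertions, accepted):
    """Keep only the pairs whose predicate the chosen view accepts."""
    return [(s, o) for s, p, o in assertions if p in accepted]


print(select(ASSERTIONS, CONSERVATIVE))
print(select(ASSERTIONS, LIBERAL))
```

The conservative view favours precision and the liberal view favours recall; either list of pairs could then be grouped into bundles in the usual way.
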
> Mind you it is probably the case that the sameas.org data is no worse than a
> lot of the data in the LOD diagram, in terms of reliably identifying
> resources, as I have rejected a bunch of them as being substandard.
>
> On 08/09/2010 15:42, "joel sachs" <jsachs@csee.umbc.edu> wrote:
>
> >
> ...
> > So, a request for the sameas.org folks: Would it be possible to include a
> > provenance column for all sameAs assertions you keep track of?  In cases
> > where the sameAs assertion isn't actually asserted on the web, you could
> > indicate the provenance as "inferred" in the provenance column. Also, have
> > you published the heuristics you use (if any) to infer sameAs relations?
> >
> ...
> >
> > Thanks!
> > Joel.
> >
> >
> >
> So finally getting round to your specific question (although hopefully the
> other stuff has also helped).
> It would be hard to provide the extra column for quite a few reasons.
> We do know where we got the data from, but it may be a SPARQL endpoint, a
> dump downloaded, or an email sent to me, for example. So it would not be
> very easy to interpret.
> But only a small number of the pairs would be so identified, as all the rest
> are inferred from the other pairwise assertions.
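
To illustrate why most rows of such a column would read "inferred", here is a small sketch, assuming asserted pairs are stored with a free-text source label. The source labels, the function name and the ex: URIs are hypothetical.

```python
from itertools import combinations

# Hypothetical asserted pairs, each with the place it was obtained from.
ASSERTED = {
    frozenset(("ex:A", "ex:B")): "SPARQL endpoint",
    frozenset(("ex:B", "ex:C")): "dump download",
}


def provenance_table(bundle, asserted):
    """One row per unordered pair in the bundle: (uri1, uri2, source)."""
    rows = []
    for a, b in combinations(sorted(bundle), 2):
        # Pairs that nobody asserted directly are marked as inferred.
        source = asserted.get(frozenset((a, b)), "inferred")
        rows.append((a, b, source))
    return rows


for row in provenance_table({"ex:A", "ex:B", "ex:C"}, ASSERTED):
    print(row)
```

A bundle of n URIs can be connected by as few as n - 1 asserted pairs, while it contains n(n - 1)/2 unordered pairs, so the share of rows marked "inferred" grows with the size of the bundle.
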
> We actually have our own visualisation tools for bundles, with
> assertions and dates, etc, but the tool is hard to read if you don't know
> what is happening, and...
> 1) Finding the resources to make it more accessible would be hard.
> sameas.org has effectively never been funded - it is my hobby with Ian
> Millard, and we would love to have the resources to do this sort of stuff.
> I actually have plans for a more sophisticated architecture behind
> sameas.org which would facilitate this and a lot of other stuff, but again
> it is a question of resources.
>
> 2) What is the Ontology?
> A big question with giving more information is, what is the ontology?
> We live in the Linked Data world (for sameas.org), and machine-interpretable
> structures.
> So sameas.org is designed to be used by services, and the ontology of
> provenance (and trust) is still an open question.
> So it might be that if you can tell us the ontology for provenance that
> could be used, we might be able to add something to the service.
>
> 3) Simple services
> I am a great believer in things that do a small number of things simply, and
> (hopefully) do them well.
> I don't yet understand how to keep the simplicity of sameas.org, while
> offering more sophisticated facilities to users.
>
> Oh dear, that went on a while, but hopefully it has addressed a lot of the
> questions, asked and unasked.
>
> I've just remembered there is a blog, so I will put this message there as
> well:
> http://www.rkbexplorer.com/blog/?p=40
>
> Best
> Hugh Glaser
>
>
>
>

Received on Thursday, 9 September 2010 14:28:49 UTC