Re: Where to put the knowledge you add

I agree with this entirely, and it's why I keep insisting that for most
purposes datasets should be expressed using local identifiers, with all
external linkages called out explicitly and/or externally. owl:sameAs and
the use of other people's identifiers for your own nodes are equally
dangerous. If I'm asserting that Brussels is the capital of Belgium, I'm
saying that my notion of Brussels is my notion of "capital" of my notion of
Belgium. I am the authority for that assertion. Saying that my notion of
Brussels, "capital" or Belgium corresponds with anybody else's notion of
anything is a separate assertion, for which I do not have the same
authority.
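
A minimal sketch of how this goes wrong, using plain Python tuples rather
than a real triple store. The two capital facts are true; the node
"ex:someCountry" and its two owl:sameAs links are hypothetical, invented
only to mimic the kind of bad linkage reported in the quoted message below,
and the "dbr:"/"dbo:" names are just abbreviations of the dbpedia URIs.

```python
OWL_SAME_AS = "owl:sameAs"
CAPITAL = "dbo:capital"

triples = {
    ("dbr:Belgium", CAPITAL, "dbr:City_of_Brussels"),
    ("dbr:Lesotho", CAPITAL, "dbr:Maseru"),
    # Hypothetical bad linkage: one node declared sameAs two different countries.
    ("ex:someCountry", OWL_SAME_AS, "dbr:Belgium"),
    ("ex:someCountry", OWL_SAME_AS, "dbr:Lesotho"),
}


def capitals_via_same_as(country, data):
    """Follow the pattern of the quoted query:
    ?s sameAs <country> . ?s sameAs ?c . ?c capital ?capital
    """
    # Nodes that claim to be the same as the requested country.
    hubs = {s for s, p, o in data if p == OWL_SAME_AS and o == country}
    # Everything those nodes also claim to be the same as.
    linked = {o for s, p, o in data if p == OWL_SAME_AS and s in hubs}
    # Capitals of all the countries reached that way.
    return sorted(o for s, p, o in data if p == CAPITAL and s in linked)


print(capitals_via_same_as("dbr:Belgium", triples))
# prints ['dbr:City_of_Brussels', 'dbr:Maseru']
```

Neither capital fact is wrong here; the bad answer comes entirely from the
linkage, which is why it matters who asserted it and where it lives.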

For that matter, the proper interpretation of "correspond" depends on the
purpose: for some things, treating "correspond" as owl:sameAs may be exactly
right, and for some it might be utterly unacceptable. And it's much easier
to map a "corresponds" property to owl:sameAs if you want to than to rewrite
an entire dataset to undo the misapplication of IDs or owl:sameAs.
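
As a rough sketch of that asymmetry, again with plain Python tuples: the
namespace "http://example.org/my/" and the property "correspondsTo" are
assumptions made up for illustration, not an existing vocabulary. The facts
use only local identifiers, the linkage lives in its own set, and promoting
it to owl:sameAs is a one-line choice made by the consumer.

```python
LOCAL = "http://example.org/my/"            # assumed local namespace
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
CORRESPONDS = LOCAL + "correspondsTo"       # hypothetical weaker link property

# Assertions I am the authority for, stated only in local identifiers.
facts = {
    (LOCAL + "Brussels", LOCAL + "capitalOf", LOCAL + "Belgium"),
}

# Linkage kept apart, so it can carry its own trust, provenance and licence.
links = {
    (LOCAL + "Brussels", CORRESPONDS, "http://dbpedia.org/resource/City_of_Brussels"),
    (LOCAL + "Belgium", CORRESPONDS, "http://dbpedia.org/resource/Belgium"),
}


def as_same_as(link_triples):
    """Reinterpret 'corresponds' links as owl:sameAs, for consumers who want that."""
    return {
        (s, OWL_SAME_AS if p == CORRESPONDS else p, o)
        for s, p, o in link_triples
    }


strict_view = facts | as_same_as(links)   # purpose where sameAs is exactly right
loose_view = facts | links                # purpose where it would be unacceptable
```

Going the other way, from a dataset that already uses external IDs or
owl:sameAs throughout back to this separation, has no such one-line
mapping.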

Think global, assert local.

glenn


On Wed, Oct 12, 2011 at 7:55 AM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:

>
> Hi.
>
> I have argued for a long time that the linkage data (in particular
> owl:sameAs and similar links) should not usually be mixed with the knowledge
> being published.
>
> Thus, for example as I discussed with Evan for the NYTimes site a while
> ago, it is not a good thing to put the owl:sameAs links (which were produced
> by a relatively unskilled individual over a short period of time) at the
> same status as the other data, which has been curated over decades by expert
> reporters.
>
> These sameAs links have potentially very different trust, provenance,
> licence, and possibly other non-functional attributes from the substantive
> data.
> Clearly they have different trust and provenance, but licence may well be
> different, as the NYT may want people to take the triples away to bring
> traffic to their site, while keeping the other triples under more restricted
> licence.
>
> Which brings me to an example of where things have recently gone badly
> wrong.
> I have reported a bug to the dbpedia team wherein the URIs for countries
> have become deeply intertwingled.
> Example queries are at the end of this message - they have to explicitly do
> the owl:sameAs because the store does not do owl:sameAs inference, but the
> outcome is that I can validly infer answers such as "Maseru is the capital
> of Belgium".
>
> Of course, mistakes happen, so I am not having a specific go at dbpedia,
> which I still think is wonderful.
>
> But the outcome is that I get very bad data from dbpedia.org unexpectedly,
> which means I (and presumably anyone else) can't reliably use dbpedia.org
> at all (because I use an inference engine when I cache the data).
> Had the dbpedia.org site simply stuck to the behaviour I was sort of
> expecting of publishing data from wikipedia (possibly publishing the linkage
> data elsewhere) I would have been in a better position.
>
> One of the issues here is to realise when we are actually adding knowledge
> to a triplification process.
> It is clear when things like owl:sameAs are added that knowledge is being
> added.
> However, it is probably less clear to people that they are adding their own
> knowledge when they directly use URIs from dbpedia or elsewhere.
> In a similar way, such use introduces knowledge which may have very
> different trust and provenance from the data being triplified.
>
> Is this a good way to do things?
>
> I would say not.
> I have used a wide variety of Linked Data sources, and have found problems
> with almost every one of them (possibly every significant one).
> The problems frequently relate to the extra knowledge that the
> triplification process has introduced.
> If only I could be given the data without it, then I would not have to
> reject the dataset.
>
> Thanks for reading this far.
> Best
> Hugh
>
> Query:
> SELECT DISTINCT ?capital WHERE {
>  ?s owl:sameAs <http://dbpedia.org/resource/Belgium> .
>  ?s owl:sameAs ?country .
>  ?country <http://dbpedia.org/ontology/capital> ?capital .
> }
>
> As a URI:
>
> http://dbpedia.org/snorql/?query=SELECT+DISTINCT+%3Fcapital+WHERE+%7B%0D%0A+%3Fs+owl%3AsameAs+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FBelgium%3E+.%0D%0A+%3Fs+owl%3AsameAs+%3Fcountry+.%0D%0A+%3Fcountry+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2Fcapital%3E+%3Fcapital+.%0D%0A%7D%0D%0A
>
> Output:
> capital
> http://dbpedia.org/resource/City_of_Brussels
> http://dbpedia.org/resource/Maseru
>
>
> --
> Hugh Glaser,
>              Web and Internet Science
>              Electronics and Computer Science,
>              University of Southampton,
>              Southampton SO17 1BJ
> Work: +44 23 8059 3670, Fax: +44 23 8059 3045
> Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
> http://www.ecs.soton.ac.uk/~hg/
>
>
>

Received on Wednesday, 12 October 2011 12:50:20 UTC