Re: Wordnet Planets SPARQL Puzzle

>
> If you have a dataset fix for Danny's problems (or any others you've
> stumbled across along the way) do share via a URL.


Well, the "problems" in Danny's case were these:

- the required query path to connect gods to planets was non-obvious and not
trivial to figure out by exploring
- doing negation in SPARQL 1.0 is clumsy
- the Wordnet dataset doesn't identify which things are actual planets

I solved the first problem by just poking around patiently. This kind of
thing is easier and faster in Needle because the Needle explorer UI is
configurable by the user, and can be extended by calculated fields. It might
be interesting to load the Wordnet data into Needle. I haven't done that
yet, and it's bigger than the limits on our free personal accounts, but if
anybody wants to try it, let me know and I'll see if we can set up an
account with higher limits for you.

Negation is definitely better in SPARQL 1.1 than 1.0 (1.1 adds MINUS and
FILTER NOT EXISTS), so the obvious "solution" there is upgrading the server
behind the Wordnet dataset. The query would be simpler in Thread, but that's
a different topic.**

As for the actual-planet thing, what you really want there is some shared
identifiers. Rob's query used one dataset's strings as parts of another
dataset's identifiers, which is a hopeful approach. I see that DBpedia has
links to OpenCyc IDs, and Wordnet has links to an alternate Wordnet URI set
hosted at w3.org, so maybe there's a link we could find by following those
two chains further.
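
As a rough sketch of the first hop of that chain-following, a query along
these lines could be tried against a DBpedia endpoint. It assumes that
DBpedia types planets as dbo:Planet and that its OpenCyc links are published
as owl:sameAs triples pointing into the sw.opencyc.org namespace; neither
assumption has been checked against the live data here, and STRSTARTS needs
a SPARQL 1.1 endpoint.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?planet ?cyc WHERE {
  # Assumption: planets are typed with the DBpedia ontology class dbo:Planet.
  ?planet a dbo:Planet .
  # Assumption: the OpenCyc links are plain owl:sameAs triples.
  ?planet owl:sameAs ?cyc .
  # Keep only the sameAs targets that live in the OpenCyc namespace.
  FILTER (STRSTARTS(STR(?cyc), "http://sw.opencyc.org/"))
}
```

Whether anything on the OpenCyc side then leads back to a Wordnet synset is
exactly the open question.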

Absent that, Needle's answer is to support human curation of the data, so
we'd pull in both sets, cluster them for you, and let you confirm or reject
the matches. I don't know what the administrative tools for the Wordnet
dataset look like, but I think their RDF version is an export, not the
native form of the data, so there's no real comparison to be made there.


**For the interested, the single-domain SPARQL query was this:


PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn:    <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id:    <http://wordnet.rkbexplorer.com/id/>

SELECT DISTINCT ?planet WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  OPTIONAL {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
  FILTER (!bound(?s2))
}

and in SPARQL 1.1 it could be simplified to (I think):


PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn:    <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id:    <http://wordnet.rkbexplorer.com/id/>

SELECT DISTINCT ?planet WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  MINUS {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
}

where in Needle this same basic query idea might be done like this:

Synset:Solar System:!(.Hyponym.Sense.Word.Sense.Synset.Meronym:Roman Deity)
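
For comparison, SPARQL 1.1 has a second negation form, FILTER NOT EXISTS,
which reads closer to the "no matching deity" intent. This is a sketch that
reuses the same prefixes and triple patterns as the two queries above; it
has not been run against the rkbexplorer endpoint, which would need SPARQL
1.1 support.

```sparql
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn:    <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id:    <http://wordnet.rkbexplorer.com/id/>

SELECT DISTINCT ?planet WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  # Keep ?s1 only if no Roman deity synset shares one of its words.
  FILTER NOT EXISTS {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
}
```

Because ?s1 is bound outside the inner pattern, this should give the same
answers as the MINUS version. The two forms only diverge when the inner
pattern shares no variables with the outer one.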

Received on Tuesday, 12 April 2011 17:40:44 UTC