Re: Wordnet Planets SPARQL Puzzle from Kingsley Idehen on 2011-04-12 (public-lod@w3.org from April 2011)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 12 Apr 2011 14:10:42 -0400
To: glenn mcdonald <glenn@furia.com>
CC: public-lod@w3.org
Message-ID: <4DA495A2.7030108@openlinksw.com>
On 4/12/11 1:39 PM, glenn mcdonald wrote:
>
>     If you have a dataset fix for Danny's problems (or any others
>     you've stumbled across along the way), do share via a URL.
>
>
> Well, the "problems" in Danny's case were these:
>
> - the required query path to connect gods to planets was non-obvious 
> and not trivial to figure out by exploring
> - doing negation in SPARQL 1.0 is clumsy
> - the wordnet dataset lacked identification of actual planets
>
> I solved the first problem by just poking around patiently. This kind 
> of thing is easier and faster in Needle because the Needle explorer UI 
> is configurable by the user, and can be extended by calculated fields. 
> It might be interesting to load the Wordnet data into Needle. I 
> haven't done that yet, and it's bigger than the limits on our free 
> personal accounts, but if anybody wants to try it, let me know and 
> I'll see if we can set up an account with higher limits for you.
>
> Negation is definitely better in SPARQL 1.1 than 1.0, so the obvious 
> "solution" there is upgrading the server behind the wordnet dataset. 
> The query would be simpler in Thread, but that's a different topic.**
>
> As for the actual-planet thing, what you really want there is some 
> shared identifiers. Rob's query used one dataset's strings as parts of 
> another dataset's identifiers, which is a hopeful approach. I see that 
> dbpedia has links to opencyc IDs, and wordnet has links to an 
> alternate wordnet URI set hosted at w3c.org, so maybe
> there's a link we could find by following those two chains further.
>
> Absent that, Needle's answer is to support human curation of the data, 
> so we'd pull in both sets, cluster them for you, and let you confirm 
> or reject the matches. I don't know what the administrative tools for 
> the wordnet dataset look like, but I think their RDF version is an 
> export, not the native form of the data, so there's no real comparison 
> to be made there.
>
>
> **For the interested, the single-domain SPARQL query was this:
>
> PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
> PREFIX wn:<http://www.w3.org/2006/03/wn/wn20/schema/>
> PREFIX id:<http://wordnet.rkbexplorer.com/id/>
>
> SELECT DISTINCT ?planet WHERE {
>    ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
>    ?s1 rdfs:label ?planet .
>    OPTIONAL {
>      ?s1 wn:containsWordSense ?ws1 .
>      ?ws1 wn:word ?w .
>      ?ws2 wn:word ?w .
>      ?s2 wn:containsWordSense ?ws2 .
>      ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
>    }
>    FILTER (!bound(?s2))
> }
> and in SPARQL 1.1 it could be simplified to (I think):
> PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
> PREFIX wn:<http://www.w3.org/2006/03/wn/wn20/schema/>
> PREFIX id:<http://wordnet.rkbexplorer.com/id/>
>
> SELECT DISTINCT ?planet WHERE {
>    ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
>    ?s1 rdfs:label ?planet .
>    MINUS {
>      ?s1 wn:containsWordSense ?ws1 .
>      ?ws1 wn:word ?w .
>      ?ws2 wn:word ?w .
>      ?s2 wn:containsWordSense ?ws2 .
>      ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
>    }
> }
> where in Needle this same basic query idea might be done like this:
> Synset:Solar System:!(.Hyponym.Sense.Word.Sense.Synset.Meronym:Roman Deity)
>
>
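To make the two negation styles in the queries above concrete, here is a minimal Python sketch that evaluates both the SPARQL 1.0 OPTIONAL + FILTER (!bound(?s2)) form and the SPARQL 1.1 MINUS form over a tiny in-memory triple list. It assumes a hypothetical toy dataset (the Mars/Earth synset and word-sense IDs are invented, not real WordNet URIs) and a deliberately simplified pattern matcher; it illustrates the semantics and is not a SPARQL engine.

```python
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical toy data: prefixed names stand in for full IRIs, and these
# synset/word-sense IDs are invented for illustration, not real WordNet IDs.
TRIPLES: List[Triple] = [
    ("id:synset-Mars-noun-1", "wn:memberMeronymOf", "id:synset-solar_system-noun-1"),
    ("id:synset-Mars-noun-1", "rdfs:label", "Mars"),
    ("id:synset-Mars-noun-1", "wn:containsWordSense", "id:ws-mars-planet"),
    ("id:ws-mars-planet", "wn:word", "id:word-mars"),
    ("id:synset-Mars-noun-2", "wn:hyponymOf", "id:synset-Roman_deity-noun-1"),
    ("id:synset-Mars-noun-2", "wn:containsWordSense", "id:ws-mars-god"),
    ("id:ws-mars-god", "wn:word", "id:word-mars"),
    ("id:synset-Earth-noun-1", "wn:memberMeronymOf", "id:synset-solar_system-noun-1"),
    ("id:synset-Earth-noun-1", "rdfs:label", "Earth"),
    ("id:synset-Earth-noun-1", "wn:containsWordSense", "id:ws-earth-planet"),
    ("id:ws-earth-planet", "wn:word", "id:word-earth"),
]

# The outer pattern and the negated pattern, copied from the queries.
OUTER: List[Triple] = [
    ("?s1", "wn:memberMeronymOf", "id:synset-solar_system-noun-1"),
    ("?s1", "rdfs:label", "?planet"),
]
NEGATED: List[Triple] = [
    ("?s1", "wn:containsWordSense", "?ws1"),
    ("?ws1", "wn:word", "?w"),
    ("?ws2", "wn:word", "?w"),
    ("?s2", "wn:containsWordSense", "?ws2"),
    ("?s2", "wn:hyponymOf", "id:synset-Roman_deity-noun-1"),
]


def match_bgp(patterns: List[Triple], triples: List[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield every extension of `binding` that satisfies all triple patterns."""
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):  # variable: bind it, or check the existing binding
                if extended.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:  # constant: must match exactly
                ok = False
                break
        if ok:
            yield from match_bgp(rest, triples, extended)


def planets_optional_not_bound(triples: List[Triple]) -> Set[str]:
    """SPARQL 1.0 style: OPTIONAL { ... } FILTER (!bound(?s2))."""
    result = set()
    for row in match_bgp(OUTER, triples, {}):
        extensions = list(match_bgp(NEGATED, triples, row))
        # Left join: the row survives unextended only if the optional part never matched.
        for r in extensions or [row]:
            if "?s2" not in r:
                result.add(r["?planet"])
    return result


def minus_removes(row: Binding, other: Binding) -> bool:
    """True if MINUS drops `row`: they share a variable and agree on all shared ones."""
    shared = row.keys() & other.keys()
    return bool(shared) and all(row[v] == other[v] for v in shared)


def planets_minus(triples: List[Triple]) -> Set[str]:
    """SPARQL 1.1 style: MINUS { ... }, with the right-hand side evaluated on its own."""
    removals = list(match_bgp(NEGATED, triples, {}))
    return {
        row["?planet"]
        for row in match_bgp(OUTER, triples, {})
        if not any(minus_removes(row, m) for m in removals)
    }


if __name__ == "__main__":
    print(sorted(planets_optional_not_bound(TRIPLES)))  # ['Earth']
    print(sorted(planets_minus(TRIPLES)))               # ['Earth']
```

Both forms agree here because the MINUS pattern shares ?s1 with the outer pattern. When the two sides share no variables, MINUS removes nothing, whereas FILTER NOT EXISTS (also SPARQL 1.1) would still apply the test.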
Glenn,

Great!

We've achieved something here. You've shared your solution to a problem :-)

Important note to others:

Glenn and I aren't strangers; we've had these debates (sometimes heated)
repeatedly in the past. The bridge I seek to cross with Glenn simply
boils down to encouraging more of what he's done here (the actual thread
and this particular post), i.e., spotting a problem and providing a
solution that's ultimately a contribution to the general pool. That
(IMHO) is exponentially better than shooting down the efforts of others
at first blush, whether intentionally or inadvertently.


Glenn: I am 100% in agreement with "human curation"; I just refer to it
as conversation about the data that becomes part of the data. Basically,
it's doing today's Wikipedia dance as part of the provenance aspect of a
given data space. It's why I said in a different thread that we ultimately
want to be able to discern the "why" dimension of a "who", "what",
"when", and "where" better than we can today. We'll never figure out
"why" 100%, but > 0% is valuable in and of itself, etc.

The subjectivity inherent in data quality is why we ultimately have to 
discuss our way to the construction of "context lenses". All of this can 
happen in Linked Data form. No need for any Data Silos. Named Graphs, 
Named Rules, and the ability to calibrate context via a combination of
reasoning and inference rules are integral components of the Linked Data
mission, at least that's what I see via my subjective "context lenses" :-)

Links:

1. http://lod.openlinksw.com/c/CV5SCWN -- your SPARQL query
2. http://lod.openlinksw.com/c/CYOT3KC -- SPARQL 1.1 variant
3. http://lod.openlinksw.com/c/CYGCJVN -- DESCRIBE (using this via the raw
/sparql endpoint will produce a graph in the format of your choice).
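As a rough sketch of what using the raw /sparql endpoint involves, the Python below builds (but does not send) a SPARQL Protocol GET request and selects the graph format through the HTTP Accept header. The endpoint URL http://lod.openlinksw.com/sparql is an assumption inferred from the links above, and the DESCRIBE query is a hypothetical stand-in; the actual query behind link 3 is not reproduced here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the "raw /sparql endpoint" lives at this path on the same host.
ENDPOINT = "http://lod.openlinksw.com/sparql"

# Hypothetical DESCRIBE query, not the one behind the short link.
DESCRIBE_QUERY = """PREFIX id: <http://wordnet.rkbexplorer.com/id/>
DESCRIBE id:synset-solar_system-noun-1"""


def describe_request(query: str, accept: str = "text/turtle") -> Request:
    """Build (but do not send) a SPARQL Protocol GET request.

    The query travels URL-encoded in the `query` parameter; the RDF
    serialization of the returned graph is chosen via content negotiation.
    """
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept}, method="GET")


if __name__ == "__main__":
    req = describe_request(DESCRIBE_QUERY, accept="application/rdf+xml")
    print(req.full_url)
    print(req.get_header("Accept"))
    # To actually fetch the graph: urllib.request.urlopen(req).read()
```

Swapping the accept argument (for example "text/turtle" versus "application/rdf+xml") is how the same DESCRIBE would come back in a different format, provided the endpoint supports it.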

We also have a linkset in the making that would simplify this quest next 
time around.

That's what I call conversing about data in a way that turns subjectively
bad or problematic data into subjectively better data.

-- 

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Tuesday, 12 April 2011 18:11:05 UTC