Re: CWM DBpedia Slurp Rules Weirdness from Sean B. Palmer on 2007-11-01 (public-cwm-bugs@w3.org from November 2007)

From: Sean B. Palmer <sean@miscoranda.com>
Date: Thu, 1 Nov 2007 15:41:13 +0000
To: "Tim Berners-Lee" <timbl@w3.org>
Cc: "Yosi Scharf" <syosi@mit.edu>, public-cwm-bugs@w3.org
Message-ID: <b6bb4d890711010841j2c29704cg639b01c1f256576f@mail.gmail.com>

On 10/29/07, Tim Berners-Lee <timbl@w3.org> wrote:

> Well, I don't know what else other than cwm handles nested formulae
> and also variables. Anything cwm processes must use the n3 definition
> of ?x which cwm uses.

I don't mean to survey which level the shortcut syntax is most commonly
used for. I mean to survey which levels people actually quantify over,
regardless of syntax, to find the most frequently used level of
quantification.

In other words, cwm lets you quantify a variable over any formula, and
the syntax for doing that depends on which formula you're quantifying
over; there's a shortcut (?x) when it's the parent. My hypothesis was
that, in fact, more people quantify things over the root formula than
over the parent formula. Since cwm allows quantification over any level,
we can test my hypothesis.
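The two ways of picking a quantification level described above can be modelled with formula depths. This is a minimal sketch, assuming depth 0 is the root (document) formula and each { ... } adds one level; the function name quantified_depth is hypothetical and is not part of cwm or notation3.py.

```python
from typing import Optional


def quantified_depth(occurs_at: int, declared_at: Optional[int] = None) -> int:
    """Return the depth of the formula a universal is quantified over.

    Hypothetical model: depth 0 is the root formula, each { ... } adds 1.
    """
    if declared_at is None:
        # ?x shortcut: quantified over the parent of the formula it occurs in.
        if occurs_at < 1:
            raise ValueError("this sketch does not model ?x in the root formula")
        return occurs_at - 1
    # Explicit @forAll: quantified over the formula holding the declaration,
    # which may be any formula enclosing the occurrence.
    if not 0 <= declared_at <= occurs_at:
        raise ValueError("@forAll must be declared in an enclosing formula")
    return declared_at


# Cases from the test file below (depths counted by hand):
print(quantified_depth(2))                 # ?ParentS inside { { ... } } -> 1
print(quantified_depth(2, declared_at=0))  # :RootS, @forAll at top level -> 0
print(quantified_depth(3, declared_at=1))  # :OtherS, @forAll one level in -> 1
```

Only the ?x shortcut ties the level to the occurrence; with @forAll any enclosing level is possible, which is what makes the survey meaningful.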

Rather than argue about this endlessly, I've modified my standalone
version of notation3.py to do just that: to survey which level the
universals in sub-subformulae are quantified over. For example, take
the following n3 file:

$ cat universals.n3
@prefix : <http://example.org/#> .

# Parent Quantification

{ ?NoS ?NoP ?NoO } a :TestFormula .

{ { ?ParentS ?ParentP ?ParentO } a :ParentTest } a :TestFormula .

# Root Quantification

@forAll :RootS, :RootP, :RootO .

{ { :RootS :RootP :RootO } a :RootTest } a :TestFormula .

# Other Quantification (neither Parent nor Root)

{
 @forAll :OtherS, :OtherP, :OtherO .
 { { :OtherS :OtherP :OtherO } a :OtherTest } a :TestFormula .
} a :TestFormula .

# EOF

The variables starting "No" aren't in sub-subformulae (at that depth the
root and the parent are the same formula), so we want to ignore them.
For all the rest, we want to print out which level they're quantified
over. That's what the script does.
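The filtering and labelling step can be sketched as follows. This is not the modified notation3.py: it is a minimal sketch that assumes each universal has already been resolved into a hypothetical (name, use_depth, quant_depth) record, with the test file transcribed by hand. It yields the same nine labels as the run shown next, though in a fixed order.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record: (variable name, depth of the formula it is used in,
# depth of the formula it is quantified over). Depth 0 is the root formula.
Record = Tuple[str, int, int]


def classify(use_depth: int, quant_depth: int) -> str:
    """Label a quantification level relative to where the variable is used."""
    if quant_depth == 0:
        return "ROOT"
    if quant_depth == use_depth - 1:
        return "PARENT"
    # Anything else, including the using formula itself, is neither.
    return "OTHER"


def survey(records: Iterable[Record]) -> Iterator[str]:
    for name, use_depth, quant_depth in records:
        if use_depth < 2:
            # Not in a sub-subformula: root and parent are the same formula.
            continue
        yield f"{classify(use_depth, quant_depth)}: ?{name}"


# universals.n3 transcribed by hand into records.
TEST_FILE = (
    [(v, 1, 0) for v in ("NoS", "NoP", "NoO")]
    + [(v, 2, 1) for v in ("ParentS", "ParentP", "ParentO")]
    + [(v, 2, 0) for v in ("RootS", "RootP", "RootO")]
    + [(v, 3, 1) for v in ("OtherS", "OtherP", "OtherO")]
)

for line in survey(TEST_FILE):
    print(line)
```

Note the design choice in classify: a variable quantified over the very formula it appears in counts as OTHER, since it is neither root nor parent.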

$ ./universals.py universals.n3
PARENT: ?ParentO
ROOT: ?RootO
OTHER: ?OtherP
OTHER: ?OtherS
ROOT: ?RootS
ROOT: ?RootP
PARENT: ?ParentP
PARENT: ?ParentS
OTHER: ?OtherO

So I copied all of the n3 files from SWAP CVS into their own directory
and then ran this script over them all. Counting each label across the
resulting .vars output files gives:

$ egrep '^ROOT' *.vars | wc -l
     364

$ egrep '^PARENT' *.vars | wc -l
     851

$ egrep '^OTHER' *.vars | wc -l
    6345
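The three pipelines above just count output lines by their leading label. A minimal Python equivalent, assuming the per-file outputs sit in the current directory as *.vars files in the format shown earlier; tally_lines and tally_files are hypothetical helpers.

```python
import glob
from collections import Counter
from typing import Iterable

LABELS = ("ROOT", "PARENT", "OTHER")


def tally_lines(lines: Iterable[str]) -> Counter:
    """Count lines starting with each label, like egrep '^LABEL' | wc -l."""
    counts: Counter = Counter()
    for line in lines:
        for label in LABELS:
            if line.startswith(label):
                counts[label] += 1
                break
    return counts


def tally_files(pattern: str = "*.vars") -> Counter:
    """Sum the label counts over every file matching the glob pattern."""
    counts: Counter = Counter()
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            counts.update(tally_lines(f))
    return counts


if __name__ == "__main__":
    totals = tally_files()
    for label in LABELS:
        print(label, totals[label])
```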

This was very surprising! Not only was I wrong that quantifying over
the root formula is the most popular kind of sub-subformula
quantification (PARENT beats ROOT, 851 to 364), but the actual winner
was OTHER, which I didn't expect at all.
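Putting the three counts on a common footing shows how lopsided the result is. These shares are derived only from the counts above:

```latex
\begin{aligned}
N &= 364 + 851 + 6345 = 7560 \\
\text{OTHER}  &: \tfrac{6345}{7560} \approx 0.839 \\
\text{PARENT} &: \tfrac{851}{7560} \approx 0.113 \\
\text{ROOT}   &: \tfrac{364}{7560} \approx 0.048 \\
\tfrac{\text{OTHER}}{\text{PARENT}} &= \tfrac{6345}{851} \approx 7.46
\end{aligned}
```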

So it appears that by far the most popular level of quantification for
sub-subformulae is some intermediate formula, neither the root nor the
parent, as demonstrated in my test file:

{
 @forAll :OtherS, :OtherP, :OtherO .
 { { :OtherS :OtherP :OtherO } a :OtherTest } a :TestFormula .
} a :TestFormula .

Unfortunately this strikes me as so bizarre that I don't trust my own
results; I think I need to check that the survey is really doing what
I think it's doing. At any rate, I hope you understand the principle
now: the shortcut syntax should be deployed on the most commonly used
construct, and we can find out what the most commonly used construct
is by doing a descriptive survey of existing n3.

(Thanks to Jos for his feedback that Euler uses the shortcut syntax
for ROOT rather than for PARENT, as cwm does. Even with Tim's
counter-examples of putting brackets around rules, I'm not really
convinced that PARENT is the most intuitive approach; this is probably
because I use log:includes to peek into documents far more often than
I write rules-for-writing-rules.)

-- 
Sean B. Palmer, http://inamidst.com/sbp/

Received on Thursday, 1 November 2007 15:41:34 UTC