Another look at cwm performance from Yosi Scharf on 2005-08-29 (public-cwm-talk@w3.org from July to September 2005)

From: Yosi Scharf <syosi@MIT.EDU>
Date: Mon, 29 Aug 2005 15:56:02 -0400
To: public-cwm-talk@w3.org
Message-ID: <43136852.6030803@mit.edu>
I was playing with cwm/Pychinko integration over the weekend, getting
cwm to use the Rete network much more directly. I then ran some tests and
saw little performance change. This fascinated me, so I looked at the
Pychinko tests:

-----
initial size: 1200
Testing  Rules that reuse the same variables on left sides
CWM COMMAND:  time /home/syosi/CVS-local/WWW/2000/10/swap/cwm.py
generatedtests/testfacts.1200.n3 --ntriples
--think=rules/sameVarRules.n3 --base=http://www.mindswap.org/~katz/
--purge > generatedtests/testoutput.cwm.1200.n3
408  inferred fact(s)
Pychinko time: 3.7090420723
CWM time:

real    0m7.027s
user    0m6.705s
sys     0m0.091s
-------

Here we see that Pychinko took under 4 seconds on this moderately large
fact base, while cwm took almost twice as long (about 7 seconds). I ran
the numbers myself:

------
syosi@mr-burns:~/pychinko/pychinko$ time
/home/syosi/CVS-local/WWW/2000/10/swap/cwm.py
generatedtests/testfacts.1200.n3 --ntriples
--think=rules/sameVarRules.n3 --base=http://www.mindswap.org/~katz/
--purge > generatedtests/testoutput.cwm.1200.n3


real    0m7.135s
user    0m6.624s
sys     0m0.104s
syosi@mr-burns:~/pychinko/pychinko$
syosi@mr-burns:~/pychinko/pychinko$ time
/home/syosi/CVS-local/WWW/2000/10/swap/cwm.py
generatedtests/testfacts.1200.n3 --no --think=rules/sameVarRules.n3
--base=http://www.mindswap.org/~katz/ --purge >
generatedtests/testoutput.cwm.1200.n3

real    0m3.892s
user    0m3.596s
sys     0m0.077s
syosi@mr-burns:~/pychinko/pychinko$ time python2.4 main.py
--facts=generatedtests/testfacts.1200.n3 --rules=rules/sameVarRules.n3
408  inferred fact(s)

real    0m4.667s
user    0m4.266s
sys     0m0.100s
syosi@mr-burns:~/pychinko/pychinko$ time
/home/syosi/CVS-local/WWW/2000/10/swap/cwm.py
generatedtests/testfacts.1200.n3 --think=rules/sameVarRules.n3
--base=http://www.mindswap.org/~katz/ --purge >
generatedtests/testoutput.cwm.1200.n3

real    0m7.482s
user    0m6.984s
sys     0m0.112s
-----
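The `real` (wall-clock) times from the runs above can be tabulated to
estimate how much of each cwm run goes to output: subtract the time of
the `--no` run, which writes nothing, from the runs that do write output.
This is a back-of-the-envelope sketch that assumes the `--no` run does the
same parsing and reasoning work as the others; each figure is a single
run, so the differences are approximate.

```python
# Wall-clock ("real") seconds copied from the runs in the log above.
# Each figure is a single run, so treat the differences as approximate.
REAL_TIMES = {
    "cwm --ntriples": 7.135,
    "cwm --no (no output)": 3.892,
    "pychinko main.py": 4.667,
    "cwm (default N3 output)": 7.482,
}
NO_OUTPUT = "cwm --no (no output)"


def output_overhead(with_output: float, without_output: float) -> float:
    """Estimate serialization time, assuming parsing and reasoning cost the same."""
    return with_output - without_output


def summarize() -> list[str]:
    """Return one line per cwm run that writes output, plus a cwm/Pychinko ratio."""
    baseline = REAL_TIMES[NO_OUTPUT]
    lines = []
    for label in ("cwm --ntriples", "cwm (default N3 output)"):
        total = REAL_TIMES[label]
        extra = output_overhead(total, baseline)
        lines.append(
            f"{label}: {extra:.3f} s of {total:.3f} s spent on output ({extra / total:.0%})"
        )
    # Ratio below 1.0 means cwm without output finished before Pychinko.
    ratio = baseline / REAL_TIMES["pychinko main.py"]
    lines.append(f"cwm --no / pychinko time ratio: {ratio:.2f}")
    return lines


if __name__ == "__main__":
    for line in summarize():
        print(line)
```

On these numbers, output accounts for roughly 3.2 to 3.6 seconds, close
to half of each cwm run, while cwm with output suppressed finishes in
about 0.83 of Pychinko's wall-clock time.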


Basically, what this shows is that the large slowdown associated with
cwm in many of these simple rule tests has nothing to do with cwm's
reasoner being slow; for these simple rule sets, it may not be. With
--no, which suppresses output, cwm finished in 3.9 seconds, faster than
Pychinko's 4.7; with --ntriples it took 7.1 seconds. Cwm's outputter is
very slow, even in ntriples mode. I then told cwm not to sort the output:

-------
syosi@mr-burns:~/pychinko/pychinko$ time
/home/syosi/CVS-local/WWW/2000/10/swap/cwm.py
generatedtests/testfacts.1200.n3 --ntriples
--think=rules/sameVarRules.n3 --base=http://www.mindswap.org/~katz/
--purge > generatedtests/testoutput.cwm.1200.n3 --ugly

real    0m5.139s
user    0m4.831s
sys     0m0.087s
--------

As you can see, two seconds were saved simply by not sorting the output,
and cwm now compares much better to Pychinko (5.1 seconds against 4.7).
Note that cwm is still doing lots of useless work when outputting
N-Triples: things like figuring out good prefixes are not necessary,
since N-Triples writes every URI in full.
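Part of why N-Triples output ought to be cheap is that the format needs
none of the presentation work a pretty-printer does: each line is one
triple written with absolute URIs, so there are no prefixes to pick, and
line order carries no meaning. The sketch below is a hypothetical
streaming writer, not cwm's serializer; `URI`, `Literal`,
`write_ntriples`, and `to_ntriples_string` are illustrative names. It
covers only URI and plain-literal terms (no blank nodes, datatypes,
language tags, or escaping of non-ASCII characters), and sorting is an
optional step so the cost being skipped is explicit.

```python
import io
import sys
from dataclasses import dataclass
from typing import Iterable, TextIO, Tuple, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str  # plain literal only: no datatype or language tag


Term = Union[URI, Literal]
Triple = Tuple[URI, URI, Term]


def format_term(term: Term) -> str:
    """Render one term; URIs are always written in full, so no prefixes are needed."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    # Escape backslash first so the escapes added afterwards are not doubled.
    escaped = (
        term.value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def write_ntriples(triples: Iterable[Triple], out: TextIO, sort: bool = False) -> int:
    """Write one '<s> <p> <o> .' line per triple and return the count.

    sort=False streams lines in input order; sort=True collects and sorts
    them first, which is the extra work an unsorted writer avoids.
    """
    lines: Iterable[str] = (
        " ".join(format_term(term) for term in triple) + " ." for triple in triples
    )
    if sort:
        lines = sorted(lines)
    count = 0
    for line in lines:
        out.write(line + "\n")
        count += 1
    return count


def to_ntriples_string(triples: Iterable[Triple], sort: bool = False) -> str:
    """Convenience wrapper that collects the output in memory."""
    buf = io.StringIO()
    write_ntriples(triples, buf, sort=sort)
    return buf.getvalue()


if __name__ == "__main__":
    ex = "http://example.org/"
    example = [
        (URI(ex + "b"), URI(ex + "name"), Literal('Bob "B"')),
        (URI(ex + "a"), URI(ex + "knows"), URI(ex + "b")),
    ]
    write_ntriples(example, sys.stdout)
```

Sorting forces the writer to hold every line in memory and adds an
O(n log n) step before the first byte is written; the unsorted path can
emit each triple as soon as it is produced.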


I am by no means saying that cwm's reasoner is perfect, fast, or a
long-term answer to anything. I'm just pointing out that many problems
actually lie elsewhere, and that pretty printing is almost certainly the
worst-performing part of cwm in many simple cases.

Yosi
Received on Monday, 29 August 2005 20:11:26 UTC