Subject: Re: paper that Jose was talking about today

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Wed, 16 Dec 2015 07:45:02 -0800
To: Holger Knublauch <holger@topquadrant.com>, public-data-shapes-wg@w3.org
Message-ID: <567186FE.5020508@gmail.com>

On 12/16/2015 05:54 AM, Holger Knublauch wrote:
> 
> 
> On 16/12/2015 2:42 PM, Eric Prud'hommeaux wrote:
>> * Holger Knublauch <holger@topquadrant.com> [2015-12-16 12:33+0100]
>>> On 16/12/2015 12:25 PM, Eric Prud'hommeaux wrote:
>>>> * Holger Knublauch <holger@topquadrant.com> [2015-12-16 11:26+0100]
>>>>> The benchmarking process may be of interest yet then there is no
>>>>> need to print tables of actual numbers, which (surprise!) show that
>>>> It also shows that our performance gets much worse as we run out of
>>>> memory. We didn't adjust our implementation to deal with that because
>>>> it wouldn't be fair to optimize our code without you getting a chance
>>>> to do the same. If you want, we can help you set up with the benchmark
>>>> and spend a couple weeks optimizing. That would change the tone of the
>>>> paper from "here's a tool for benchmarking" to "here's a performance
>>>> comparison of ShEx vs. SHACL".
>>> I am personally not yet interested in optimizations while the
>>> language is unstable. It is obvious that all kinds of optimizations
>>> will be possible (for the core language) in the future, but I don't
>>> have the bandwidth to work on such things right now.
>>>
>>>>
>>>>> your ShEx implementation is 20 times faster than my current SHACL
>>>>> prototype. Of course I could make mine orders of magnitude faster by
>>>>> hard-coding the core language instead of turning them into many
>>>>> small SPARQL queries. The paper is comparing apples with oranges.
>>>> How would you propose we demonstrate that the benchmark tool runs and
>>>> returns useful results?
>>>>    run it on ShEx alone?
>>>>    run it on SHACL alone?
>>>>    add more disclaimers?
>>> As it is printed right now, a casual reader will skim through the
>>> table and see the bare numbers. It is not very clear that the SHACL
>>> implementation is a proof of concept only. I believe if the focus is
>>> on your benchmarking approach then you could simply compare the
>>> various ShEx implementations, and not make this appear a
>>> SHACL-vs-ShEx bake-off?
>> I'm happy to do that, but I'd like your permission to point to this
>> email to include a line like "upon request from the author, we're not
>> including results for the SHACL proof-of-concept implementation." Does
>> that work for you?
> 
> That would be OK. Better might be something along the lines of: "At the time
> of writing the TopBraid SHACL implementation was merely a proof-of-concept
> that was not at all optimized for performance and therefore did not make sense
> to be included in this comparison."
> 
> Holger
> 
> 

Certainly it makes no sense to include timings for this implementation of
SHACL without some analysis of the results.  The actual numbers are very
surprising: roughly 5 seconds for a very small problem and then less time
for larger problems.  Without some digging into the reasons for this, the
results are useless.

peter

Received on Wednesday, 16 December 2015 15:45:35 UTC