Re: paper that Jose was talking about today from Eric Prud'hommeaux on 2015-12-16 (public-data-shapes-wg@w3.org from December 2015)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 16 Dec 2015 06:25:18 -0500
To: Holger Knublauch <holger@topquadrant.com>
Cc: public-data-shapes-wg@w3.org
Message-ID: <20151216112516.GG19479@w3.org>
* Holger Knublauch <holger@topquadrant.com> [2015-12-16 11:26+0100]
> The benchmarking process may be of interest yet then there is no
> need to print tables of actual numbers, which (surprise!) show that

It also shows that our performance gets much worse as we run out of
memory. We didn't adjust our implementation to deal with that because
it wouldn't be fair to optimize our code without you getting a chance
to do the same. If you want, we can help you get set up with the
benchmark and spend a couple of weeks optimizing. That would change
the tone of the paper from "here's a tool for benchmarking" to "here's
a performance comparison of ShEx vs. SHACL".


> your ShEx implementation is 20 times faster than my current SHACL
> prototype. Of course I could make mine orders of magnitude faster by
> hard-coding the core language instead of turning them into many
> small SPARQL queries. The paper is comparing apples with oranges.

How would you propose we demonstrate that the benchmark tool runs and
returns useful results?
  run it on ShEx alone?
  run it on SHACL alone?
  add more disclaimers?

I think most reviewers would not be content to see it demonstrated on
only one if they knew we could have done a preliminary comparison. Do
you have a strategy which gets the tool out there without being
unfair to either group?


> Anyway, you are free to publish whatever you want, like we are free
> to post our own propaganda on the web. I find it just deplorable
> that this is labeled academic research.
> 
> Holger
> 
> 
> On 16/12/2015 10:56 AM, Eric Prud'hommeaux wrote:
> >* Holger Knublauch <holger@topquadrant.com> [2015-12-16 10:25+0100]
> >>I am disappointed that this paper includes performance comparisons
> >>including my experimental prototype. I made it very clear that the
> >>current implementation is not optimized at all, so what is the point
> >>of printing this? The academic value of this comparison is nil.
> >The paper explicitly states that it was testing the validator on
> >"preliminary implementations":
> >
> >Abstract: [[
> >   We then performed some preliminary experiments comparing performance
> >   of two validation engines based on Shape Expressions and SHACL
> >   respectively against the proposed benchmark.
> >]]
> >
> >Section 8: [[
> >   While these results are calculated using early betas of both SHACL and
> >   ShEx, it demonstrates how wiGen can be used for evaluation of
> >   validation tools and algorithms.
> >]]
> >
> >It goes on to describe how the tool highlights negative performance
> >behavior of ShEx (not SHACL) [[
> >   The results show for instance that the ShEx implementation's
> >   calculation time grows considerably when validating many Datasets,
> >   perhaps because the shape is recursive (validating Datasets requires
> >   validating Observations which in turn validates other Datasets).
> >]]
> >
> >and further describes how the tool will be useful [[
> >   The wiGen tool can be scripted to explore many relevant parameters:
> >   size of the validation graph, number of nodes to be validated,
> >   interrelations between nodes in recursive shapes.  This will permit
> >   principled design choices in language development and tool selection
> >   and ultimately contribute to improved quality in Linked Data.
> >]]
> >
> >concluding with one more emphasis that these are preliminary results [[
> >   The take-home message from this very preliminary evaluation is that,
> >   while the performance figures still leave much to be desired,
> >   reasonable performance is definitely reachable using either the ShEx
> >   or SHACL approach.
> >]]
> >
> >It seems very clear from this that a benchmarking tool is useful,
> >both to users and developers, and that waiting for the specs and
> >implementations to be done simply deprives the community of
> >valuable input. This tool is an opportunity, not a threat.
> >
> >
> >>Holger
> >>
> >>
> >>On 3/12/2015 10:07 PM, Peter F. Patel-Schneider wrote:
> >>>The paper that Jose was talking about today is submitted to Semantic Web –
> >>>Interoperability, Usability, Applicability and can be found on their "Under
> >>>Review" page http://www.semantic-web-journal.net/underreview
> >>>
> >>>Look for Validating and Describing Linked Data Portals using Shapes
> >>>
> >>>
> >>>peter
> >>>
> >>
> 
> 

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Wednesday, 16 December 2015 11:25:30 UTC