- From: Charles Petrie <petrie@stanford.edu>
- Date: Mon, 25 Feb 2008 15:25:43 -0800
- To: member-xg-swsc@w3.org, public-xg-swsc@w3.org
- CC:
I was out with a serious injury for a while. The good news is that while convalescing, I have been able to re-read everything. I'd like to restart this discussion about evaluation methodology.

We tried out the beginning of the surprise methodology at the last workshop at Stanford. We had a new scenario and, for the participants who achieved it, we gave them some variations on the scenario to program overnight. Those who succeeded, by passing the correct messages to the infrastructure, got a "plus" by their check mark. This, along with code evaluation to assure participants of the validity of the code, was our essential software engineering evaluation, and most of the discussion here is about elaborating on that process to make it as good as possible.

We want to make sure that we are doing good science, so something like the repeatability requirements of SIGMOD at http://www.sigmod08.org/sigmod_research.shtml is certainly relevant. Since we (the SWSC) determine the common problems, we don't need to completely repeat the code execution to verify it. However, we probably do need to ensure some sort of repeatability.

My proposal now is that we do not require uploading of solutions prior to the workshop. However, solving the problem then only gets the participant a check mark, which says that, *somehow*, they can solve that scenario. In order to get a "plus mark", they must be able to solve specific variations of the scenario within a few days, with the exact timing determined by the workshop organizers. And, though this may not be representative of many real-world problems, we require that the solutions be backward compatible with the original problem. This is verified not only by testing at the workshop, but also by consensus inspection of the code. This requirement lessens the importance of consensus code inspection, and it emphasizes the value of software that works from a high-level specification of the goals to be achieved under constraints. It is the job of the scenario variation specifier to ensure that backward compatibility is indeed possible.

Up to now, I have only addressed evaluation. However, we don't want only to roughly evaluate the software engineering aspect of solutions. We would like both to evolve an understanding of the various technologies and to encourage re-use of them, building toward "best practices" wherever possible. This is where the challenge so far has been weakest. I think in the future we will have to put some *mandatory* restrictions on participation, as follows.

Participants must furnish documentation on their solutions, beyond a simple paper, that has the following qualities:

1) A common machine-readable format such as XML, KIF, SAWSDL, or something else for which free parsers are available is preferable. The minimum requirement is a BNF for whatever format is used.

2) The content should be useful for solving the problem(s). An example would be an ontology for time that enables different problems to be solved with minimal change. It might be a way of annotating the WSDL so that a problem solver can work with it more automatically (see the sketch after this list). This will vary with the scenario.

3) The content should illustrate some principles of the approach so that others can evaluate its utility, perhaps deciding to adapt and re-use it.

4) The content should be understandable without having to install a large system, but it is desirable that there be a pointer to a working system that will consume the content. The more information about the working context of the system, the better.
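To make the WSDL-annotation suggestion in item 2 concrete, here is a minimal sketch, assuming SAWSDL annotations on a WSDL 2.0 fragment; the target namespace, ontology URIs, and element names are invented purely for illustration and the types section with the referenced elements is omitted:

    <wsdl:description
        xmlns:wsdl="http://www.w3.org/ns/wsdl"
        xmlns:sawsdl="http://www.w3.org/ns/sawsdl"
        xmlns:tns="http://example.org/purchasing"
        targetNamespace="http://example.org/purchasing">

      <!-- sawsdl:modelReference points a WSDL component at an ontology
           concept, so a problem solver can select and compose operations
           from the annotations rather than from the names alone. -->
      <wsdl:interface name="PurchaseOrderInterface"
          sawsdl:modelReference="http://example.org/ontology#OrderManagement">
        <wsdl:operation name="placeOrder"
            pattern="http://www.w3.org/ns/wsdl/in-out"
            sawsdl:modelReference="http://example.org/ontology#RequestPurchaseOrder">
          <wsdl:input element="tns:orderRequest"/>
          <wsdl:output element="tns:orderConfirmation"/>
        </wsdl:operation>
      </wsdl:interface>
    </wsdl:description>

Because the annotation is just an attribute, the same WSDL remains usable by tools that ignore SAWSDL, which fits the goal of content that others can evaluate, adapt, and re-use.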
Such documentation should be furnished prior to the workshop and should be discussed at the workshop, with consensus on its efficacy. Participants who do not furnish documentation considered useful by the consensus will receive a minus mark, indicating such, in their evaluation.

Are we getting close to a methodology?

Charles
--
http://www-cdr.stanford.edu/~petrie