- From: Charles Petrie <petrie@stanford.edu>
- Date: Mon, 25 Feb 2008 15:25:43 -0800
- To: member-xg-swsc@w3.org, public-xg-swsc@w3.org
- CC:
I was out with a serious injury for a while. The good news is that while convalescing, I have been able to re-read everything. I'd like to restart this discussion about evaluation methodology.

We tried out the beginning of the surprise methodology at the last workshop at Stanford. We had a new scenario and, for the participants who achieved it, we gave them some variations on the scenario to program overnight. Those who succeeded, by passing the correct messages to the infrastructure, got a "plus" by their check mark. This, along with code evaluation to assure participants of the validity of the code, was our essential software engineering evaluation, and most of the discussion here is about elaborating on that process to make it as good as possible.

We want to make sure that we are doing good science, so something like the repeatability requirements of SIGMOD at http://www.sigmod08.org/sigmod_research.shtml is certainly relevant. Since we (the SWSC) determine the common problems, we don't need to completely repeat the code execution to verify it. However, we probably do need to ensure some sort of repeatability.

My proposal now is that we do not require uploading of solutions prior to the workshop. However, solving the problem then only gets the participant a check mark, which says that, *somehow*, they can solve that scenario. In order to get a "plus mark", they must be able to solve specific variations of the scenario within a few days, with the exact timing determined by the workshop organizers. And, though this may not be representative of many real-world problems, we require that the solutions be backward compatible with the original problem. This is verified not only by testing at the workshop, but also by consensus inspection of the code. This requirement lessens the importance of consensus code inspection, and it emphasizes the value of software that works from a high-level specification of the goals to be achieved under constraints. It is the job of the scenario variation specifier to ensure that backward compatibility is indeed possible.

Up to now, I have only addressed evaluation. However, we don't want only to roughly evaluate the software engineering aspect of solutions. We would like both to evolve an understanding of the various technologies and to encourage re-use of them, building toward "best practices" wherever possible. This is where the challenge so far has been weakest. I think in the future we will have to put some *mandatory* restrictions on participation, as follows.

Participants must furnish documentation on their solutions, beyond a simple paper, that has the following qualities:

1) A common machine-readable format such as XML, KIF, SAWSDL, or something else for which free parsers are available is preferable. The minimum requirement is a BNF for whatever format is used.

2) The content should be useful for solving the problem(s). An example would be an ontology for time that enables different problems to be solved with minimal change. It might be a way of annotating the WSDL so that a problem solver can work with it more automatically (see the sketch after this list). This will vary with the scenario.

3) The content should illustrate some principles of the approach so that others can evaluate its utility, perhaps deciding to adapt and re-use it.

4) The content should be understandable without having to install a large system, but it is desirable that there be a pointer to a working system that will consume the content. The more information about the working context of the system, the better.
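To make the WSDL-annotation suggestion in item 2 concrete, here is a minimal sketch, assuming SAWSDL annotations on a WSDL 2.0 fragment; the target namespace, ontology URIs, and element names are invented purely for illustration and the types section with the referenced elements is omitted:

    <wsdl:description
        xmlns:wsdl="http://www.w3.org/ns/wsdl"
        xmlns:sawsdl="http://www.w3.org/ns/sawsdl"
        xmlns:tns="http://example.org/purchasing"
        targetNamespace="http://example.org/purchasing">

      <!-- sawsdl:modelReference points a WSDL component at an ontology
           concept, so a problem solver can select and compose operations
           from the annotations rather than from the names alone. -->
      <wsdl:interface name="PurchaseOrderInterface"
          sawsdl:modelReference="http://example.org/ontology#OrderManagement">
        <wsdl:operation name="placeOrder"
            pattern="http://www.w3.org/ns/wsdl/in-out"
            sawsdl:modelReference="http://example.org/ontology#RequestPurchaseOrder">
          <wsdl:input element="tns:orderRequest"/>
          <wsdl:output element="tns:orderConfirmation"/>
        </wsdl:operation>
      </wsdl:interface>
    </wsdl:description>

Because the annotation is just an attribute, the same WSDL remains usable by tools that ignore SAWSDL, which fits the goal of content that others can evaluate, adapt, and re-use.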
Such documentation should be furnished prior to the workshop and should be discussed at the workshop, with consensus on its efficacy. Participants who do not furnish documentation considered useful by the consensus will receive a minus mark, indicating such, in their evaluation.

Are we getting close to a methodology?

Charles
--
http://www-cdr.stanford.edu/~petrie