Bug 001: Unclear Schema

This is feedback on a Last Call Working Draft:

Evaluation and Report Language (EARL) 1.0 Schema
W3C Working Draft 10 May 2011
http://www.w3.org/TR/2011/WD-EARL10-Schema-20110510/

The schema is unclear. People should be informed about the constructs
within an application in proportion to their importance within the
language, which is predicated on things such as frequency of expected
use. This is not well reflected in the present specification.

There is at least a modicum of information available to orient the
user in this regard. For example, the classes are documented in this
order:

2.1. Assertion Class
2.2. Assertor Class
2.3. TestSubject Class
2.4. TestCriterion Class
2.5. TestResult Class

This ordering is sensible and logical in terms of importance. An
Assertion is at the root of the hierarchy, for example, and at least
one must appear in every EARL report, according to the Developer's
Guide. But the specification never makes clear that Assertion is
described first because it is at the root of the hierarchy. What we
find is:

"Every test result in EARL is expressed as an assertion."

Whereas we could instead have read something like:

"Every EARL report consists of one or more assertions, which are
represented as instances of an earl:Assertion class. These are the
most basic elements of the language, forming the atomic components
with which all other data is associated."

This could be followed by an explanation that earl:Assertion instances
form the subject-most nodes in a directed labelled graph: all edges
point away from them and none come into them.
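
As a rough sketch of that graph property, assuming a report is held as
plain (subject, predicate, object) triples, the root nodes are simply
the subjects that never occur as an object. The ex: node names and the
sample report are hypothetical, and the EARL terms are written as
prefixed strings rather than full URIs:

```python
def root_nodes(triples):
    """Return the subjects that never appear as the object of a triple."""
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples}
    return subjects - objects


# Hypothetical one-assertion report; not taken from the specification.
report = [
    ("ex:a1", "rdf:type", "earl:Assertion"),
    ("ex:a1", "earl:assertedBy", "ex:tool"),
    ("ex:a1", "earl:subject", "ex:page"),
    ("ex:a1", "earl:test", "ex:criterion"),
    ("ex:a1", "earl:result", "ex:r1"),
    ("ex:r1", "rdf:type", "earl:TestResult"),
    ("ex:tool", "rdf:type", "earl:Assertor"),
]

roots = root_nodes(report)
```

In this sample the only root is the assertion node, which is the shape
the suggested wording would lead a reader to expect.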

Again, this is the rationale behind earl:Assertion being documented
first in the list of classes. That is a lot of rationale for what
might seem like a simple specification design decision. It is obvious
to somebody who knows the language, but not at all clear to somebody
reading the specification for the first time.

In connexion with this, a more serious bug, indeed the only egregious
one that I've found, occurs in the Developer's Guide. It is the fact
that important schematic information such as how many earl:test
properties must be associated with earl:Assertion instances is only
contained, albeit normatively, in the Developer's Guide. This should
be in the schema and schema specification instead. OWL, for example,
has machinery for expressing cardinality of properties, and yet this
information is not contained in the schema or in the schema
specification.
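
To illustrate the kind of constraint at issue, here is a minimal
sketch of a cardinality check over plain triples. The bound of exactly
one earl:test per assertion is a placeholder chosen for the example,
not a figure quoted from the Developer's Guide, and the ex: nodes are
hypothetical:

```python
def cardinality_violations(triples, cls, prop, minimum, maximum):
    """Map each instance of cls to its count of prop when out of bounds."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    violations = {}
    for node in instances:
        count = sum(1 for s, p, _ in triples if s == node and p == prop)
        if not (minimum <= count <= maximum):
            violations[node] = count
    return violations


# Hypothetical data: a1 has one earl:test, a2 has none, a3 has two.
data = [
    ("ex:a1", "rdf:type", "earl:Assertion"),
    ("ex:a1", "earl:test", "ex:t1"),
    ("ex:a2", "rdf:type", "earl:Assertion"),
    ("ex:a3", "rdf:type", "earl:Assertion"),
    ("ex:a3", "earl:test", "ex:t1"),
    ("ex:a3", "earl:test", "ex:t2"),
]

# Assumed bound for illustration only: exactly one earl:test.
bad = cardinality_violations(data, "earl:Assertion", "earl:test", 1, 1)
```

If the bounds lived in the schema itself, a check like this could be
driven from the schema rather than from prose in a separate document.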

But that is for the Developer's Guide: it is relevant here insofar as
this information needs not only to be moved to the schema
specification, but also to be set out clearly therein. A specification
must detail as unambiguously and clearly as possible the technology
that it defines. Why not, for example, use diagrams such as the
following?

http://sbp.so/earl/diagram

This is what I drew in order to help me understand the language
myself. I generated this from the following documentation:

http://sbp.so/earl/schema

Which I also generated myself using an old tool I wrote called Schemadoc:

http://inamidst.com/proj/sdoc/
https://gist.github.com/964888

If I'm having to use such tools just to understand the structure of
the language that the schema specification is defining, then the
schema specification isn't doing its job. I think a diagrammatic
overview of the language like this would help to solve a lot of
issues, but there is no reason why more diagrams and other
expository techniques can't be used for the individual components.
For example, each class could contain a zoomed-in, cropped section of
a main schema diagram (properly drawn in SVG, not hand drawn as with
mine) with that particular class highlighted. Of course a more
standard form should be used, perhaps UML or whatever is trendy these
days, but I hope the principle is clear.

Another standard expository technique is to include lots of
examples, especially examples of each part of the system, perhaps in
side by side SVG, N3, and EARL JSON serialisations. You know how some
people define a system entirely in unit tests? Well, EARL is an
evaluation language, so it would be rather fitting if this kind of
technique were used here; eating your own dog food, as ERT WG member
William Loughborough was fond of reminding us all. Instances should be
shown to have a mechanically valid relation to the schema that they
are representing as examples. Instance data is very useful for showing
the shape of a schema, just like a statue is good at showing the shape
of the mould that it came from.
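
One very small version of such a mechanical check, sketched under the
assumption that examples are available as plain triples, is to confirm
that every earl: term an example uses is actually declared. The term
list below is a hand-written fragment for illustration, not a copy of
the published schema, and the example data is hypothetical:

```python
# Hand-written fragment of the vocabulary; an assumption, not the schema.
SCHEMA_TERMS = {
    "earl:Assertion", "earl:Assertor", "earl:TestSubject",
    "earl:TestCriterion", "earl:TestResult",
    "earl:assertedBy", "earl:subject", "earl:test", "earl:result",
}


def undeclared_terms(example, schema_terms, prefix="earl:"):
    """Return the prefixed terms used in example but missing from schema_terms."""
    used = set()
    for triple in example:
        for term in triple:
            if term.startswith(prefix):
                used.add(term)
    return used - schema_terms


# Hypothetical example with a deliberate typo in the class name.
example = [
    ("ex:a1", "rdf:type", "earl:Asertion"),
    ("ex:a1", "earl:test", "ex:criterion"),
]

unknown = undeclared_terms(example, SCHEMA_TERMS)
```

A check this shallow only catches misspelt or invented terms; a fuller
one would also test each example against whatever domains, ranges and
cardinalities the schema declares.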

This bug is one of my priorities, and I have not so much concentrated
on specific changes required of the text because this is one of the
more root and branch bug reports that I intend to file. Many things
need changing in order to make the specification clearer.

This issue should not be taken lightly, or shrugged off. The GRDDL
specification, for example, was a work of genius in the strictness of
its logic and its careful exposition. But when I came to implement it,
I found bugs and unclear sections. Even the most carefully written
specification can be quite bad.

So it would be a mistake to believe that I am asking for tutorial
material here. I am asking rather for specification material. A
specification is like a legal document: it gives you the rights and
wrongs. If there is ambiguity, then a specification fails. The
ambiguity can be logical, such as leaving out how many times earl:test
should be used, or it can be subjective, such as not saying that
earl:Assertion is the root of the hierarchy. Both of these
deficiencies of exposition are of the same species, and both cause the
same kind of implementation errors and subsequent debates about what a
specification really meant or what its authors intended.

-- 
Sean B. Palmer, http://inamidst.com/sbp/

Received on Tuesday, 10 May 2011 21:45:21 UTC