EARL 1.0 from Sean B. Palmer on 2001-07-14 (w3c-wai-er-ig@w3.org from July 2001)

From: Sean B. Palmer <sean@mysterylights.com>
Date: Sat, 14 Jul 2001 02:42:06 +0100
To: <w3c-wai-er-ig@w3.org>
Message-ID: <046701c10c06$6f78a840$dde693c3@z5n9x1>
I wonder if EARL 1.0 could come out of EARL 0.95 by Wu Wei? I don't
mean by not doing anything to 0.95, but rather making it more natural
for people to use.

Effort is the thing that keeps any new language down. Languages are
nothing without implementations, just as natural languages are nothing
without native speakers. The problem that we face is that with limited
resources, how do we come up with a language that is incredibly easy
for people to get into, learn, use, understand, implement, develop,
contribute to, evolve, define, and so on.

One of the most important steps was defining it in RDF. Although
misunderstood by so many, RDF had the potential to let us create a
model however we wanted, and also a syntax in which people could
express the model. When N3 came along, it helped tenfold for people to
express what was going on. But is RDF really the answer? Tools like
CWM have shown that general Semantic Web processing tools can get the
EARL jobs done... but CWM is it. It's the only Semantic Web tools
available that's worth anything, and it's demonstration code.

So what alternatives are there? There's straight XML. We already know
that that is just XML with a layer of abstraction that will make it
practically impossible for us to agree on anything. Meta *models* are
more important than meta syntaxes. Syntax is a pain, but it's not
important in the design phase. The model is everything.

UML is one suggestion. This would allow the OOP programmers amongst us
to come up with class bases and code to enable people to process EARL.
It would also still link in with RDF. However, I don't really know the
first thing about implementing UML and what that entails (which should
not stop ERT from pursuing it).

Apart from that, I really don't know what else there is to implement
in. Clearly, our report format needs to be in an XML format of some
kind. It's the de facto language. The way we came up with EARL 0.95 is
that Daniel wrote a list of stuff down based on the outcome of our F2F
meeting, and after some refinements, and input from Aaron, we ended up
with an RDF Schema. But did we stop to question if the context => {
content result criteria } is the most efficient way of going about it?
Of course not, we just stuck with it and ended up with 0.95.

The design of EARL, based on the requirements (ease of use, machine
procesability, extensibility, etc.) are as follows (in note form):-

* Able to attach context to particular assertions. Contexts should be
mergable and queryable
* A standard evaluation. What does an evaluation mean without context?
You always have the triple "MyPage passes/fails SomeCriteria", but
what is this mean?

Let's go through a couple of likely scenarios:-

1) I have a page that I want to promote as being WCAG-A accessible.
Perhaps I want user agents to recognize this and let the user know in
some way
2) I've produced a new tool and want people to send in bug reports
that I can store in a database

For 1), the context is fairly important. What we need to know is if
the author of the evaluation is lying, and whether or not we can
repeat the experiment ourselves. It is useful to know who ran the test
(it should be the author of the page, and do we trust the author?). It
is useful to know exactly what form of compliance the author is
talking about. It is useful to know what time and for what content the
evaluation was made (has the page changed since then?). It is useful
to link to some online script (through GET) that will perform the same
test, if possible. I think that EARL 0.95 can provide for all of this
right now.

For 2), the context is important in a different way. The tool provider
wants to know who is making these bug reports so that they can a)
track management of the issue, b) track any fixes the assertor has
posted, c) notify the assertor that the bug has been fixed. Important
details include: author name, and context information. Not necessary
if the author wants to remain anonymous. The context information is
here being used not for *trust*, but for *feedback* and *management*.
Trust is implicit - people rarely file random bug reports, and even if
they do, there is no way to monitor that. This is out of scope for
EARL.

So, context information is important. However, in 2), we find that
context information is not required, and that it is out of scope for
EARL to consider why context should be required all of the time. IMO
only, of course.

The assertion itself (the main part of the evaluation) is something
that will rarely be disputed. In p(s, o) notation:-

   validity(content, criteria)

or, rather:-

   type(?x, validity)
        (type(?y, content),
         type(?z, criteria))

But in EARL 0.95 there are some subtle deviations from this model that
are kind of fudged. Especially the stuff on the criteria side that
doesn't really belong there - repair information and so forth. We call
the criteria side a "TestCase", but I'm not so sure what that is
anymore.

The properties to do with criteria explicitly are:-

   earl:testId
   earl:testSuite
   earl:testCriteria
   earl:suite
   earl:id
   earl:excludes
   earl:level

the pseudo junk is:-

   earl:testMode
   earl:repairInfo
   earl:purpose
   earl:operatorInstructions
   earl:reproducableStep

What does it mean to say that something passed a test mode? Is that
not in the wrong place? But, where should this information go?

These are particular properties of an assertion. The assertion had
this test mode, the assertion generated this repair information, the
assertion serves this purpose, the assertion has these operator
instructions, the assertion has this reproducable step. This makes the
assertion a first class object in itself, and greatly complicates
things w.r.t. reification and so forth. For the purposes of continuing
discussion, I am classing these properties (informally!) as assertion
result properties.

For example, it begs the question - when is an assertion the same
assertion as another? Or, more specifically, are two assertions the
same when they have the same set of assertion result properties
hanging off of them? What if one has none, and another has some?
Clearly, the assertion result properties are once again linked to the
context information for the overall evaluation itself. However, I
think this time, the connections aren't all that clear.

Let's take the "testMode" property as an example. When we run an
assertion, we say that it had a certain test mode. If the test mode
was automatic, this might infer that the test can be reproduced
cleanly. Let's say I validate my page on the W3C validator. The
assertion that the validator outputs is automatic. The assertion
should also contain a link back to the URI that it came from, so that
it can be run again, or whatever.

Is this information inherent to the assertion itself? I think not; I
would still say that it is a property of the assertion, not part of
the assertion itself. In this respect, EARL 0.95 has a bug.

I also don't think that this was a part of the context information.
The context information is one level up from the actual test itself,
it says who ran the test. Who tested this page? Can I trust that they
really tested that page? The test mode information is something which
is asserted by this context. It is not a part of the context itself.

So, we have a few parts to the evaluation model now. We have a
context. We have the assertion. And then we have properties about that
assertion, which should also be linked back to the context.

To digress slightly back to how we're going to teach EARL, I think
we're going to have to lead by example. By showing how EARL can
successfully model real-life situations and facilitate these, we can
impart the collective wisdom of EARL much better then pointing them at
some crummy class diagrams and saying "ooh, look at this wonderful
model". Tutorials of the kind "here is how to add EARL to your page"
are wonderful, and that's the type of thing we can be looking into for
EARL 1.0. For EARL 0.95, I suggest we keep on the same lines as we're
doing at the moment. We can reuse much of this for EARL 1.0 anyway.

Back to the model. I cannot help but think of the model that I've just
been through in terms of a nodes and arcs thing. I simply can't help
but see it that way! So here's a strawman:-

   evaluation =
      assertor asserts assertion .
   assertor context_properties information .
   assertion assertion_properties information .

Here, I'm trying to get rid of the anonymity that EARL gave to things.
The reification can still occur, but the statement should have an ID.
I can't think of it being any other way, now that I've thought about
things for EARL 1.0.

For example, in EARL 0.95 we said:-

:Sean :asserts { :MyPage :passes :MyTest } .

Using this strawman (which should not be referred to as EARL 1.0 -
this is an intermediary state), it would be more like:-

   :Sean :asserts :x .
   :x :s :MyPage; :p :passes; :o :MyTest .

We could even (and this is where it gets rather tricky...) fake, or
somehow sub class, reification. O.K., we're already sub classing
statements in EARL 0.95, but why not refine the reification properties
as well? This would mean that as RDF changes, we can just update the
definitions without having to change the URIs. I'm not sure it'll
work, and I'm not sure how much confusion it will create in RDF
people, but it's something worth investigating.

How do you sub class rdf:predicate? That's going to be messy.

Back to the concrete. The syntax really isn't important. What we are
developing is the *model* and the *vocabulary* people can reuse the
model without using the vocabulary (I tried this with HumanML, and it
worked quite well with very few changes!). People can reuse the
vocabulary without reusing the model. But then people use the
vocabulary and model together, that is EARL. EARL M&V? He he he :-)

Actually, that brings me back to the status of EARL. I've always
wondered just what on earth EARL is? Is it just some experiment by the
ERT WG? Then again, it's nice that the process doesn't matter. I feel
quite free to say anything, experiment anyhow, and contribute as much
or as little as I want without process getting in the way. I can say
things which would be considered quite bizarre in other circles, and
yet they're often paid serious attention, attention that is thankfully
deserved in many cases.

But I digress again. The model and vocabulary. I hope somewhat that
the model will be *repurposable*; that is to say, we shall be able to
transform it to make it grokable by humans as well as machines. XSLT
is a great tool, so we should be able to use that so long as EARL is
expressible in an XML format.

It would also be nice to investigate what other kind of evaluation
languages there are out there, what properties they use, and so on.
Maybe some of them can be transformed into EARL? That would be a good
test of independent invention.

EARL 1.0 will hopefully be a thing of beauty, simplicity defined, and
powerful, but not to the point of ambiguity. A tool that can be used
in all sorts of environments, for all sorts of applications.

So anyway, in conclusion, we have a lot to think about for EARL 1.0,
if any of us can be bothered. I'm not too sure how enamoured people
really are with the concept of EARL, but it seems to be one of those
unavoidable things. The two things that I've found most annoying is
watching EARL run around in circles, and not having the OOP knowledge
to develop some real tools for EARL.

The Semantic Web thing is annoying too. The Semantic Web is heading
off in odd directions, and I feel that the projects are diverging and
either getting lost, or absorbed by other competition. The SW isn't
some mega killer-ap because it's so empheral, and yet if a few more
really good programmers took the time to develop some well-documented
and stable inference engines, we'd be laughing. That's really all we
need. EARL could use them to tremendous advantage - we could almost
label them as EARL engines. All we'd need to do is write filters and
stuff.

It would be nice if the W3C could install some RDFQL-type database for
us to play around with using EARL. I'm sure we could come up with some
really neat structured queries and so forth that would make for some
really great results. I think you can even produce Web pages and so
forth from that type of stuff.

I don't know how to draw more people into thinking about EARL. Apathy
about stuff that requires work seems great, as Wendy will tell you
after some time in getting people to write/organize WCAG techniques
:-) I don't want EARL to turn into one of those things that you start
discussing with people and they make their excuses and leave; and yet
I fear that has happened on many occasions already. Unfortunately, no
matter what we do, it's always going to be over promotion, or under
promotion.

But perhaps the main worth from EARL will not be in the results of the
experiment, but the method by which we got there. The
interconnectedness of all things means that EARL might already have
had some incredible and wonderful ramifications that we are not aware
of. I've certainly learned a lot from it - I knew very little about
RDF when I started tinkering with EARL, and although I've still got a
lot to learn, at least I have the capability to develop other RDF
based languages and so forth.

But results are nice. The bookmarklet thing is fun, and I can't wait
to use a real EARL tool, and check out how it performs and so on. Oh
for just one person to come across the ERT site and be
super-interested in EARL...

EARL 1.0? Let's do it!

--
Kindest Regards,
Sean B. Palmer
@prefix : <http://webns.net/roughterms/> .
:Sean :hasHomepage <http://purl.org/net/sbp/> .
Received on Saturday, 14 July 2001 03:55:42 UTC