Re: [Fwd: browsable test results] from Sandro Hawke on 2003-09-26 (www-qa@w3.org from September 2003)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 26 Sep 2003 15:58:50 -0400
To: Charles McCathieNevile <charles@w3.org>, "Shadi Abou-Zahra" <shadi@w3.org>
Cc: Dan Connolly <connolly@w3.org>, www-qa@w3.org, WAI ER group <w3c-wai-er-ig@w3.org>, Eric Miller <em@w3.org>, eric@w3.org
Message-Id: <200309261958.h8QJwoSr002845@roke.hawke.org>

Charles McCathieNevile said:
> Well, you've reinvented the axle it goes on...

Oops.  Sorry, I forgot all about EARL while doing this work.  Well,
mostly it never sank into my brain that you were positioning EARL for
such broad use; I thought it was just about reporting issues with
content.

Of course my ontology is much simpler and probably somewhat easier to
use.  I wonder when, if ever, it would pay off for the OWL folk to
use EARL extended as necessary....   I'm very, very happy with the
adoption so far; I wish I knew if a more complex vocabulary would have
hindered adoption.  I guess I can ask the current sources how they
feel about such a change.

Do you have an equivalent of my report generator which would make nice
pages about test results if these folks had used EARL instead?  Or
which would tell us which PROPOSED tests were passed by two systems,
which APPROVED ones were passed by none, etc?
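
For illustration, the two questions above could be answered with
something like the following.  This is only a sketch under assumed
data: the test names, system names, and the flat (system, test,
outcome) layout are all hypothetical, and it is not the actual report
generator.

```python
from collections import defaultdict

# Hypothetical data: the status of each test, and one record per reported result.
TEST_STATUS = {
    "test-001": "PROPOSED",
    "test-002": "PROPOSED",
    "test-003": "APPROVED",
    "test-004": "APPROVED",
}

RESULTS = [
    ("SystemA", "test-001", "Pass"),
    ("SystemB", "test-001", "Pass"),
    ("SystemA", "test-002", "Pass"),
    ("SystemB", "test-002", "Fail"),
    ("SystemA", "test-003", "Fail"),
    ("SystemB", "test-004", "Pass"),
]


def passers_by_test(results):
    """Map each test to the set of systems that reported passing it."""
    passers = defaultdict(set)
    for system, test, outcome in results:
        if outcome == "Pass":
            passers[test].add(system)
    return passers


def proposed_passed_by_at_least(n, statuses, results):
    """PROPOSED tests that at least n distinct systems passed."""
    passers = passers_by_test(results)
    return sorted(
        test for test, status in statuses.items()
        if status == "PROPOSED" and len(passers[test]) >= n
    )


def approved_passed_by_none(statuses, results):
    """APPROVED tests that no system passed."""
    passers = passers_by_test(results)
    return sorted(
        test for test, status in statuses.items()
        if status == "APPROVED" and not passers[test]
    )


print(proposed_passed_by_at_least(2, TEST_STATUS, RESULTS))
print(approved_passed_by_none(TEST_STATUS, RESULTS))
```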

> Your RDF models a lot less than EARL [1], and not much more (it does include
> a specific property for the test duration, whereas EARL includes a general
> comment property...)

I don't see documentation for "cannotTell", so maybe it covers this,
but one thing we need is to report when the system being tested failed
to give any answer.  This is different from giving the right answer
(Passing) or giving the wrong answer (Failing).  I called this
"Incomplete" originally, but I'm now changing it to "Undecided".
(That term is weak in that it suggests that the TESTER couldn't decide
if the tested system passed or failed (which is probably what
"cannotTell" means), but it matches many decades of decidability
research.)  I'm not sure how the term could apply to content or even
user agents.
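
A minimal sketch of the three-way distinction described above, with
a hypothetical classify() helper in which None stands for "the system
gave no answer" (that encoding is an assumption for illustration, not
something either vocabulary defines):

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASSING = "Passing"      # the system gave the right answer
    FAILING = "Failing"      # the system gave the wrong answer
    UNDECIDED = "Undecided"  # the system gave no answer at all


def classify(expected: bool, answer: Optional[bool]) -> Outcome:
    """Classify one test run; answer=None means no answer was produced."""
    if answer is None:
        return Outcome.UNDECIDED
    return Outcome.PASSING if answer == expected else Outcome.FAILING


print(classify(True, True).value)
print(classify(True, False).value)
print(classify(True, None).value)
```

Note that Undecided here is a fact about the tested system, not a
statement that the tester could not judge the result.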

Why not just use rdfs:comment instead of earl:message?

> Other simple differences are not as many types of result,

Yeah, I figured "notTested" would just be no data, and if the test was
notApplicable, then why would it be tested?   That is, I don't think
those actually are test results.

> the fact that there
> is no RDF description of things being tested (earl defines webContent and I
> think userAgent),

That would be in an ontology of Tested-Things (which I think of as
"Systems", but you need something broader to cover content), which I
don't need yet.  Actually, I've worked on it a lot for OWL, but it's
totally orthogonal.  If you include it in the test ontology, people
may well think that's the only kind of thing the test ontology is for
(like I did with EARL).

> or the way that they were tested (automatically, manually,
> or heuristically - i.e. deriving a conformance result from other conformance
> results).

I'm not sure what those terms would mean for OWL.   

BTW, I think you're using the word "heuristic" when you mean
(and say in the explanation) "derived", or perhaps "implied",
"inferred", or "deduced".   For a test to be completed heuristically
would, I think, mean that you guessed what the results would be and
then verified that that's what they actually were.   That's probably
not what you meant.

> The important difference is that you have no notion of provenance in your
> model. Provenance is built into the EARL model as a protection against
> conflicting claims (which would cause your stuff to just have a
> contradiction) or to enable choosing how to deal with conflicting claims
> (trust management). 

Indeed.  Provenance is also orthogonal and does not belong in a
test-results ontology, although I was tempted, too.

For my application, I don't need provenance; the submitters are all
essentially trusted.  Meanwhile, if I had an implemented system for
handling provenance, I'd want to use it for a lot more than just test
results.

Shadi Abou-Zahra said:
> furthermore, in EARL the assertor that conducts the tests is included so
> that a single report of test results could be a collection of tests
> conducted by different sources.

My approach doesn't prevent merging results -- in fact I dump all the
results into one triple store before doing anything else -- it just
doesn't yet keep track of what came from where.  Yes, if we want to
track the source of each fact and allow reporting of results from
multiple sources in one file, then we'll need some kind of provenance
vocabulary.  But, again, that vocabulary should be outside of EARL,
since it's needed in so many other cases; it's been the focus of an
enormous amount of work and is very hard to get right.   (I'm not
complaining about EARL going their own way on this, if they needed
something simpler and sooner, but I'm very reluctant to use it
myself when I'm working so hard on the more general approach.)
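
To make the trade-off concrete, here is a rough sketch using plain
Python sets in place of a real triple store.  The submitter names and
the idea of tagging each triple with a fourth "source" element are
assumptions for illustration only; they are not EARL's model or any
particular provenance vocabulary.

```python
# Hypothetical result triples (system, test, outcome) from two submitters.
source_a = {("SystemA", "test-001", "Pass")}
source_b = {("SystemA", "test-001", "Fail")}

# Merging without provenance is just a set union: nothing records
# which submitter made which claim.
merged = source_a | source_b


def conflicts(triples):
    """Return (system, test) pairs that carry more than one distinct outcome."""
    outcomes = {}
    for system, test, outcome in triples:
        outcomes.setdefault((system, test), set()).add(outcome)
    return sorted(key for key, seen in outcomes.items() if len(seen) > 1)


def tag(triples, source):
    """Attach a source label to every triple, giving 4-tuples."""
    return {(s, t, o, source) for (s, t, o) in triples}


# Merging with provenance: each claim keeps its origin.
merged_with_source = tag(source_a, "submitter-a") | tag(source_b, "submitter-b")


def claims_for(quads, system, test):
    """List (outcome, source) pairs reported for one system and test."""
    return sorted((o, src) for (s, t, o, src) in quads if (s, t) == (system, test))


print(conflicts(merged))
print(claims_for(merged_with_source, "SystemA", "test-001"))
```

In the plain union the conflict can be detected but not attributed;
with the source tag it can at least be traced back to a submitter,
which is the minimum any trust decision would need.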

    -- sandro
Received on Friday, 26 September 2003 15:58:35 UTC