Final Minutes of 05-January-2004 Teleconference

QA Working Group Teleconference
Monday, 05-January-2004
Scribe: Mark Skall


(DH) Dominique Hazaël-Massieux (W3C)
(LH) Lofton Henderson (CGMO - WG co-chair)
(LR) Lynne Rosenthal (NIST - IG co-chair)
(MS) Mark Skall (NIST)


(KD) Karl Dubost (W3C, WG co-chair)
(PC) Patrick Curran (Sun Microsystems)
(SM) Sandra Martinez (NIST)
(VV) Vanitha Venkatraman (Sun Microsystems)
(dd) Dimitris Dimitriadis (Ontologicon)


(MC) Martin Chamberlain (Microsoft)
(AT) Andrew Thackrah (Open Group)


(DM) David Marston
(SH) Sandro Hawke

Summary of New Action Items:

No new action items

Previous Telcon Minutes:


1.) roll call 11am EST

2.) Any routine business?

LH  Next week we will have Monday (Test Assertions) and Wednesday (TestGL 
draft) telcons
MS  Will we be addressing Jeremy Carroll’s comments?
LH  We will need to address this, probably at the end of the month.  I will 
send mail suggesting a particular day for this topic.

3.) Presentation by Sandro Hawke of the work done by the OWL
WG and the RDF WG for the development of their test materials
and the gathering of the tests results. A few references:
- OWL Test repository:
- OWL Test Cases document:
- OWL Test Results:
- RDF Test Cases:
- RDF Test Results:

The following are Sandro’s notes he e-mailed prior to the telcon:

1  Introduction

     This is Mostly a Story about a Web Page

     Prolog: A Test-Driven Implementation of OWL ("Surnia")

     Effects the page has had

     What's next?

     About Me
         [ At home today (a bit of a cold, you may hear kids) ]
         - W3C Team, Semantic Web Activity, DARPA/Research Funding
         - Joined WebOnt (OWL) in June

2  Surnia

     - Immediate motivation: demonstrating implementability of OWL Full
     - I Didn't Read The Spec!  I just implemented based on my rough idea,
       then modified to pass more and more tests.
     - Results were decent
     - Also for RDF Core entailment & non-entailment tests

3  The Test Results Page

     3.1 Test Results Ontology (for reporting results in RDF)
         - TestRun (PassingRun, FailingRun,  IncompleteRun / UndecidedRun)
            + which test
            + where is the output (details of the run)
            + which system was tested
              (NOT tracking Project/Release/Platform/etc)
            + time test began
            + test duration
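
     The TestRun shape described above can be sketched as a simple record;
     this is an illustrative model only (class and field names are my
     guesses at the vocabulary, not the WG's actual RDF schema):

     ```python
     from dataclasses import dataclass
     from datetime import datetime, timedelta

     # Outcome mirrors the TestRun subclasses listed above:
     # PassingRun, FailingRun, IncompleteRun/UndecidedRun.
     OUTCOMES = {"PassingRun", "FailingRun", "IncompleteRun"}

     @dataclass
     class TestRun:
         test: str           # URI of the test that was run
         output: str         # where the output (details of the run) lives
         system: str         # which system was tested (no release/platform)
         began: datetime     # time the test began
         duration: timedelta # how long the test took
         outcome: str        # one of OUTCOMES

         def __post_init__(self):
             if self.outcome not in OUTCOMES:
                 raise ValueError(f"unknown outcome: {self.outcome}")

     run = TestRun(
         test="http://example.org/tests/owl/eq-01",   # invented URI
         output="http://example.org/runs/42/output",  # invented URI
         system="Surnia",
         began=datetime(2004, 1, 5, 11, 0),
         duration=timedelta(seconds=3),
         outcome="PassingRun",
     )
     print(run.system, run.outcome)
     ```

     Note the deliberate omission of project/release/platform fields, per
     the "NOT tracking" remark above.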

     3.2 Test Results Page -- First Version
         - big table of tests/systems
         - regen at bottom
         - self-explanatory

     3.3 Feeds
         - give me the URL of your test data
         - results page let people try it themselves, to do some debugging
         - some people advertised their results telling people to do a custom
           version of the page!
         - various bugs in their systems and a few in mine

     3.4 Test Results Refinements
         - group tests
         - leave out systems with no results for a group
         - summary table at top
         - group by number-passed/failed
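
     The summary-table refinement above (tally results per system, order by
     number passed) can be sketched like this; the rows and system names
     are invented for illustration, not actual OWL results:

     ```python
     from collections import defaultdict

     # Flat feed of (system, test, outcome) rows, as if harvested from the
     # per-implementor result feeds described in 3.3.
     rows = [
         ("SystemA", "test-1", "pass"), ("SystemA", "test-2", "fail"),
         ("SystemB", "test-1", "pass"), ("SystemB", "test-2", "pass"),
         ("SystemC", "test-1", "fail"),
     ]

     def summarize(rows):
         counts = defaultdict(lambda: {"pass": 0, "fail": 0})
         for system, _test, outcome in rows:
             counts[system][outcome] += 1
         # group by number-passed: most passes first
         return sorted(counts.items(), key=lambda kv: -kv[1]["pass"])

     for system, c in summarize(rows):
         print(f"{system}: {c['pass']} passed, {c['fail']} failed")
     ```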

4  Effects?

     4.1 Working Group
         - helped us decide which tests to look at more closely, approve,
           or move to extra-credit
         - let us push implementors on key tests

     4.2 Implementors
         - could see who was doing what, get some publicity
           (not enough system data per test, though, IMHO -- output link)

     4.3 W3C Members and Director
         - for advancement to PR -- Director liked it; I hope the members do

5  What's next?

     5.1 Test Suite stable now, as we're at PR
         - non-normative tests from WG?
         - tests from other submitters?

     5.2 Page usability
         - smaller bits!   searchable!
         - better info/links about implementations

     5.3 More info on page
         - benchmarking
         - change-over-time
         - nuances of test results (use of "output" link?)
         - facilitating *discussion* of tests

Sandro’s notes end here.

The following link is Sandro’s description of ontology test results:

SH  I work for W3C with DARPA funding and am a member of the Web Ontology 
Working Group.  The first thing I wanted to do was to implement the system. 
First implemented OWL Full without reading the spec and ran my 
implementation against the test suite (the test suite is normative).  Then 
modified the code until a large number of the tests were passed.

Also ran against RDF core tests and modified code to pass them as well.

Other implementations were run against the tests.

Put tests in RDF and created a test results ontology for reporting results 
in RDF.  For each test run, there is a URL for the test, the output, and 
which system is being tested (no detail on versions).  Also asked for the 
time it takes to run a test; however, the WG has not defined performance 
requirements.  Turned output into HTML.  Results of Surnia and old Surnia 
were displayed.

LH  What does “incomplete” mean?

SH  It means that some form of the tests didn’t finish; same as undecided.
MS  What does “undecided” mean with respect to conformance?
SH  It depends on the test.
MS  Is it documented what “undecided” means, especially since tests are 
normative?
SH  For interoperability, we considered them as “fails.”
MS  What is “extra credit”?
SH  These tests were not expected to pass.
MS  Are these requirements?  Are they “MUSTS”?
SH  What should pass depends on the type of system being built.
LH  Extra credit tests seem to be normative.  What do extra-credit tests 
mean with respect to conformance?
SH  We’re interested in conformance of documents, not systems.
LH  There are no conformance requirements on implementations?
SH  There are on some.
SH  Section 4, Effects.  Looked at tests not being passed by a lot of 
systems.  Is there something wrong with these tests?  If important tests 
did not have enough passes, the WG sent e-mail to try to get these tests 
passed.  For the move to Proposed Rec (where you need to demonstrate 
interoperability), we used this to come up with statistics.
LR  Did you need 2 implementations that did everything?
SH  They had to be 2 out of a group; we needed a certain fraction that 
passed tests for OWL and OWL Lite.
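
The "2 out of a group" criterion SH describes can be sketched as a per-test 
count of passing implementations; the runs and threshold here are invented 
for illustration, not the WG's actual data:

```python
from collections import defaultdict

# (system, test, outcome) rows; flag tests passed by fewer than
# `minimum` distinct implementations.
runs = [
    ("SystemA", "imports-001", "pass"),
    ("SystemB", "imports-001", "pass"),
    ("SystemA", "unionOf-003", "pass"),
    ("SystemB", "unionOf-003", "fail"),
]

def under_threshold(runs, minimum=2):
    passers = defaultdict(set)
    for system, test, outcome in runs:
        if outcome == "pass":
            passers[test].add(system)
    return sorted(t for t, s in passers.items() if len(s) < minimum)

print(under_threshold(runs))  # → ['unionOf-003']
```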
LH  Test cases do not constitute a conformance test suite for OWL and 
interoperability is in terms of test cases. Aren’t there holes (no test 
cases for normative content in OWL)?
SH  You can’t test every possible combination of protocol messages.
MS  But this isn’t a combination.  However, shouldn’t every requirement be 
tested at least once?
SH  The requirements missing were combinations.  Everything is tested at 
a simple level.  We froze the test suite when OWL went to PR.
LH  Can you give a reference to a description of test results ontology?
SH  Will put it in IRC.
DH  How does the ontology relate to EARL?
SH  There’s a lot of overlap.  Did it independently; could have used EARL 
but it would have been awkward.
LH  In terms of how QAWG uses the term “normative”, in what sense is a test 
case itself “normative”?  Normative prescribes required behavior.
SH  “Undecided” comes into play.  “Normativity” demonstrates what’s 
entailed or not.
MS  Does “normativity” add additional requirements to what’s in the spec?
SM  In theory, just illustrates the requirements, doesn’t add to it.
MS  So what does it add to call tests “normative”?
SH  Failing a normative test means you don’t conform.
MS  So aren’t all test cases normative?
SH  Most WGs don’t put their test cases on the Rec track.
DH  You could have an informative test case on the Rec track.
LH  What would happen if there was a contradiction between 2 normative 
things (test cases vs. Rec)?
SH  Text at top says that test cases are subsidiary to Rec.  However, 
conflict would be a cause for concern.
LH  Should test cases be written to define rules or after rules are written?
SH  Test cases are easier to understand (and figure out if they’re right) 
than semantic documents.
DH  The spec has a lot more semantics than the test itself (e.g., conformance 
statement and prose).  Test cases should be conformance examples, but the 
spec should be first priority.  Test cases are harder to understand.
SH  In these cases, test cases are easier to understand.
DH  Because, in this case, you understand the technology.

Adjourned at 12:05.

Note: The QA Working Group would like to express our thanks to Sandro for 
the informative and thought provoking presentation.

Mark Skall
Chief, Software Diagnostics and Conformance Testing Division
Information Technology Laboratory
National Institute of Standards and Technology (NIST)
100 Bureau Drive, Stop 8970
Gaithersburg, MD 20899-8970

Voice: 301-975-3262
Fax:   301-590-9174

Received on Monday, 12 January 2004 14:17:33 UTC