Re: passing negative entailment tests

That statement was copied (adapted) from the RDF test cases document and
has been in the RIF test doc since the earliest draft, but has never been
discussed by the WG as a whole.

The OWL 2 test document says about negative entailment tests: "... a
conforming entailment checker *should* return False, it *should not* return
Unknown, and it *must not* return True."  RIF says the same thing, in the
paragraph above the one you cite.
The OWL document also notes "While sometimes needed (for example, for
pragmatic reasons), Unknown and Error are not desired responses for valid
inputs."

To me, all of these statements seem equivalent: if you determine that the
conclusion is not entailed (by reporting false), then you passed; if you
report that you couldn't determine it to be either entailed or not
entailed, then you passed, but your result isn't as good as that of someone
who reported false; and if you determined that it is entailed, then you
failed. They also seem close to your suggestion of considering the
'unknown' case to be a 'weak pass.'
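
As a rough illustration of that reading, here is a minimal Python sketch of
how a results reporter might grade a negative entailment test. The names
(Answer, Grade, grade_negative_entailment) are hypothetical and do not come
from the RIF or OWL test documents, and the 'weak pass' grade is only the
suggestion under discussion, not anything the WG has agreed on.

```python
from enum import Enum


class Answer(Enum):
    """What an entailment checker reports for a test (hypothetical names)."""
    TRUE = "True"        # checker claims the conclusion is entailed
    FALSE = "False"      # checker claims the conclusion is not entailed
    UNKNOWN = "Unknown"  # checker could not decide either way


class Grade(Enum):
    """Possible grades for a test result (assumed, not from any spec)."""
    PASS = "pass"
    WEAK_PASS = "weak pass"
    FAIL = "fail"


def grade_negative_entailment(answer: Answer) -> Grade:
    """Grade a checker's answer on a negative entailment test."""
    if answer is Answer.FALSE:
        # Non-entailment was determined: the desired outcome.
        return Grade.PASS
    if answer is Answer.UNKNOWN:
        # Not wrong, but not as good as a definite False.
        return Grade.WEAK_PASS
    # Reporting True on a negative entailment test is incorrect.
    return Grade.FAIL
```

Under this sketch, grade_negative_entailment(Answer.UNKNOWN) yields
Grade.WEAK_PASS, which keeps the 'unknown' case distinct from both a full
pass and a failure.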

That all said, I'm fine with removing, rewording, or expanding on that
sentence.

Also, I think we could add to the document the text from your email about
what we consider to be a proper attempt to determine negative entailment.

Stella



On Thu, Sep 24, 2009 at 9:57 PM, Sandro Hawke <sandro@w3.org> wrote:

>
> I was surprised to read in Test:
>
>   Note that while ideally the RIF consumer would be able to
>   conclusively demonstrate that the conclusion cannot be drawn from the
>   premises, in practice a failure to draw the conclusion after a
>   thorough attempt to do so can be considered a successful outcome.
>
> Is this based on a WG decision I'm forgetting?   If so, I apologize.
>
> My sense right now is that this isn't okay.  To determine a negative
> entailment is hard work; it's not enough to just try and fail to find
> the entailment.  For a RIF system, I expect determining a negative
> entailment means (1) using an entailment-search algorithm that is known
> to be complete, and (2) giving it sufficient resources to run until it
> is done.  It's tempting to skimp on either of these, but I think people
> who do it right -- who actually give the answer that (modulo coding
> bugs) is known to be correct -- deserve better marks.
>
> Maybe in test-results-reporting we can allow for a 'nearly-passed' or
> 'weak pass', to give some sort of partial credit.  Really, these folks
> just got lucky.
>
> In OWL 1, a system was supposed to report this as 'undecided'.  That's
> better than failing (deciding, but deciding incorrectly), and probably
> better than not reporting any result, but still not as good as a 'pass'.
>
> I still like that solution.
>
>   -- Sandro
>
>
>
>

Received on Friday, 25 September 2009 18:14:32 UTC