Re: Testable assertion tagging for W3C specifications from Alex Rousskov on 2002-05-30 (www-qa@w3.org from May 2002)

From: Alex Rousskov <rousskov@measurement-factory.com>
Date: Thu, 30 May 2002 08:51:58 -0600 (MDT)
To: Scott Boag/Cambridge/IBM <scott_boag@us.ibm.com>, David Marston/Cambridge/IBM <david_marston@us.ibm.com>
cc: www-qa@w3.org
Message-ID: <Pine.BSF.4.10.10205300800550.31432-100000@measurement-factory.com>

I think we are getting closer to the core disagreement here. It looks
like David assumes that we can (should?) control the spec markup;
perfect markup solves the addressing problem. Scott and I argue that
both good markup and good addressing schemes are needed.

In other words (and this is the gist of this e-mail): perfect spec
markup surely solves the addressing problem, but we must not assume
that all real markup is or will be good. A good addressing scheme
should work well with both perfect and real specs.

Specific responses are inlined below.


On Wed, 29 May 2002 scott_boag@us.ibm.com wrote:

> There is probably need for both.

Yes! My argument is that a good design will not tie the markup to the
addressing scheme. I suggest that both good markup DTDs (or schemas,
or whatever the right word is today) AND good addressing techniques be
proposed. A spec author will then be able to mix and match at will, or
invent new approaches, as needed.

> The abstract XML content can and should have certain areas flagged
> as noteable for testing and as distinct thoughts.

Sure. I am not arguing against better markup. I am arguing against a
close relationship between "better markup" proposed by QAWG and
"better addressing" proposed by QAWG. In other words, better
addressing should not be done through better markup (because strong
markup dependency limits addressing abilities no matter how good that
markup is).

> While some seem to think that some added structure for use by
> testers would be a burden for spec writers, I tend to think of it
> as a helpful part of the specification skeleton, that will help to
> hold it up over the years.

I think that "extra burden" arguments can be dismissed as long as the
"better QA markup" is not mandatory but is simply one more "tool"
among many available to spec writers.
 
> These are just tools that are being suggested.  At least from my
> viewpoint, their use by spec authors are optional, and any
> presented design should be very lightweight.  But it would be nice
> if a well thought out scheme could be designed and experimented
> with, and, if it works, evolved over the coming years.

I agree, as long as QAWG can keep "QA Tool Collection" 100% optional.


On Wed, 29 May 2002, David Marston/Cambridge/IBM wrote:

> >I think that if I can address any arbitrary piece of a Rec, I can do
> >everything I need.
> 
> Well, sure, if the authors were careful enough to say everything
> they should have said.

By definition, one cannot _cite_ something that was not said. I assume
we are talking about a citation scheme here, so we are limited to the
spec's text. Of course, there may be many test cases that do not cite
the spec and either cite some other "companion documents" or simply
describe the test case without citations. That's out of scope.

> Alex's idea seems to be that we devise a system of pointing down to
> individual characters, treating the whole Rec as a byte stream.

Addressing individual characters of a byte stream should be used only
if necessary. I am not saying that all addresses must use byte
offsets. If the spec markup is good enough, whoever creates an address
should use features of the available markup. Good markup helps, but a
good addressing scheme will not depend on the presence of good markup.

> If you need more than one range, cite more than one.

Yes. Again, we can design a citation scheme where a pair of citations
still counts as a citation; these are minor details.
 
> I say if that's enough for him, then he's better off to cut/paste
> the actual text into the test case or other place where needed.
> Two benefits: immunity to byte shifts in the source, and no need to
> develop the necessary XPointer improvements. Naturally, the worry
> is that your extracted text may diverge from what's normative as
> errata are issued.

Cut-and-pasting is what we do today. That is not enough for my
applications because:
	- we need to auto-detect with high probability when our
	  citations go out of sync with the spec
	- we want to render the specs with a given citation
	  highlighted so that a reader can see broader context
	  and navigate freely
	- we want to provide a reverse index where a reader can
	  read the specs and get "one-click" access to
	  test cases that correspond to the sentences she is
	  reading
	- we want to create a Coverage Map
	
Note that all of the above is possible with cut-and-paste except for
cases where the citation text appears more than once in the document
and you want to cite a specific occurrence.
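
The note above suggests that pasted quotes alone can support most of
these goals. A minimal sketch of that idea follows, assuming the spec
is available as one plain-text string and each test case carries a
single pasted quote. The names check_citations and coverage, and the
test IDs, are hypothetical and not part of any existing tool.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets into the spec text


def check_citations(spec: str, citations: Dict[str, str]) -> Tuple[List[str], Dict[Span, List[str]]]:
    """Return (stale test IDs, reverse index from spec span to test IDs)."""
    stale: List[str] = []
    reverse_index: Dict[Span, List[str]] = {}
    for test_id, quote in citations.items():
        start = spec.find(quote)  # first occurrence only
        if start == -1:
            # The quote no longer appears in the spec: out of sync.
            stale.append(test_id)
        else:
            span = (start, start + len(quote))
            reverse_index.setdefault(span, []).append(test_id)
    return stale, reverse_index


def coverage(spec: str, reverse_index: Dict[Span, List[str]]) -> float:
    """Fraction of spec characters cited by at least one test case."""
    covered = set()
    for start, end in reverse_index:
        covered.update(range(start, end))
    return len(covered) / len(spec) if spec else 0.0


spec = "Servers MUST reply. Clients MAY retry."
citations = {
    "t1": "Servers MUST reply.",
    "t2": "Clients MAY retry.",
    "t3": "Clients MUST retry.",  # wording that is not in the spec
}
stale, reverse_index = check_citations(spec, citations)
```

Because str.find returns only the first occurrence, a quote that
appears more than once is silently attributed to its first location,
which is the one case plain cut-and-paste cannot handle.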

My current intention is to have an addressing scheme where each
address has two components: context and citation. The context
component will locate the area of the spec that contains the citation.
There MUST be a single match for all context searches for the address
to be valid. The citation component is the actual text (or markup)
being cited (within the given context). The open question is how to
represent context and citation so that matches are precise yet
flexible enough.
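
One possible reading of the two-component address is sketched below.
It assumes, purely for illustration, that the context is a literal
substring of the spec text that itself contains the citation; the
representation is left open above, so Address and resolve are
hypothetical names and exact string matching is a stand-in for
whatever matching rule is eventually chosen.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Address:
    context: str   # text locating the area of the spec (assumed literal)
    citation: str  # text being cited within that area


def count_occurrences(text: str, needle: str) -> int:
    """Count occurrences of needle in text, including overlapping ones."""
    count = 0
    start = text.find(needle)
    while start != -1:
        count += 1
        start = text.find(needle, start + 1)
    return count


def resolve(spec: str, addr: Address) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the citation, or None if invalid."""
    if not addr.context or not addr.citation:
        return None
    # The context must match exactly once for the address to be valid.
    if count_occurrences(spec, addr.context) != 1:
        return None
    # The citation must be found inside the matched context.
    offset = addr.context.find(addr.citation)
    if offset == -1:
        return None
    start = spec.find(addr.context) + offset
    return start, start + len(addr.citation)


spec = ("A server MUST reply. Section 2: A client MUST retry. "
        "Section 3: A proxy MUST retry.")
good = Address(context="A client MUST retry.", citation="MUST retry")
ambiguous = Address(context="MUST retry", citation="retry")
```

Here resolve(spec, good) picks the occurrence in Section 2 even though
"MUST retry" appears twice, while resolve(spec, ambiguous) returns
None because its context matches more than once.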
 
> We don't want to address arbitrary text ranges. We want to address
> meaningful (and normative) syntactic units. 

Again and again, can you explain the difference?? Addressing arbitrary
spec ranges INCLUDES addressing meaningful syntactic units! Thus, a
scheme that addresses arbitrary ranges should satisfy all your needs.
Can you give an example where addressing arbitrary text ranges is not
sufficient for your needs?

> QAWG, through the Spec Guidelines, would encourage authors to add
> tags that make these syntactic units more evident. The person
> citing should not be given a tool for selecting arbitrary text
> ranges and left to their own devices to produce "good" citations,
> but should rather be given tags that are part of the expressive
> power used by the spec authors to convey their intent.

I suggest that the person citing be given both a good citation tool
and specs with good tags. Since we cannot control the spec format or
the author's intent, and since there are old specs, my approach seems
to be much more practical than relying on authors to produce markup
perfect for citing. Not to mention cases where you want to cite
something that the author did not intend to be cited (whether by the
author's mistake or by design does not matter).
 
> >For example, when a user clicks on a test case description, the
> >description may include some readable narrative _and_ precise
> >quote(s) from the Recommendation that, together, give user a good
> >idea on what is being tested (among other details).
> 
> That's our current practice, but the quotes are obtained by cut
> and paste from the normative documents. I can drag my cursor over
> exactly the range of text I want. So can anyone else. But if
> people from different organizations are contributing test cases
> into a common effort, this approach maximizes the ability for
> their different interpretations to clash. Yet the Rec authors know
> (or should know) exactly which sentences (or productions, etc.)
> are supposed to be the governing ones for any situation.

If you are assuming that spec authors can predict or should control
how their document is cited, then there is no citation problem! The
author will simply tag each citation piece with a unique ID. However,
such an assumption does not hold in reality, making author- or
spec-dependent addressing schemes impractical for general use.

> The hard part, which I will state as a requirement because Alex
> asked for requirements, is to express that a test case is only
> testing a particular combination of circumstances. If I have one
> test case of <xsl:number level="single" count="foo"/>, I have a
> combination of both stated and defaulted choices, and I do NOT
> have adequate coverage of level="single" due to the other
> combinations in which it could be a part. I think that area of the
> spec has thin verbiage, so I can't point at all the sentences I
> need, because some aren't even there. If the Rec authors had
> tagging requirements that drove them to consider coverage of the
> combinations, they would better serve the needs of QA and
> developers alike.

This may be a requirement for spec markup. It is clearly not a
requirement for the addressing scheme. Again, good markup helps, but
we should not assume that all markup is good!

Alex.
Received on Thursday, 30 May 2002 10:52:39 UTC