Re: Testable assertion tagging for W3C specifications

To begin with, let me state upfront that I intend to provide this list 
with a more structured email covering the facets of the discussion as 
it has taken place so far. I want to do this because I am editor of 
the particular part of the Spec Guidelines document that should cover 
what this discussion is about, and which will surely be extended with 
many of the ideas in this thread in the next document iteration, but 
also because the discussion has evolved into something much larger and 
more interesting than I think Scott originally intended.

Comments inlined


On Thursday, May 30, 2002, at 04:51 PM, Alex Rousskov wrote:

>
>
> I think we are getting closer to the core disagreement here. It looks
> like David assumes that we can (should?) control the spec markup;
> perfect markup solves the addressing problem. Scott and I argue that
> both good markup and good addressing schemes are needed.
>
>
[dd] I'd agree here. Just as there is a particular document publication 
cycle with its respective rules, there should be a specification 
production tool (or set of tools): validator/DTD/XML Schema/test suite 
frameworks/transforms/linking techniques.
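
As one small example of what I mean by "transforms" above: given a spec 
source that tags its testable statements (the "assert" element below is 
an assumed name, nothing agreed upon), a trivial XSLT transform could 
pull out an assertion index as a first cut at seeding a test suite or 
coverage map. A rough sketch only:

  <?xml version="1.0"?>
  <!-- Rough sketch: "assert" is an assumed assertion-tagging element,
       not an agreed vocabulary. The transform lists every tagged
       assertion with its id, as a first cut at a coverage index. -->
  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
      <assertion-index>
        <xsl:for-each select="//assert">
          <assertion id="{@id}">
            <xsl:value-of select="normalize-space(.)"/>
          </assertion>
        </xsl:for-each>
      </assertion-index>
    </xsl:template>
  </xsl:stylesheet>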

> In other words, (and that's the gist of this e-mail): perfect spec
> markup surely solves the addressing problem, but we must not assume
> that all real markup is or will be good. A good addressing scheme
> should work well with perfect and real specs.
>
[dd] Well, insofar as we have a say in how the markup you refer to 
above is written and used, I think we can certainly assume that we will 
solve the addressing problem, since we will have thought of it when 
designing the spec authoring schema. If so, the addressing scheme would 
work with all specs written from a certain point in time onward. 
Something like the following is what I have in mind (the element names 
and the sample sentence are invented, just to make the idea concrete): 
every testable statement in the spec source carries a stable id, so an 
address is simply the document URI plus that id.
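
  <!-- Hypothetical fragment of spec source written against an assumed
       authoring schema; each testable statement gets a stable id. -->
  <section id="numbering">
    <p>
      <assert id="num-0001" level="must">
        A processor MUST report an error if the value of the level
        attribute is not one of the allowed values.
      </assert>
    </p>
  </section>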

> Specific responses are inlined below.
>
>
> On Wed, 29 May 2002 scott_boag@us.ibm.com wrote:
>
>> There is probably need for both.
>
> Yes! My argument is that a good design will not tie markup and
> addressing scheme. I suggest that both good markup DTDs (or schemas or
> whatever is the right word to use today) are proposed AND that good
> addressing techniques are also proposed. A spec author will be able to
> mix and match at will or invent new approaches, as needed.
>
>> The abstract XML content can and should have certain areas flagged
>> as notable for testing and as distinct thoughts.
>
> Sure. I am not arguing against better markup. I am arguing against a
> close relationship between "better markup" proposed by QAWG and
> "better addressing" proposed by QAWG. In other words, better
> addressing should not be done through better markup (because strong
> markup dependency limits addressing abilities no matter how good that
> markup is).
>
[dd] Could you elaborate on this? Why would structured markup be 
counterproductive?

>> While some seem to think that some added structure for use by
>> testers would be a burden for spec writers, I tend to think of it
>> as a helpful part of the specification skeleton, that will help to
>> hold it up over the years.
>
> I think that "extra burden" arguments can be dismissed as long as the
> "better QA markup" is not mandatory but is simply one more "tool"
> among many available to spec writers.
>
[dd] I think "extra burden" arguments could be dismissed or argued 
against in any case, given that the "extra burden implies better 
quality/addressability/what have you" has been clearly demonstrated. I 
think that's where we should focus our attention.

>> These are just tools that are being suggested.  At least from my
>> viewpoint, their use by spec authors are optional, and any
>> presented design should be very lightweight.  But it would be nice
>> if a well thought out scheme could be designed and experimented
>> with, and, if it works, evolved over the coming years.
>
> I agree, as long as QAWG can keep "QA Tool Collection" 100% optional.
>
[dd] The QA WG was formed to aid Working Groups in producing high 
quality and testable specifications. On the question of whether this aid 
will result in optional or mandatory tools, it is as much a question of 
design as of "marketing". If the QA WG manages, with help from 
interested parties, to come up with a set of tools that are generally 
accepted, I don't think there's an issue. And, finally, making people 
accept a particular set of tools can be as much an issue of education as 
of anything else.
>
> On Wed, 29 May 2002, David Marston/Cambridge/IBM wrote:
>
>>> I think that if I can address any arbitrary piece of a Rec, I can do
>>> everything I need.
>>
>> Well, sure, if the authors were careful enough to say everything
>> they should have said.
>
> By definition, one cannot _cite_ something that was not said. I assume
> we are talking about a citation scheme here, so we are limited to the
> spec's text. Of course, there may be many test cases that do not cite
> the specs and either cite some other "companion documents" or simply
> describe the test case without citations. That's out of scope.
>
>> Alex's idea seems to be that we devise a system of pointing down to
>> individual characters, treating the whole Rec as a byte stream.
>
> Addressing individual characters of a byte stream should be used only
> if necessary. I am not saying that all addresses must use byte
> offsets. If specs markup is good enough, whoever creates an address
> should use features of the available markup. Good markup helps, but a
> good addressing scheme will not depend on the presence of good markup.
>
>> If you need more than one range, cite more than one.
>
> Yes. Again, we can design a citation scheme where two citations is
> still a citation; these are minor details.
>
>> I say if that's enough for him, then he's better off to cut/paste
>> the actual text into the test case or other place where needed.
>> Two benefits: immunity to byte shifts in the source, and no need to
>> develop the necessary XPointer improvements. Naturally, the worry
>> is that your extracted text may diverge from what's normative as
>> errata are issued.
>
> Cut-and-pasting is what we do today. That is not enough for my
> applications because:
> 	- we need to auto-detect with high probability when our
> 	  citations go out of sync with the spec
> 	- we want to render the specs with a given citation
> 	  highlighted so that a reader can see broader context
> 	  and navigate freely
> 	- we want to provide reverse index where a reader can
> 	  read the specs and get "one-click" access to
> 	  test cases that correspond to the sentences she is
> 	  reading
> 	- we want to create Coverage Map
> 	
> Note that all of the above is possible with cut-and-paste except for
> cases where the citation text appears more than once in the document
> and you want to cite a specific occurrence.
>
> My current intention is to have an addressing scheme where each
> address has two components: context and citation. The context
> component will locate the area of the spec that contains the citation.
> There MUST be a single match for all context searches for the address
> to be valid. The citation component is the actual text (or markup)
> being cited (within given context). The open question is how to
> represent context and citation to allow for matches to be precise but
> flexible enough.
>
[dd] Wouldn't a precise pointer already provide (local) context, or 
wouldn't that be enough for our needs?
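
To make sure I'm reading the context + citation idea right, here is 
roughly how I picture such an address being written down (the element 
names are invented and the quoted strings are placeholders): the context 
must match exactly one region of the spec, the citation is the verbatim 
text claimed to occur inside it, and a checker flags the address as 
stale, which covers the out-of-sync detection you list above, when 
either condition fails.

  <!-- Invented vocabulary, just to make the two-part address concrete. -->
  <test-citation spec="http://www.w3.org/TR/some-spec/">
    <context>Section 7.7, third paragraph</context>
    <citation>the exact normative sentence being tested, copied
    verbatim from within that context</citation>
  </test-citation>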

>> We don't want to address arbitrary text ranges. We want to address
>> meaningful (and normative) syntactic units.
>
> Again and again, can you explain the difference?? Addressing arbitrary
> spec ranges INCLUDES addressing meaningful syntactic units! Thus, a
> scheme that addresses arbitrary ranges should satisfy all your needs.
> Can you give an example where addressing arbitrary text ranges is not
> sufficient for your needs?
>
>> QAWG, through the Spec Guidelines, would encourage authors to add
>> tags that make these syntactic units more evident. The person
>> citing should not be given a tool for selecting arbitrary text
>> ranges and left to their own devices to produce "good" citations,
>> but should rather be given tags that are part of the expressive
>> power used by the spec authors to convey their intent.
>
> I suggest that the person citing is given both a good citation tool
> and specs with good tags. Since we cannot control the spec format or
> author's intent, and since there are old specs, my approach seems to
> be much more practical than relying on authors to produce markup
> perfect for citing. Not to mention cases where you want to cite
> something that the author did not intend to be cited (by author's
> mistake or design, does not matter).
>
[dd] I think this is one of the most important issues raised here. Let 
me spell it out (and Alex, correct me if I'm wrong):

1. We cannot control the spec format
2. There are old specs

The issue of adding burden is perfectly valid as far as old specs are 
concerned. What I think, and hope most people agree with, is that we 
should not look at new specs and old specs and treat them as if they 
were all that similar. Given time, we'll come up with a set of tools 
that does indeed give control over the spec format. This in turn renders 
the second point above uninteresting for those newer specs. So, for some 
time we'll have a grey zone of older specs that have problems and are 
not suited to being pointed to in a uniform manner. The task here, 
though, is to come up with a newer format that solves this for new specs 
and for revised versions of fairly new ones.

>>> For example, when a user clicks on a test case description, the
>>> description may include some readable narrative _and_ precise
>>> quote(s) from the Recommendation that, together, give user a good
>>> idea on what is being tested (among other details).
>>
>> That's our current practice, but the quotes are obtained by cut
>> and paste from the normative documents. I can drag my cursor over
>> exactly the range of text I want. So can anyone else. But if
>> people from different organizations are contributing test cases
>> into a common effort, this approach maximizes the ability for
>> their different interpretations to clash. Yet the Rec authors know
>> (or should know) exactly which sentences (or productions, etc.)
>> are supposed to be the governing ones for any situation.
>
> If you are assuming that spec authors can predict or should control
> how their document is cited, then there is no citation problem! The
> author will simply tag each citation piece with a unique ID. However,
> such an assumption does not hold in reality, making author- or
> spec-dependent addressing schemes impractical for general use.
>
>> The hard part, which I will state as a requirement because Alex
>> asked for requirements, is to express that a test case is only
>> testing a particular combination of circumstances. If I have one
>> test case of <xsl:number level="single" count="foo"/>, I have a
>> combination of both stated and defaulted choices, and I do NOT
>> have adequate coverage of level="single" due to the other
>> combinations in which it could be a part. I think that area of the
>> spec has thin verbiage, so I can't point at all the sentences I
>> need, because some aren't even there. If the Rec authors had
>> tagging requirements that drove them to consider coverage of the
>> combinations, they would better serve the needs of QA and
>> developers alike.
>
> This may be a requirement for spec markup. It is clearly not a
> requirement for the addressing scheme. Again, good markup helps, but
> we should not assume that all markup is good!
>
[dd] You're absolutely right, and that's why we want to ensure that it 
is in the future.
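
To make David's requirement a bit more concrete: a spec vocabulary could 
let the author state which combination of attribute values an assertion 
governs, so that a coverage map can tell the difference between 
"level='single' is tested" and "level='single' together with an explicit 
count pattern is tested". A rough sketch, with the vocabulary entirely 
invented for illustration:

  <!-- Invented vocabulary: each "covers" child names one dimension of
       the combination this assertion governs. -->
  <assert id="number-level-single-count">
    <covers element="xsl:number" attribute="level" value="single"/>
    <covers element="xsl:number" attribute="count" value="(explicitly specified)"/>
    [the normative sentence governing this particular combination]
  </assert>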

> Alex.
>

Received on Thursday, 30 May 2002 15:39:39 UTC