RE: Testability and normative requirements from Mark W. Skall on 2006-04-17 (www-qa@w3.org from April 2006)

From: Mark W. Skall <mark.skall@nist.gov>
Date: Mon, 17 Apr 2006 16:24:17 -0400
To: "Karl Dubost" <karl@w3.org>, <www-qa@w3.org>
Message-ID: <60DE4C815920CA41AF6CC5CFDA9CC8490276F2BA@WSXG03.campus.nist.gov>
> RFC 2119 uses:
> 	- absolute requirement (MUST, REQUIRED, SHALL)
> 	- absolute prohibition (MUST NOT, SHALL NOT)
> 	- particular item      (SHOULD, RECOMMENDED)
>          - particular behaviour (SHOULD NOT, NOT RECOMMENDED)
>          - item                 (MAY)
> 
> which shows btw that RFC 2119 is not really consistent. I remember
> that at the WWW2002 Conference, Mark Skall had presented a paper
> about the problems of RFC 2119.
> 
> Mark, do you still have this paper and could you send the text in a
> mail on this list?

Hi Karl,

Actually it was 2004, not 2002, and it wasn't really a paper but just an
oral presentation.  In any case, here are my rough notes that I used to
make the presentation.  Looking forward to seeing you at the upcoming AC
meeting.

Mark



We all know that interoperability is much easier to talk about than to
obtain.

Many reasons for this:

1.	Variability in the specs - In the old days, we had a standard.
"Standard" meant everyone used it and portability/interoperability was
achieved (unless extensions were included).  But you were forewarned
that if you used the extensions in an implementation,
portability/interoperability would not be obtained.  Today, specs/stds
have become very large and attempt to "do everything for everybody."
a.	As a result, profiles are needed.  Profiles are "slices" of the
standard geared to a specific constituency.  Right away, this restricts
interoperability to the community using the profile.
i.	There may be a proliferation of profiles.
ii.	What if you have slightly different requirements, but most of
them are the same as those of a different constituency?  You can define
a new profile, which is similar to but different in some ways from the
previous one, OR you can try to use the existing one.  The latter may
enhance interoperability but may not really meet the needs of your
constituency.
b.	Sometimes profiles themselves are too large or do not exactly
meet specific needs.  Then trading partner agreements may be developed.
These are typically bilateral agreements - agreements between 2
parties.  Now we really have a proliferation of agreements, and
interoperability is only between 2 implementations.  These bilateral
agreements are very common, for instance, in the healthcare arena, where
HL7 messages are implemented.

2.	People are confused with respect to requirements - RFC 2119

a.	RFC 2119 is a Best Current Practice for the Internet community.
b.	"In many standards track documents several words are used to
signify the requirements in the specification.  These words are often
capitalized.  This document defines these words as they should be
interpreted in IETF documents."  The key words are "MUST", "MUST NOT",
"REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",
"MAY", and "OPTIONAL".
c.	Let's look at these key words, one at a time.


1. MUST   "This word, or the terms "REQUIRED" or "SHALL", mean that the
definition is an absolute requirement of the specification."

"MUST" I understand.  When I was a kid and my mother said "you'd better
do this or else," that meant "MUST".  "You'd better clean your room or
else."  That was a "MUST".  "MUST" implies consequences if you don't
"conform" - that's the "or else."  On occasion the "or else" resulted in
a shoe being thrown at me.  "MUST" I understand.

	

2. MUST NOT   "This phrase, or the phrase "SHALL NOT", mean that the
definition is an absolute prohibition of the specification."

This I understand as well.  You'd better not drink and drive.  That's a
"MUST NOT" with consequences, not only from my mother, but from society
and the penal system.

3. SHOULD   "This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course."

This I don't understand. What are the valid reasons to ignore a
recommendation?  What if I weigh the implications of ignoring this
recommendation - how do I value the benefit I receive vs. the
implications of ignoring the request?  When I was a kid and my mother
said "you should really behave better" I completely ignored her AND
THERE WERE NO CONSEQUENCES!  It all comes back to that - no
consequences, no adherence.

4. SHOULD NOT   "This phrase, or the phrase "NOT RECOMMENDED" mean that
there may exist valid reasons in particular circumstances when the
particular behavior is acceptable or even useful, but the full
implications should be understood and the case carefully weighed before
implementing any behavior described with this label."

Again, what does this mean?  This is kind of like the opposite of the
last one but this time it's the implications of doing something (which
I'm urged to not do) that I have to evaluate vs. the benefit of not
doing the action.


5. MAY   "This word, or the adjective "OPTIONAL", mean that an item is
truly optional [as opposed to the other key words, which were not truly
true?].  One vendor may choose to include the item because a particular
marketplace requires it or because the vendor feels that it enhances the
product while another vendor may omit the same item. An implementation
which does not include a particular option MUST be prepared to
interoperate with another implementation which does include the option,
though perhaps with reduced functionality. In the same vein an
implementation which does include a particular option MUST be prepared
to interoperate with another implementation which does not include the
option (except, of course, for the feature the option provides.)"



Okay, now my brain hurts.  First of all, what does "MUST be prepared"
mean?  I've already figured out what MUST means, but MUST BE PREPARED?
Sounds like the Boy Scouts.  This is not a measurable requirement.

And I REALLY don't understand the whole concept.  Isn't "MAY" the same
as "MAY NOT"?  And, in fact, isn't this the same as saying nothing about
this requirement?  If a parent gives a child the option by saying
"you may go to the party," the child has complete discretion about
whether or not to go.  The only thing the child knows is that going is
not prohibited.  Thus "MAY" is only useful for knowing that something is
not a "MUST NOT".

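To make the five levels above concrete, here is a minimal sketch,
assuming a spec is available as plain text, that tallies its RFC 2119
key words by level.  The function name tally_keywords and the level
labels are illustrative assumptions, not terms defined by RFC 2119.

```python
import re
from collections import Counter

# Key words grouped by the five levels discussed above.
# The level labels are illustrative names, not terms from the RFC.
LEVELS = {
    "MUST": "absolute requirement",
    "REQUIRED": "absolute requirement",
    "SHALL": "absolute requirement",
    "MUST NOT": "absolute prohibition",
    "SHALL NOT": "absolute prohibition",
    "SHOULD": "recommendation",
    "RECOMMENDED": "recommendation",
    "SHOULD NOT": "discouraged",
    "NOT RECOMMENDED": "discouraged",
    "MAY": "optional",
    "OPTIONAL": "optional",
}

# Longest phrases first, so "MUST NOT" is not counted as a bare "MUST".
# Spaces become \s+ so a phrase split across a line break still matches.
_ALTERNATIVES = sorted(LEVELS, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(k.replace(" ", r"\s+") for k in _ALTERNATIVES) + r")\b"
)


def tally_keywords(spec_text):
    """Count capitalized RFC 2119 key words in spec_text, grouped by level."""
    counts = Counter()
    for match in _PATTERN.finditer(spec_text):
        keyword = " ".join(match.group(1).split())  # normalize whitespace
        counts[LEVELS[keyword]] += 1
    return counts
```

A spec whose tally is dominated by "recommendation" and "optional"
entries is one where, by the argument above, there are few consequences
and therefore little that can be tested.
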
What's the Solution?


At NIST, we preach the need for what we call "early intervention" in the
software development process.  That means taking great care to develop
precise, unambiguous requirements in a specification, and testing,
starting early, to ensure conformance to the spec.

However, in the real world, this rarely happens.  A lot of software is
developed without any specs, and often the specs that are developed
aren't very good.


W3C is an organization that's developing very important specifications
that a large number of people are implementing.  These specifications
are, I believe, well written.  Much care goes into writing them and they
are vetted by a large number of organizations.  Additionally,
comprehensive test suites are developed.  I can personally vouch for the
high quality of the tests because NIST is the testing leader for many of
the W3C recommendations and has written a large number of tests.  These
tests have 2 purposes:

1)	the obvious purpose of determining implementer conformance to
the specs and
2)	a feedback loop, when errors or ambiguities are found in the
spec, back to the spec developers.

This results in improved implementations and improved specs.  The key to
achieving both of the above is the early development of tests.

In order to have high quality, interoperable software you have to start
early!  A recent study at NIST reported that the annual cost to the
nation of inadequate software quality (caused primarily by inadequate
testing) is $59.5 billion.  The report also found that only 3.5% of
errors are found during the requirements and design phase and that it
costs 2 to 5 times more to fix errors downstream in the process.


1) Since the tests are developed as early as possible, they are
used as exit criteria for moving from one phase to the next (from
Candidate Recommendation to Proposed Recommendation).  They're used to
demonstrate that all the features were correctly implemented by 2
interoperable implementations.  If features aren't supported by
implementations, they are sometimes refined or removed from the spec.


2) The other advantage of having tests developed early is that
implementers get to use these tests early on to debug their
implementation long before it gets to the marketplace.  The result is
higher quality software that is also less expensive, since early
detection of errors is much less costly than later detection.  In fact,
we have some anecdotal evidence that in some cases implementations
passed around 35% of our tests at the beginning but, after the bugs were
found and fixed, were able to pass 95-100% of our tests.  So this is a
dramatic improvement in software quality.


The two needs I described above,

1.	better, clearer, more precise and less ambiguous specs and
2.	comprehensive test suites early in the process,

are also being addressed by the QA Activity within W3C.  NIST is proud
to have hosted the initial workshop to investigate the need for this
group and is one of the leaders of the activity.  The QA Activity is
developing guidance for the other WGs in W3C (and ultimately for anyone
writing specs) on how to write better specs and how to develop better
test suites.

So in summary, much work needs to be done to obtain seamless
interoperability.  The community needs better specs with better ways of
specifying requirements.  Perhaps we should just eliminate all
requirements that aren't MUST.  But even more importantly, much more
care and feeding needs to go into spec development and testing.  The W3C
is doing a good job, I believe, in promulgating these good practices,
but this responsibility must be shared among all software developers if
we are to have even an outside chance of producing reliable software.

Discussion plan: Start by distinguishing between the illusion and
reality of interoperability; competing products claim to implement the
same standards but exhibit different behavior. Ways to respond: accept
the differences, require tests to enforce the specs, reduce variability
in the specs, use marketplace forces such as negative reviews in the
trade press. Bring up bad experiences associated with each kind of
response. Then, depending on audience interest, pick other issues from
prepared list [which currently contains:] 
How is Web interop faring in 2003? 
Are the W3C Recs suitably constraining? 
Do SHOULD statements in the Rec have enough moral authority? 
Are localization and interop opposed to each other? 
When the WGs issue tests as well as Recs, how do we benefit? 
Does rapid issuance of new standards and Recs help or hinder achievement
of the interop goals? 
How do WG volunteers balance interests of their employers against those
of the consortium?

1. Aren't proprietary end-to-end systems better because the vendor takes
responsibility for actually making it work? (Raised by WWW2004 Panels
Committee) 
2. Does interop arrive too slowly via the standards process? (Raised by
WWW2004 Panels Committee) 
3. How is Web interop faring in 2004? (Illusion vs. reality) 
4. Are the W3C Recs suitably constraining? 
5. When the WGs issue tests as well as Recs, how do we benefit? 
6. Do SHOULD statements in the Rec have enough moral authority? 
7. Are localization and interop opposed to each other? 
8. Does rapid issuance of new standards and Recs help or hinder
achievement of the interop goals? 
9. How do WG volunteers balance interests of their employers against
those of the consortium? 
10. If we have standards, why does product behavior vary so much?
Received on Monday, 17 April 2006 23:14:27 UTC