BCP DTDs would get used

Recently there was an incident where I complained about something on the 
XML Conference website.

  http://lists.w3.org/Archives/Public/www-validator/2002Nov/0127.html

Now, I do concur with Bjoern's analysis that, technically, a value of '#wai'
for html:a.name is legal per the HTML 4.01 Transitional DTD.
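
To make the failure mode concrete, here is a minimal sketch of the pattern at
issue; the exact markup on the conference site may have differed, and the
element content here is invented for illustration:

  <!-- Legal per HTML 4.01 Transitional, where a.name is CDATA, but
       the anchor's own name begins with '#'. -->
  <a name="#wai">Accessibility statement</a>

  <!-- The "obvious" link does not reach it; the link author would
       have to write something like href="#%23wai" to match that
       name literally. -->
  <a href="#wai">Accessibility statement</a>

  <!-- A value that satisfies the Name production (presumably what
       the fix looked like), so the obvious link works. -->
  <a name="wai">Accessibility statement</a>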

But for webmasters, this is operationally the wrong answer: the answer to the
wrong question.  And the answer to the right question would be readily
generated by the validator tool if we just made the appropriate DTD available
as an option.

The webmaster in question, presented with the evidence in my complaint,
immediately fixed the problem.  Using '#wai' as the html:a.name token is an
HCI usability disaster; and the sense of the HTML WG is that the things that
can be used in the #fragment clause of a URI-reference to an HTML document
(XHTML included) should be restricted to the Name production.  This is a
change from the CDATA declaration that html:a.name had carried up through
HTML 4.01.  See XHTML 1.0, where I believe this reform was instituted.
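
In DTD terms the difference is small.  A sketch of the attribute declaration
at issue, abridged to the one attribute (the real declarations carry many
more), using CDATA as HTML 4.01 declares it and NMTOKEN as XHTML 1.0 declares
it, if I recall the DTDs correctly:

  <!-- HTML 4.01 Transitional: any character data is accepted, so
       name="#wai" validates. -->
  <!ATTLIST A  name  CDATA    #IMPLIED >

  <!-- XHTML 1.0, and what a BCP checking profile of HTML 4.01 could
       adopt: a name token, which excludes '#', so the validator
       would flag the problem at check time. -->
  <!ATTLIST a  name  NMTOKEN  #IMPLIED >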

There are human-factors and de facto interoperability reasons (see the
evidence about Netscape and Lynx results) why webmasters who take the trouble
to run a syntax check against their content should check instances of
html:a.name against the requirements of the NAME production and not pass
general CDATA.  This holds even if they want to limit their feature set to
what is processed similarly enough across many browsers, including NN4.

There are very few downsides to enforcing this more restrictive syntax on
html:a.name, and it is as easy to check your hypertext against the safe-side
criteria as against the letter-of-the-spec profile.  The actual industrial
processes separate the activities of webmasters in "scrubbing content before
it goes live" from the activities of browser makers in maintaining and
publishing user agents.  These two communities need to know what reasonable
emit and accept filters are.  Taking a hard line for the principle that these
two filters ought to be the same thing is not listening to our customers.  It
flies in the face of the lessons of the quality revolution.

This is a matter of applying the so-called DRUMS principle of being strict in
what you emit and lax in what you accept.  In the present case we are dealing
with #fragment tokens that are, to a fair extent, created and interpreted by
people in the overall information flow of the Web, so the human confusion
that arises from allowing an initial '#' character in this token is germane.
This string is not just for machines to process.

Where I am going is that the W3C should consider maintaining (as a living
document) one or more Best Current Practice DTDs as an alternative to the one
which exactly agrees with the specification as published in the past.  I do
not believe that this needs errata to the specification.  The specification
can be left as is, and the BCP checking profile can still be adjusted to avoid
known current interoperability problems in the field, as was done with the
extraction of a core of CSS features when CSS had a lot of interoperability
problems.
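
Operationally, opting into such a profile could be as simple as validating
against an alternate DOCTYPE while the published specification stays frozen.
The BCP identifiers below are invented purely for illustration; no such DTD
exists today:

  <!-- Normative, as published: -->
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
      "http://www.w3.org/TR/html4/loose.dtd">

  <!-- Hypothetical BCP checking profile, maintained as a living
       document alongside the frozen specification: -->
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional BCP//EN"
      "http://www.w3.org/QA/html401-transitional-bcp.dtd">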

I am posting this idea here and not to the validator list because it is a
question of W3C attitudes toward specifications, not just the mechanical
maintenance of the validator itself.  In an ideal world W3C would be fully
committed to measuring the quality of its impact, and the QA group would
get the same visibility and participation as the TAG.  But we aren't there
yet.

Somehow we have to get the W3C enterprise to take a more empirical approach
to characterizing the level of interoperability actually achieved, and perhaps
more importantly what it is that succeeds in interoperating, rather than just
trying to define an emit filter and an accept filter by the same utterance.
The validator technology would be used much more by webmasters if we were
publishing a BCP DTD that was a rough consensus estimate of what it is safe to
assume will actually interoperate successfully with a variety of browsers.
This is not a normative statement; it is a descriptive consolidation of
workarounds for known current problems in the continuity of operation of the
web-in-the-large.  Or at least I claim that if we went to the WWF or took a
scientific sampling of webmasters, that is the answer that would come back.

To organize an effective campaign to promote the use of orthodox markup, we
need something much like an "accept filter" that is on the loose side of the
spec.  This is a rough envelope of the markup practices present in the
content that a typical browser will need to cope with in order for its user
not to reject it as useless.  We need a catalog of the differences between
actual common practice and W3C writ in order to do the necessary triage as
to where we should fight and where we should switch, as well as to organize
our business case for why they should switch where we choose to fight.

I am not sure that the browser makers would reach consensus that they want to
share enough information to build the 'accept' model.  That is something that
the HTML WG probably has the right participation to answer.

But I do suspect that the webmasters of the world would be delighted to
share problem experience and work to extract the core of what mostly works
into a conservative "emit filter."  I should qualify 'conservative' in that
regard: as discussed above, it means that which is largely safe in the hands
of the existing user agents, expressed as a profile of markup utilization.

And this has to "eliminate the middleman" of W3C writ.  It has to name names
as to language constructs and processors.  I don't mean that we shouldn't be
checking and tracking what the specifications say with regard to trouble
incidents.  But the binary relationship between content example and
processor example is sufficient to get started, and must be preserved somehow
in what is available to the customers of this collection.

Al

Received on Friday, 29 November 2002 10:01:28 UTC