Markup and explicit association: checkpoint 1.3 etc.

To resolve various outstanding issues surrounding checkpoints 1.1,
1.3, 1.6, 4.2 and 4.3 I propose to add a new technical term to the
document and to use it consistently throughout. I am open to
suggestions as to what the term should be; my initial proposal is that we
use the phrase "explicit encoding", which should be carefully defined
in the guidelines document and in the glossary.

This move is best justified by describing the problem it is intended
to resolve. Early in the development of WCAG 2.0, it was realized that
the terms "markup" and "markup language" are too restrictive to
characterize all of the various mechanisms by which logical structure
and other information can be represented in a well-defined form that
is amenable to automated processing. The main limitation of these
terms is that they can be read as designating certain kinds of
syntactical constructs (e.g., XML or SGML syntax) rather than the more
abstract concept of a well-defined data structure that permits
retrieval of the required information. The W3C has recognized this,
hence the development of the XML information set, which separates the
tree structure of an XML document (and the various informational items
it contains) from the standard syntax defined in the XML 1.0
specification. Other relevant examples include metadata, which,
although often expressed in markup languages, are not part of the XML or
(X)HTML document with which they are associated, and, in non-W3C
technologies, tagged PDF, which is analogous to XML in many respects -
it comprises a tree of elements with associated attributes - but is
represented in an entirely different syntax.

Of course, if content is stored in a database and retrieved from
there by the user agent, then there may be no markup language at all,
merely, perhaps, an XML information set, or equivalent. Thus I think
we need a technical term that expresses what is conceptually common to
all of the foregoing examples. "Explicit encoding" appears to be
reasonably suggestive of what is meant, though of course its precise
signification would have to be specified in a definition.
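
To illustrate what I mean by "conceptually common", here is a rough
sketch in Python. It is only an illustration under stated assumptions:
the sample content, the record layout and the function names from_xml
and from_record are all hypothetical, invented for this message. It
reduces an XML serialization and a markup-free record (such as might be
assembled from a database) to the same syntax-neutral tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical content, written in XML syntax.
XML_SOURCE = "<section title='Intro'><para>Hello</para><para>World</para></section>"

# The same content as a plain data structure, e.g. assembled from
# database rows. No markup language is involved. (Layout is assumed.)
RECORD_SOURCE = {
    "name": "section",
    "attributes": {"title": "Intro"},
    "children": [
        {"name": "para", "attributes": {}, "text": "Hello", "children": []},
        {"name": "para", "attributes": {}, "text": "World", "children": []},
    ],
}

def from_xml(element):
    """Reduce a parsed XML element to (name, attributes, text, children)."""
    return (
        element.tag,
        dict(element.attrib),
        (element.text or "").strip(),
        tuple(from_xml(child) for child in element),
    )

def from_record(record):
    """Reduce a markup-free record to the same (name, attributes, text, children) shape."""
    return (
        record["name"],
        dict(record["attributes"]),
        record.get("text", "").strip(),
        tuple(from_record(child) for child in record["children"]),
    )

xml_tree = from_xml(ET.fromstring(XML_SOURCE))
record_tree = from_record(RECORD_SOURCE)

# Both representations yield the same abstract structure.
print(xml_tree == record_tree)
```

The comparison succeeds because nothing in the resulting tree depends
on angle brackets or any other syntax; that abstract structure is what
the proposed term is meant to name.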

The checkpoint 1.3 success criteria would then be rewritten to take
this new term into account, e.g., at level 1 criterion 2:
Each of the following is provided in an explicit encoding

Likewise for checkpoint 1.1 ("Non-text content that can be expressed
in words has a text equivalent associated with it in an explicit
encoding" - or we could separate this into two criteria: first that
there is a text equivalent, and secondly that it is associated with
the non-text content via an explicit encoding).

This terminology would also be employed with respect to acronyms and
abbreviations, concept codes (under checkpoint 4.2, as Lisa has
proposed) and in other suitable contexts. If necessary, instead of
writing simply "explicit encoding" (under checkpoint 1.3 for example)
one could write "explicit encoding (for example, a markup language)" to
clarify the point for the casual or inattentive reader, while
conformance would still be judged by reference to the technical term
and its corresponding definition.

On the subject of the definition itself, further work is required.
Here is a preliminary attempt, however.

An explicit encoding includes, but is not limited to:
1. a markup language
2. metadata
3. an information set.
Information is said to be encoded explicitly if it can be
programmatically derived (by a deterministic algorithm) from the
format or data structure in which it is represented.
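
To make "programmatically derived (by a deterministic algorithm)"
more concrete, here is a minimal sketch in Python, assuming an
XHTML-like fragment in which the alt attribute carries the text
equivalent. The sample content and the function name text_equivalents
are hypothetical; this is an illustration, not a proposed conformance
test.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the first image has an explicitly associated
# text equivalent; the second has none, although nearby prose happens
# to describe a chart.
SAMPLE = """
<body>
  <img src="chart.png" alt="Sales rose 10% in 2002"/>
  <img src="spacer.png"/>
  <p>Figure: sales rose 10% in 2002.</p>
</body>
"""

def text_equivalents(xml_source):
    """Map each image's src to its explicitly associated text, or None."""
    root = ET.fromstring(xml_source)
    return {img.get("src"): img.get("alt") for img in root.iter("img")}

result = text_equivalents(SAMPLE)
for src, alt in result.items():
    print(src, "->", alt)
```

The same input always gives the same answer, and the association is
read directly from the data structure. The paragraph that merely sits
near an image is not reported, since nothing in the structure ties it
to any image; recovering that link would require guesswork rather than
a deterministic algorithm.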

Note that the reference to programmatic derivation comes from the
existing wording of checkpoint 1.3. Obviously the definition can be, and
ought to be, tightened, but I think the general idea is clear.

Concept codes, on this view, are merely an extension of the idea of an
explicit encoding.

Comments? Suggestions? Counter-proposals?

Received on Thursday, 14 November 2002 21:20:10 UTC