Terminology and Editorial Comments on the WCAG 2.0 Last Call Draft


Commenter: Al Gilman    

Email: Alfred.S.Gilman@IEEE.org    

Affiliation: W3C Invited Expert

Date: see transmittal email


Directions

Please ensure that the comments submitted are as complete and "resolvable" as possible. Thank you.

1. Document Abbv. (W2/UW/TD)
2. Item Number (e.g. 1.1)
3. Part of Item (Heading)
4. Comment Type (G/T/E/Q)
5. Comment (including rationale for any proposed change)
6. Proposed Change (be specific)
Doc: W2   Item: Glossary   Part: web unit   Type: G/E

Comment: One can reasonably interpret what the Web Characterization Terminology meant by "simultaneously" to mean "concurrently." The point is that your concept is their concept; you are just straining at gnats over the term 'simultaneously,' as if it implied 'instantaneously.'

You just don't know how much street cred you lose by using funny-money terms like "web unit" when what you mean is what the web designer means by a "web page."
Use "web page."

State that the concept is essentially the same as in the Web Characterization Terminology.

Add something on the order of: "Owing to the increasingly dynamic nature of web pages today, one would be more likely to say 'rendered concurrently' rather than 'rendered simultaneously,' so people don't think that there has to be an instant rendering of a static page. The requirement is that fluctuations in the page view take place in a context which is stable enough that the user's perception is that they are in the same place."
Doc: W2   Item: 1.3.1, Glossary   Part: content   Type: G/T/E

Comment: The concepts of 'content' and 'presentation' as used here are inadequate to explain the needs of users with disabilities, even with today's technology.


See "Content v. Presentation" below.
Proposed Change: Apply the suggestions under the "two interfaces" comments, to wit:
Articulate requirements (a) against the rendered content as it contributes directly to the user experience, and (b) against the content as communicated between the server and the user agent -- the formal model created by the format and exposed by the document object that results from the "clean parse" (see the IBM suggestions there).

Enumerate the information requirements that have to be satisfied by the formal model, in terms of questions that the content object must be able to answer in response to a predictable query (the "programmatically determined" requirement). Most of these are context-adapted versions of "Where am I? What is _there_? and What can I do?"
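As one illustration of what "answer in response to a predictable query" could mean in practice, here is a minimal sketch in browser-side TypeScript. The selectors, the function name, and the shape of the answer are mine, purely illustrative, and not anything the draft specifies; the point is only that the formal model, not the rendered pixels, is what gets interrogated.

interface OrientationAnswer {
  whereAmI: string[];    // page title plus enclosing labeled regions, outermost first
  whatIsThere: string[]; // headings in the current scope
  whatCanIDo: string[];  // accessible names of actionable elements in scope
}

// Sketch only: ask the parsed document object (not the rendered pixels) the
// three orientation questions, relative to a given node such as the focus.
function orient(node: Element): OrientationAnswer {
  const regions: string[] = [];
  for (let el: Element | null = node; el; el = el.parentElement) {
    const label =
      el.getAttribute("aria-label") ??
      el.querySelector(":scope > h1, :scope > h2, :scope > h3")?.textContent;
    if (label) regions.push(label.trim());
  }
  const whereAmI = [document.title, ...regions.reverse()];

  const scope = node.closest("section, main, form, nav") ?? document.body;
  const whatIsThere = Array.from(scope.querySelectorAll("h1, h2, h3, h4"))
    .map(h => h.textContent?.trim() ?? "");

  const whatCanIDo = Array.from(
    scope.querySelectorAll<HTMLElement>("a[href], button, input, select, textarea")
  ).map(el => el.getAttribute("aria-label") ?? el.textContent?.trim() ?? "(unlabeled)");

  return { whereAmI, whatIsThere, whatCanIDo };
}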
Doc: W2   Item: throughout   Part: criterion v. criteria

Comment: "Criterion" is the singular of "criteria"; "criteria" is a plural noun.

Proposed Change: Use "criterion" where the singular is meant.

Content v. Presentation


Since this is one of the holy cows of accessibility, and I at least feel that this explanation is fundamentally problematical, I think this is a fertile area for discussion.

Hypothetical dialog:

Joe:

"rendering" refers to the rendered content, like an "artist's rendering" of a planned building. This includes everything, the glyphs for making writing evident (perceivable or recognizable) and all the CSS-added effects such as font face, size, bold, and color that the authors mean to imply by 'presentation.'

Moe:

No, the use of 'rendering' here doesn't mean the as-rendered result, it refers to the process of generating that result.

What's wrong with this story?


Preliminaries in terms of terminology

Both "presentation" and "rendering" are used in plain English to refer to either the activity that transforms the content or the resulting representation of the content at the User Interface. This ambiguity could be resolved by more surgical word-smithing.

* more serious:

The transformation is entirely determined by the technology choices of the author. So if we say 'presentation' is defined by the function of this transformation, we are at the whim of the encoding used between server and User Agent, and we haven't really isolated a genuine rhetorical or semantic distinction.

If we go with "the process that generates the final representation at the User Interface" we find that the division between content and presentation is determined by the author's choice of file format and the implied client-side transformation to the pixel plane or audio out channel.

To make this clear, consider that if we define the difference between presentation and content by the rendering transform, then text in images is image content. At least the way PF has been approaching things, this is an erroneous model. Because the text in the image will be _recognized_ by the general user as being a sample of written language, or some sort of code using the writing system of a written language, that language or symbology defines a re-purposable baseline for adapting the look and feel of the user experience to get within the capture regime of the user's sensation, perception, and conception.
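A small, hedged illustration of the cost of that erroneous model: the check below (browser-side TypeScript of my own devising, not anything mandated by the draft) simply flags images that expose no character-data hook at all, since whatever writing they contain is then frozen into pixels and has lost the re-purposable baseline that Unicode text provides.

// Sketch: list images that expose no character-data hook at all, so any
// writing they contain cannot be re-flowed, re-sized, or re-voiced the way
// Unicode character data can.
function imagesWithoutTextBaseline(root: ParentNode = document): HTMLImageElement[] {
  return Array.from(root.querySelectorAll<HTMLImageElement>("img")).filter(img => {
    const alt = img.getAttribute("alt");
    const labelled =
      img.hasAttribute("aria-label") || img.hasAttribute("aria-labelledby");
    // alt="" is a legitimate marker for a purely decorative image; only the
    // complete absence of any text hook is flagged here.
    return alt === null && !labelled;
  });
}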

* most serious:

The distinction is not only not defined by this language in the document, it is not definable in any stable and fair way.

Starting at the sensible user interface, we can articulate different deconstructions, or "content hypotheses," for what is presented to the user. In the case of textual content, that's not a big problem: the Unicode character set provides a good level of abstraction that

a) supports client-side personalization of the rendered form of the writing, and b) is readily verifiable by the author as "yes, that's what I said."

The problem is that a lot of the connotations that are communicated by arrangement and other articulable features in the scene presented to the user are much less readily articulated or validated by the author. And depending on the media-critical skill and habitual vocabulary of the knowledge engineer doing the deconstruction, you will get rather different information graphs as an articulation of the 'content' of the scene.

Contemporary web page design is more poster design than essay composition. And it is more interactive than what was supported in HTML1 or HTML2. It is the design of a richly decorated and annotated button-board -- a collection of clickables which the site and author think will appeal to the visitor, based on what they know from the dialog so far. But that doesn't guarantee a lot of relationship between the different fragments mashed together into the page. If the page does focus on one coherent story, the page will list higher in search results. But that hasn't taken over practice in the field yet.

So the information that is implicit in the [complete, including glyphs, etc.] rendered form of the content is a mix: some of it is readily articulable, and the author will readily recognize it as articulable; other information at first seems ineffable, until a skilled media critic analyzes it and presents an analysis to the author or designer, who only then can recognize that there are articulable properties about their stuff.

If the analyst is working from an information model of properties and relationships that are known to survive re-purposing and to reinforce usability in the re-purposed rendering, then this process of backing up appearances (including sonic appearances, if any) with an articulable model will in fact enable better access and better usability in the adapted experience. It will also improve usability in the un-adapted user experience, but more weakly: if the model underpinnings are not provided, there will be fewer task failures for the nominal user than for the user who has to depend on an adapted user experience, an off-nominal look and feel.

** so where do we go? how to do it right?

My latest attempt at a summary is the presentation I did at the Plenary Day on 1 March.

<quote
cite="http://www.w3.org/2006/03/01-Gilman/tree2.xhtml">

afford functional and usable adapted views
         function: orientation --
           Where am I?
           What is there?
           What can I do?
         function: actuation --
           from keyboard and from API
            navigation: move "Where am I?"
         performance: usable --
           low task failure rate confirms access to action, orientation
           reasonable task-completion time confirms structure,
             orientation, navigation

</quote>
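For the "actuation" line in that outline, the flavor of check I have in mind can be sketched roughly as follows, in the same browser-side TypeScript as the earlier sketch; the notion of "actionable" used here is my own rough cut for illustration, not a definition from the draft.

// Sketch: report elements that invite action but cannot be reached from the
// keyboard, and so cannot be actuated from the keyboard or from an API that
// drives the keyboard path.
function keyboardUnreachable(root: ParentNode = document): HTMLElement[] {
  const actionable = root.querySelectorAll<HTMLElement>(
    "a[href], button, input, select, textarea, [onclick], [role='button'], [role='link']"
  );
  return Array.from(actionable).filter(el => {
    const nativelyFocusable =
      /^(a|button|input|select|textarea)$/i.test(el.tagName) && !el.hasAttribute("disabled");
    const tabindex = el.getAttribute("tabindex");
    const focusable = nativelyFocusable || (tabindex !== null && Number(tabindex) >= 0);
    return !focusable;
  });
}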

What gets into our 'content' model, where we ask authors to encode and share the answers to selected questions, is a modeling decision driven by what we *know* about the functional differences between use by people with disabilities (PWD) and nominal use of the User Interface.

In other words, we know that the "current context" -- the user's gut understanding of the extent of the answer to "where am I?" -- that can reliably be kept in foreground memory on the fly during a Text-To-Speech readout of a page is a smaller neighborhood than what the visual user perceives as the context they are operating in. This is why there has to be information that supports the system prompting the user about where they are and where they can go -- navigation assistance -- *inside* the page, more consistently and at a finer grain than the full-vision user needs.

The screen reader user without Braille has more local neighborhoods that have to be explainable in terms of "where am I?" and hence more levels of "where am I?" answers in the decomposition of the page. Most of those levels or groupings are evident in the visual presentation, but we have to coach the author in articulating, labeling, and encoding the industrial-strength explanation of page structure beyond what is enforced by better/worse differences in user experience under nominal use conditions.

This effect is readily understood in the area of orienting for "what can I do?" This requires good labeling for hyperlinks, form controls, and other presented objects that invite user action, and it is pretty well set out in both WCAG1 and WCAG2.
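To give that a concrete shape, a crude stand-in for the accessible-name computation (again my own illustrative TypeScript, not the algorithm in either guideline) can at least surface the controls for which "What can I do?" has no answer at all.

// Sketch: find controls whose accessible name is missing. The name heuristics
// below are deliberately crude; they are not the real accessible-name algorithm.
function unlabelledControls(root: ParentNode = document): HTMLElement[] {
  const controls = root.querySelectorAll<HTMLElement>(
    "a[href], button, input:not([type='hidden']), select, textarea"
  );
  return Array.from(controls).filter(el => {
    const byAria =
      el.getAttribute("aria-label")?.trim() || el.getAttribute("aria-labelledby");
    const byLabelElement =
      el.id && document.querySelector(`label[for="${el.id}"]`)?.textContent?.trim();
    const byContent = el.textContent?.trim();
    const byValue =
      el instanceof HTMLInputElement && (el.value.trim() !== "" || el.alt.trim() !== "");
    return !(byAria || byLabelElement || byContent || byValue);
  });
}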

The model of what information to ask for as regards intra-page structure is less well resolved in the community (there is less widespread agreement on what this information is). We have hopes for the "time to reach" metric analysis used in ADesigner (from IBM Japan) as a way to communicate to authors when and where they need to improve the markup of intra-page structure.
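To make the intuition behind that metric concrete, here is a toy restatement of my own in TypeScript -- emphatically not IBM's algorithm -- that contrasts the number of reading "steps" a linear, speech-order traversal needs to reach each actionable element with the number needed if the user can first jump to the nearest preceding heading. The gap between the two is a rough signal of where intra-page structure would pay off.

interface ReachEstimate {
  name: string;
  linear: number;       // steps reading everything from the top of the page
  withHeadings: number; // jump heading-by-heading, then read from that heading
}

// Toy restatement only; the element list and step-counting are illustrative.
function reachCost(root: ParentNode = document): ReachEstimate[] {
  const stops = Array.from(root.querySelectorAll<HTMLElement>(
    "h1, h2, h3, h4, h5, h6, p, li, a[href], button, input, select, textarea"
  ));
  const headingIndexes = stops
    .map((el, i) => (/^H[1-6]$/.test(el.tagName) ? i : -1))
    .filter(i => i >= 0);

  const estimates: ReachEstimate[] = [];
  stops.forEach((el, i) => {
    if (!el.matches("a[href], button, input, select, textarea")) return;
    const preceding = headingIndexes.filter(h => h <= i);
    const nearestHeading = preceding.length ? preceding[preceding.length - 1] : 0;
    estimates.push({
      name: el.textContent?.trim() || el.tagName.toLowerCase(),
      linear: i,
      withHeadings: preceding.length + (i - nearestHeading),
    });
  });
  return estimates;
}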

The bottom line is that we should stop trying to generate disability-blind specifics at a level as low as the Success Criteria without getting more specific about known disability-specific adaptations in the look and feel of the web browse experience. Analyze the usability drivers in the adapted look-and-feel situations, and then harmonize the content model across multiple adaptations of look and feel including the null adaptation.

[one principle I missed in the Tech Plenary presentation:]
-- the first principle, which I did brief, is to:

- enable enough personalization so that the user can achieve a functional user experience (function and performance as described at Plenary)

-- the second principle, which we have talked about with DI but did not discuss in the brief pitch at Plenary, is to:

- enable the user to achieve this with as little perturbation in the look and feel as possible; in other words, the equivalence between the adapted and unadapted user experiences should be recognizable at as low a level as possible. Using adaptations (equivalent facilitation) which require a high level of abstraction (the task level) to recognize their equivalence is both necessary and OK if there is no less invasive way to afford a functional user experience. But most of the time there's an easier way, and while what is hard should be possible, what is easy should indeed be easy.