Re: lynx-dev User Agent from David Woolley on 1999-08-29 (w3c-wai-ig@w3.org from July to September 1999)

From: David Woolley <david@djwhome.demon.co.uk>
Date: Sun, 29 Aug 1999 19:08:14 +0100 (BST)
To: cynthia.waddell@ci.sj.ca.us (Waddell Cynthia)
Cc: lynx-dev@sig.net, w3c-wai-ig@w3.org
Message-Id: <199908291808.TAA13212@djwhome.demon.co.uk>
Apologies for this size of this; while not directly about Lynx it does
have a lot about the world in which Lynx has to survive.  The original
subject relates to the need to forge the browser identity before some
sites will talk to Lynx.

> Clinton and is found at http://www.aasa.dshs.wa.gov/access/waddell.htm. 

I note that both your email (more details in private correspondence)
and the web document are bloated, causing accessibility problems for
people for whom bandwidth costs money and that the email uses deprecated
HTML and the web document uses undefined Unicode characters, e.g.:

>  </span>As suggested by O&#146;Reilly, an area for future study is &#147;where
                           ^^^^^^                                    ^^^^^^
 
An accessible document, in English, should restrict itself to ASCII
(&#32; to &#126;) as far as possible, and failing that to the ISO 8859/1
subset of Unicode, which also includes &#160; to &#255;.  Anything in the
range &#127; to &#159; is illegal in HTML, when entered as an entity,
although may be used in a particular transfer character set to represent
a Unicode character outside of that range, but such use will cause 
accessibility problems, itself.  The correct Unicode characers for what was
intended here probably wouldn't be recognized by Netscape.

Also the prologue to the HTML looks as though it uses a lot of features
that are specific to a proprietory word processor and appears to use
HTML comments for what should presumably either be SGML processing 
instructions, SGML conditional sections or some extension to HMTL.

<!--[if VML]><![if !VMLRender]><object id=VMLRender classid="CLSID:10072CEC-8CC1
  ^^   The rest may be valid SGML, however I suspect a valid SGML parser
       would treat the rest as comments.

It doesn't have an SGML doctype line and is not HTML 2.0, so it is 
invalid HTML.  It looks like it is really some form of XML aware
extension to HTML 4.0, but it doesn't have an XML processing instruction
line either.

The heavy use of styles suggest that the intended effect is page description,
rather than content.

It uses a proprietory character set:

> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

and attempts to set in a way that requires HTML 4.0, and is then only
tolerated, not encouraged (I don't want to go online to check whether
the server sent this in the proper place, or worse, sent conflicting
information).  Incidentally, this does not permit the use of &#146; as
entities are interpreted in the canonical, Unicode character coding, not
in the transfer coding.

Every paragraph seems to have a class based on the default MS-Word 
classes and there seems to have been no attempt to define classes to
represent the deep structure.  Even neutral body text has a class on
each paragraph, when the sensible thing would be to have no class, or
to use DIV to set a class for the whole run of normal paragraphs.  This
looks like something very close to a literal translation of the 
MS-Word internal coding, in the same way that MS RTF is.

There is not a single <li> element in the whole document, even though
the visual appearance of lists has been simulated by the use of runs
of non-breaking spaces.  E.g. this monstrosity (<B7> indicate the
character with that code, not an HTML tag):

>   <p class="MsoNormal"
>   style="margin-left:.25in;text-indent:-.25in;mso-list:l35
> level1 lfo34; tab-stops:list .25in"><![if !supportLists]><span
> style="font-size:16.0pt; font-family:Symbol"><B7><span style="font:7.0pt
> &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> </span></span><![endif]><span style="font-size:16.0pt">Removes age-related
> barriers to participation in society<o:p/></span></p>

Should have been:

>  <li>Removes age-related barriers to participation in society

(possibly with a style for LI in the style sheet, and just possibly with
a class on the LI).

Inicidentally, the paragraph preceding this example mis-renders on Lynx,
because it can't see the styling, but would have rendered correctly if
the shorter, semantically accurate, alternative had been used:

>   �       Removes communications and information access barriers that
>   restrict business and social interactions between people with and
>   without disabilities
 
as against:

>     * Removes communications and information access barriers that
>       restrict business and social interactions between people with and
>       without disabilities

(NB, whilst proper identification of list structure is potentially a very
strong accessibility feature, authors actual accessibility aids (assistive
technology in your jargon) may have expended most of their effort in
working round the sort of HTML in your web page and not taken advantage
of being given the structure on a plate.)

Also, in the above, it almost certainly also uses fonts to misrepresent
a character to obtain a visual effect, namely the shape of the bullet.
Any selection of the MS Symbol font in HTML is extremely suspect in
accessible HTML - I don't know of any common browser that handles it
properly, i.e. only use Symbol if the selected Unicode character exists
in symbol, and translate from the Unicode character to the Symbol code
point for the same character, rather than using the numeric value of
the character to directly index using the proprietory coding vector for
the font.  (In this case, the actual character specified, a full stop
sized centre dot, doesn't look too bad.)

Also, I note that Front Page has been used at some stage. Even Front Page
98 can easily generate illegal HTML, e.g. <b>...<p>...</b>, and I think
your Front Page 3. is worse.

You break one of the accessibility guidelines by not making your anchors
describe the link - anchors in the body are just reference numbers, and
anchors in the references are the URL.  I think the former is due to
the authoring tool not understanding the medium and simply using automatic
footnote numbering.  (Actually Lynx will do this numbering itself, so
I end up with two numbers for each reference!).

>   Hit Counter

It's no use for me to know this, and I won't have been counted anyway,
as the counting is a byproduct of fetching the image of the counter!
This should have alt="".  (Generally this sort of hit counter, and hit
counting technology in general, increase bandwidth, and are undesirable
where bandwidth costs are an issue; they force an end to end access, 
reducing the effectiveness of local caching.)

Taking content, now.

>   entity's purchasing choices create.  Whenever existing technology is
>   "upgraded" by a new technology feature, it is important to ensure that
>   the new technology either improves accessibility or is compatible with
>   existing assistive computer technology.[40][38]  For example,
>   web-authoring software programs that erect barriers in their coding of
>   web pages fall under this scrutiny.

This assumes that there is a real market, but the reality is nearly 
every business is standardising on the current (a moving target) generation
of MS Office, and the problems I point out in your web page indicate that
you have fallen into the same trap (the version used indicates a very
recent purchase).  There are two approaches to producing
"accessible" (to other businesses) documents in the business world:
lowest common denominator or following de facto standards - nearly
everyone goes for the latter.

Most businesses don't want to incur any costs in complying with legislation
that doesn't benefit their bottom line, so, if they have no choice because
there is a monopoly, they are quite happy, and the suppliers can get on
with supplying what the buyers think they really want.

> Although America Online is one of the most popular services for accessing
> the Internet, it is currently not a good choice for consumers who are
> blind.  Expert screen reader users find it extremely difficult to navigate
> because of its nonstandard controls (buttons and icons) and lack of keyboard
> commands.

What the paper fails to consider is that these features, and many other
accessibility problems, exist because of conscious decisions that they should
exist, in order to make the products attractive to the primary market, 
to establish a house style, and for portal sites, to force people to
search the page so they are more likely to read the adverts.  The primary
purpose of many web sites is not to provide information to consumers but
to provide readers to advertisers, and data to market researchers.

If you don't understand that accessiblity is often in conflict with
the perceptions of commercially good web site design, and in the case
of advertising sites may break the effect by revealling the paucity of
information actually being supplied, you can't sensibly come up with
ways that site designers will identify with.

Go to your local book shop and you will find that nearly all the books
on HTML are telling people all the tricks that result in inaccessible
pages!  You have to go to the source documents, like the HTML 4.0
specification, which most authors consider too technical, to get a
different picture.

>   part of her work in advising City Council because the documents were
>   posted in an inaccessible format, ie. portable document format (pdf).
 
You have to consider that the alternative is often a proprietory word
processor format, or a very messy attempt by that word-processor to
produce plain text, or, for newer documents, the sort of semantically
broken HTML in your web page.  Actually one of the design goals of PDF
is to allow extraction of the text, and this is quite possible if output
is designed for PDF creation, but can be rather more difficult when word
processors insist on positioning almost every character individually.
E.g. if I use PDF created from an nroff source file (which itself is
directly readable) and run the ghostscript ps2ascii utility on it,
nearly every word survives intact, although there may be a few spaces
in the middle of words.  I would expect Acrobat to do even better.
Doing the same on an MS-Word document will stress Acrobat to the limit.

>   installed base.  Microsoft has been accused of trying to extend both
>   Java and HTML in proprietary directions.[92][89]
 
Both Microsoft and Netscape have done this; in fact it could be argued
that most of your web accessibility problems stem from Netscape's 
pandering to commercial *wants*, which was probably the basis of
the business plan that turned a group of NCSA people into the Netscape
company.  It seems to me that W3C has been fighting a rearguard
action to try and regain as much as possible of the original spirit
of academic freedom of information.

>   Perhaps research is needed on how to best manage open standards where
>   civil right protections are afforded the community of people with
>   disabilities in their access to technology.  As suggested by O'Reilly,

At the core of this is the question of the role of the state in regulating
industry; something, I believe, of a hot potato in US politics.


If you are serious about writing accessible documents, start with Notepad,
not MS-Word, write everything in plain text, then add HTML structures to
reflect the structure of the document.  Only then start considering 
appearance.  That way you will think about the contents, then its structure,
which are the things that need to be preserved if you want good 
accessibility.

If you can't read the HTML as plain text, you have probably got it
wrong for an accessible document.

The two worst things for accessibility are authoring tools and cut and
pasting from sites which "look good" without understanding the structure.

If you want to make the web page accessible, you have to decide whether
you want it to be accessible in its current visual form, in which case, 
in spite of it's negative position on PDF, PDF is the appropriate choice,
or whether you want the content to be accessible, in which case you should
save it as text and edit in the HTML markup by hand.

One final point to remember, is that most web site are not about
communicating information; they are about selling, and these days the
two are almost incompatible.  (It's an interesting point that, in spite
of the accessibility lobby's objections to PDF, PDF is where you will
find the real information on most commercial sites, and the HTML on the
sites is used where PDF would have been a better vehicle, because those
parts are all form and no content!)

[ Note I am only subscribed to the lynx list, not the accessibility list,
 and, if the latter is closed, this may fail in that list. ]
Received on Sunday, 29 August 1999 18:03:15 UTC