Re: belittling designers, two kinds of accessibility

From: David Woolley <david@djwhome.demon.co.uk>
Date: Sun, 22 Oct 2000 10:49:17 +0100 (BST)
Message-Id: <200010220949.KAA06921@djwhome.demon.co.uk>
To: w3c-wai-ig@w3.org
> But that's not the Web, even originally.  That is an SGML idea, which is only

But the web isn't HTML and pre-dates it.  There are many concepts that
get confused in popular terminology (including that used by most content
authors).  Three such terms are:

- the web - this originates from the mesh of links (most
  commercial sites are barely part of *the* web, although they may
  form isolated micro webs) - I would argue that gopher created the
  web; HTML was a successor to gopher;

- internet sites - these are even older than the web, although they were
  primarily FTP before the development of HTML;

- HTML - this is a structural markup language that can be used as one way
  of constructing nodes in the web, rather than the mix of almost any
  technology except PDF and plain text that it refers to in popular terms.

> accidentally involved in the origins of the Web.  The Web idea was not of
> "documents independent of the kit used to view them."  The Web idea required a

However, HTML was.  Presentational hypertext was well established before
HTML.  The web already existed.  Internet sites had existed for a long time.
If the aim had been to produce the current commercial web, T B-L would have
gone into partnership with Adobe, whose PDF was well beyond HTML version 
3.2 in terms of the features desired by commercial sites with the one 
exception of network integration, even at that time.

Instead, the first HTML specification says this (T B-L's own words):

   It is required that HTML be a common language between all platforms.
   This implies no device-specific markup, or anything which requires
   control over fonts or colors, for example. This is in keeping with the
   SGML ideal.

Source:
HTML Design Constraints
<http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/HTMLConstraints.html>
 
> user agent that implemented hyperlinks encoded as URI references.  Tim used

I think a more correct description of the design essence was simplicity,
as quoted in the following:

   for their documents. A few were using SGML. Tim realized that
   something simpler was needed that would cope with dumb terminals
   through high end graphical X Windows workstations. HTML was conceived
   as a very simple solution, and matched with a very simple network
   protocol HTTP.

Source:
HTML Home Page
<http://www.w3.org/MarkUp/#historical>

"HTML" as we know it is really an invention of Netscape, and only really
happened because Adobe failed to exploit the emerging market.  PDF was
designed with commercial wants in mind, whereas "HTML" represents the
aftermath of a war between its original design concepts and what commerce
really wanted, with the result that most commercial internet site HTML
files are a mess to look at in source form.

My perspective on the history is that Adobe PDF failed because:

1) It had already been around for several years and was therefore 
   "disqualified" from becoming a new fashion;

2) Because of efficient compression, most users failed to realise that
   it is actually a textual format++, and therefore considered PDF
   files as black boxes;

3) Adobe's business model was based on selling authoring tools, which
   excluded students and organisations on a low budget (this latter 
   point shouldn't worry Anne);

4) The simplest approach to producing PDF is to create PostScript first,
   but most PostScript documents are effectively created by a rough
   equivalent of document.write in the printer, which makes them look
   very complex== ("HTML" is going this way on many commercial sites) -
   this again makes people think it is difficult to create by hand##;

5) It had a reputation for bloated files, but this was really because
   many PDF files were logically GIFs of non-electronic documents
   (hopefully actually fax encoded) and word processors generate very
   bloated PostScript (recent versions of MS Word attempt to place almost
   every character individually, rather than using the rules built into
   the font and applying a percentage adjustment to word and character
   spacings; the example below was created with Gnu troff, which generates
   fairly clean PostScript).  Bloat is becoming normal in "HTML";

6) Adobe failed to add network features (and possibly forms) until HTML
   had already captured the market;

7) They held out against Microsoft, meaning that Windows only provides
   reluctant support for PostScript post processing - most Windows users
   don't know how to create a PostScript file or how to print one on a local
   printer, both of which can be done without additional software.

PDF had minimum accessibility goals, but there were some.  It's an
explicit aim that the plain text should be extractable from the documents.
This depends on people not doing text as graphics, but its support for
embedded fonts means that is much less necessary, and it can even do quite
a few word art effects in the browser (colour gradients for example).
The latter, though, are likely to be done from the menu in MS Word, which
constructs a graphic, rather than outputting the PostScript to create
the graphic from the text.

Even for scanned documents, it allows an underlay of the OCRed equivalent.
I think it does this in a way that allows cutting and pasting as though
the document were just text, which you can't do with alt text in current
HTML browsers.

The latest specification even allows a semantic overlay to be added.

(SVG, in my view, is a descendant of PDF, rather than of HTML, and
could change the world of commercial internet sites.)

++ These are a few lines from a PDF document with compression turned off
(a minimal PDF file doesn't even need all the numbers used here for
positioning):

    100 0 Td(The premises referred to in paragraph \(A\) of Clause 3 of the Memorandum)Tj
    0 Tw
    0 -12 Td(of Association.)Tj
    -100 -15.6 Td(Flat Owner)Tj
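
To illustrate how readable such lines are, here is a minimal Python sketch
that pulls the plain text out of them.  It assumes uncompressed content and
only the simple "(string)Tj" form shown above; hex strings, TJ arrays, octal
escapes and nested unescaped parentheses are not handled.  The names
TJ_STRING, extract_text and SAMPLE are made up for this illustration and do
not come from any PDF library.

```python
import re

# Matches a literal string operand followed by the Tj (show text) operator.
# Assumption: no nested unescaped parentheses, no hex strings, no TJ arrays.
TJ_STRING = re.compile(r"\(((?:\\.|[^\\()])*)\)\s*Tj")


def extract_text(content):
    """Return the strings shown by Tj operators, in order of appearance."""
    # Drop the backslash from simple escapes such as \( \) and \\ .
    # Octal escapes and \n style escapes are deliberately not handled here.
    return [re.sub(r"\\(.)", r"\1", m.group(1))
            for m in TJ_STRING.finditer(content)]


# The four lines quoted above, unchanged.
SAMPLE = r"""100 0 Td(The premises referred to in paragraph \(A\) of Clause 3 of the Memorandum)Tj
0 Tw
0 -12 Td(of Association.)Tj
-100 -15.6 Td(Flat Owner)Tj"""

for line in extract_text(SAMPLE):
    print(line)
```

A real extractor would also have to track the Td offsets to decide where
line breaks and word gaps fall; this sketch ignores positioning entirely.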

## PDF does contain pointers to byte offsets which are difficult to
maintain by hand, but there are third party tools that will convert
PostScript and it would be fairly easy to create authoring tools that
took a sort of proto-PDF and created real PDF.
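
As a rough sketch of what such a tool might do, the Python below takes a
list of hand-written object bodies (the "proto-PDF") and works out the byte
offsets for the cross-reference table.  The function name build_pdf, the
convention that the first body is the catalog, and the sample objects are
all assumptions made for this illustration; it is not an existing tool, and
it has not been checked here against any particular viewer.

```python
def build_pdf(bodies):
    """Assemble object bodies into a PDF, computing the xref byte offsets.

    bodies[i] becomes object number i + 1.  Assumption: object 1 is the
    document catalog, so the trailer can point /Root at it.
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"          # entry 0: head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


# A hypothetical one-page document written by hand, with no offsets in it.
STREAM = b"BT /F1 12 Tf 100 700 Td (Flat Owner) Tj ET"
SAMPLE_OBJECTS = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]"
    b" /Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
    b"<< /Length %d >>\nstream\n" % len(STREAM) + STREAM + b"\nendstream",
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
]

pdf_bytes = build_pdf(SAMPLE_OBJECTS)
```

The author only has to keep the object numbers consistent; the offsets and
the stream length, which are the parts that are tedious to maintain by hand,
are computed.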

== The procedure definitions for a simple graphic EPS file can run to
over 100K, and nearly all of them will not be used by simple graphics.
Received on Sunday, 22 October 2000 06:13:24 GMT
