- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Sun, 22 Oct 2000 10:49:17 +0100 (BST)
- To: w3c-wai-ig@w3.org
> > But that's not the Web, even originally. That is an SGML idea, which is = > only But the web isn't HTML and pre-dates it. There are many concepts that get confused in popular terminology (including that used by most content authors). Three such terms are: - the web - this originates from the mesh of links (most commercial sites, are barely part of *the* web, although they may form isolated micro webs - I would argue that gopher created the web; HTML was a successor to gopher; - internet sites - these are even older than the web, although were primarily FTP before the development of HTML; - HTML - this is a structural markup language that can be used as one way of constructing nodes in the web, rather than the mix of almost any technology except PDF and plain text that it refers to in popular terms. > accidentally involved in the origins of the Web. The Web idea was not of > "documents independent of the kit used to view them." The Web idea requi= > red a However, HTML was. Presentational hypertext was well established before HTML. The web already existed. Internet sites had existed for a long time. If the aim had been to produce the current commercial web, T B-L would have gone into partnership with Adobe, whose PDF was well beyond HTML version 3.2 in terms of the features desired by commercial sites with the one exception of network integration, even at that time. Instead, the first HTML specification says this (T B-L's own words): It is required that HTML be a common language between all platforms. This implies no device-specific markup, or anything which requires control over fonts or colors, for example. This is in keeping with the SGML ideal. Source: HTML Design Constraints <http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/HTMLConstraints.html> > user agent that implemented hyperlinks encoded as URI references. Tim us= > ed I think a more correct description of the design essence was simplicity, as quoted in the following: for their documents. A few were using SGML. Tim realized that something simpler was needed that would cope with dumb terminals through high end graphical X Windows workstations. HTML was conceived as a very simple solution, and matched with a very simple network protocol HTTP. Source: HTML Home Page <http://www.w3.org/MarkUp/#historical> "HTML" as we know it is really an invention of Netscape, and only really happened because Adobe failed to exploit the emerging market. PDF was designed with commercial wants in mind, whereas "HTML" represents the aftermath of a war between its original design concepts and what commerce really wanted, with the result that most commercial internet site HTML files are a mess to look at in source form. My perspective on the history is that Adobe PDF failed because: 1) It had already been around for several years and was therefore "disqualified" from becoming a new fashion; 2) Because of efficient compression, most users failed to realise that it is actually a textual format++, and therefore considered PDF files as black boxes; 3) Adobe's business model was based on selling authoring tools, which excluded students and organisations on a low budget (this latter point shouldn't worry Anne); 4) The simplest approach to producing PDF is to create PostScript first, but most PostScript documents are effectively created by a rough equivalent of document.write in the printer, which makes them look very complex== ("HTML" is going this way on many commercial sites) - this again makes people think it is difficult to create by hand##; 5) It had a reputation for bloated files, but this was really because many PDF files where logically GIFs of non-electronic documents (hopefully actually fax encoded) and word processors generate very bloated PostScript (recent versions of MS Word attempt to place almost every character individually, rather than using the rules built into the font and applying a percentage adjustment to word and character spacings; the example below was created with Gnu troff, which generates fairly clean PostScript). Bloat is becoming normal in "HTML"; 6) Adobe failed to add network features (and possibly forms) until HTML had already captured the market; 7) They held out against Microsoft, meaning that Windows only provides reluctant support for PostScript post processing - most Windows users don't know how to create a PostScript file or how to print one on a local printer, both of which can be done without additional software. PDF had minimum accessibility goals, but there were some. It's an explicit aim that the plain text should be extractable from the documents. This depends on people not doing text as graphics, but its support for embedded fonts means that is much less necessary, and it can even do quite a few word art effects in the browser (colour gradients for example). The latter, though, are likely to done from the menu in MS Word, which constructs a graphic, rather than outputting the PostScript to create the graphic from the text. Even for scanned documents, it allows an underlay of the OCRed equivalent. I think it does this in a way that allows cutting and pasting as though the document were just text, which you can't do with alt text in current HTML browsers. The latest specification even allows a semantic overlay to be added. (SVG, in my view, is a descendant of PDF, rather than of HTML, and could change the world of commercial internet sites.) ++ These are a few lines from a PDF document with compression turned off (a minimal PDF file doesn't even need all the numbers used here for positioning): 100 0 Td(The premises referred to in paragraph \(A\) of Clause 3 of the Memorandum)Tj 0 Tw 0 -12 Td(of Association.)Tj -100 -15.6 Td(Flat Owner)Tj ## PDF does contain pointers to byte offsets which are difficult to maintain by hand, but there are third party tools that will convert Postscript and it would be fairly easy to create authoring tools that took a sort of proto-PDF and created real PDF. == The procedure definitions for a simple graphic EPS file can run to over 100K, and nearly all of them will not be used by simple graphics.
Received on Sunday, 22 October 2000 06:13:24 UTC