Re: Spyglass HTML Validator

Mon, 28 Oct 1996 14:10:17 -0500 (EST)

Date: Mon, 28 Oct 1996 14:10:17 -0500 (EST)
From: Foteos Macrides <MACRIDES@SCI.WFBR.EDU>
Subject: Re: Spyglass HTML Validator
Message-id: <01IB6G71LGXU00EWJV@SCI.WFBR.EDU>

Stewart Brodie,
  developer of the ArcWeb browser for Acorn RISC OS computers, wrote:
>> > <a href="current&amperes.html">Electric!</a> --->  voltage&eres.html 
>>                    ^^^^^                                     ^
>Got it in one!  As a browser writer, it's pointless reading the specs
>without doing a direct comparison between how Netscape parses it, and
>how your own browser handles it, otherwise people complain to me that
>my browser can't handle the links but Netscape can.
>The majority of Netscape users don't complain to Netscape[1] that their
>code is broken, so it is always left to the minority implementations (or
>platforms like mine which Netscape/MS don't support at all) to break
 ^^^^^^^^^                 ^^^^^^^^^^^ ^^^^^^^^^^^^^
>their implementations in order to mimic the behaviour of Netscape.
>[1] just the opposite - they tell people like me to stop moaning and
>implement the "de facto standard"                   ^^^^^^^^^^^^
 ^^^^^^^^^      ^^^^^^^^^^^^^^^^^

and Peter Flynn,
    developer of the HTML Pro DTD (about to be updated?), wrote:
>   Hm, I thought entities were terminated with ";". If my HREF wasn't
>   legal, how could your example be?
>Not if the entity name is followed by another ERO (ampersand) or a
>white-space delimiter such as SPACE or RE. But some sloppier parsers
>have taken this to mean "followed by any non-letter", and of course
>some browsers' notorious rubbish, which accepts &amplitude and
>produces "&litude"...

	The "notorious rubbish" is just a "bug", which should not
be emulated by other browsers (IMHO), because the excessively unsound
Netscape entity translation procedure will likely be (has been?)
changed now that it is dereferencing attribute values, including
those for URLs, and it's causing serious failures for many deployed
CGI scripts.
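
	To make the failure concrete, here is a minimal sketch of a
prefix-matching translator that reproduces the "&amplitude" ->
"&litude" result quoted above.  It is not Netscape's code; the
ENTITIES table, the expand_prefix name, and the longest-name-first
matching order are all assumptions made for illustration.

```python
# Hypothetical, deliberately tiny entity table (an assumption for this sketch).
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "copy": "\u00a9"}


def expand_prefix(text):
    """Replace '&' + any known entity name, even when the name is only
    a prefix of a longer word. This mimics the buggy behaviour."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "&":
            out.append(text[i])
            i += 1
            continue
        # Try longer names first so the match is deterministic.
        for name in sorted(ENTITIES, key=len, reverse=True):
            if text.startswith(name, i + 1):
                out.append(ENTITIES[name])
                i += 1 + len(name)
                # Swallow an optional ';' terminator.
                if i < len(text) and text[i] == ";":
                    i += 1
                break
        else:
            # No known name follows: keep the ampersand literally.
            out.append("&")
            i += 1
    return "".join(out)


print(expand_prefix("&amplitude"))
print(expand_prefix("current&amperes.html"))
print(expand_prefix("/cgi-bin/find?a=1&copyright=2"))
```

	The last call uses a made-up query string: once attribute
values are run through such a translator, a parameter that merely
begins with an entity name ("copyright") is corrupted before the
request ever reaches the script.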

	The "sloppier" parsing is a different matter that requires
more thought (IMHO).  The terminator is not any "non-letter", but
any non-alphanumeric character (with variable requirements across
deployed clients for the accepted characters to be in the ASCII
range).  It stems from the libwww using isalnum(c) checks for
accumulating the entity name, and reflects how virtually *all* WWW
browsers handled character references through 1994, i.e., many
people consider it an HTML application convention, like the
minimization of HTML, HEAD and BODY, and the implied P for any
text at the start of a block's content, and which some are now
seeking to change in the interest of full conformity to SGML
principles.  And that's a GoodThing to do, but don't knock
people who were following what appeared to be expected of HTML
browsers, based on code from TheCreator, by calling them "sloppy".
:) :) :)
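
	The convention described above can be sketched as follows.
This is an illustration in Python, not the libwww source: the
ENTITIES table and the expand_isalnum name are assumptions, and
the ASCII-only alphanumeric test stands in for the isalnum(c)
check.

```python
# Hypothetical, deliberately tiny entity table (an assumption for this sketch).
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "copy": "\u00a9"}


def expand_isalnum(text):
    """Accumulate the entity name while characters are ASCII
    alphanumerics; any other character terminates the name."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "&":
            out.append(text[i])
            i += 1
            continue
        j = i + 1
        while j < len(text) and text[j].isascii() and text[j].isalnum():
            j += 1
        name = text[i + 1:j]
        if name in ENTITIES:
            out.append(ENTITIES[name])
            # A ';' terminator is consumed; any other terminator is kept.
            if j < len(text) and text[j] == ";":
                j += 1
        else:
            # Unknown name: pass the text through unchanged.
            out.append(text[i:j])
        i = j
    return "".join(out)


print(expand_isalnum("AT&amp;T"))
print(expand_isalnum("a &lt b"))
print(expand_isalnum("&amplitude"))
```

	Under this rule "&amplitude" is left alone, because the whole
accumulated name "amplitude" is looked up and is not a known entity,
while "&lt" followed by a space is still translated without a ';'.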

	Note that RFC 1866, and W3C drafts, say that the ';'
can be omitted if the terminator "can be inferred", but *none*
state a basis for that inference (you have to "know" it already
to "learn" what it is by reading them 8-).

	Note also that any browser which uses the "SGML parser"
in the most current version of the W3C Reference Library (v5a)
will handle the translations exactly as do the pre-1994 browsers,
and *not* as do browsers with truly SGML conformant parsers.

	The "de facto standard" issue also merits more thinking
and public discussion.  Whatever spin might be placed on it,
the so-called HTML 3.2 draft, and the "standardization" of HTML
by the W3C and its ERB as they are presently structured, financed,
and functioning, amount to a capitulation to "stop moaning and
implement the de facto standard", and the Web as a whole is far
the worse off for it, IMHO.  The DisciplesOfTheCreator would do
well to think harder about this situation.


 Foteos Macrides            Worcester Foundation for Biomedical Research
 MACRIDES@SCI.WFBR.EDU         222 Maple Avenue, Shrewsbury, MA 01545