Re: Flagging & in URL in HTML 4.01 transitional type. from Mike Heins on 2001-06-08 (www-validator@w3.org from June 2001)

From: Mike Heins <mheins@redhat.com>
Date: Fri, 8 Jun 2001 15:26:07 -0400 (EDT)
To: www-validator@w3.org
Cc: "Peter Foti (PeterF)" <PeterF@SystolicNetworks.com>
Message-ID: <20010608152448.A11227@bill.heins.net>
Quoting Peter Foti (PeterF) (PeterF@SystolicNetworks.com):
> > Since every browser in the world must tolerate &, 
> 
> 
> No. Actually, most do tolerate the & without encoding; however, it is
> foolish to say that every browser MUST tolerate it.  There is a reason
> why the & must be escaped, you just aren't bothering to ask why.

From a pedantic technical perspective, I *understand* why.

Every browser *must* tolerate it if it is to have a chance of being
usable for the next 10 years. The only exception is a browser with no
intention of being usable for anything but HTML4-compliant sites, a
share of the web that will struggle to reach double digits in the next
five years.

> 
> Consider this:  certain characters (like < or >) must be escaped so
> that the browser knows that they are not part of the HTML code.  For
> example, if I want my page to display:
> 
> 0 < 1 & 2 > 1

I know all of the technical reasons.

> 
> then the browser needs to know that this is not part of an HTML tag.  So
> the special characters need to be escaped: < becomes &lt; and >
> becomes &gt;.
> But now we have created a new special character that the browser has to
> look for... the & now signifies the beginning of an escaped sequence.
> So whenever a browser sees an &, it needs to check whether an entity
> follows that represents a special character.  Therefore, to display an &,
> we escape it as &amp;.
> 
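The escaping rule described in the quoted paragraph can be sketched in Python (the thread itself contains no code, so this is only an illustration). `escape_text` is a hypothetical helper written for this example; `html.escape` is the Python standard library function; the URL is a made-up example. The point it shows is that & must be escaped first, and that the same rule applies to an & inside an href:

```python
import html

def escape_text(s: str) -> str:
    """Escape the three characters that are special in HTML text content.

    '&' must be replaced first: otherwise the '&' that starts '&lt;'
    and '&gt;' would be escaped a second time, giving '&amp;lt;'.
    """
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

text = "0 < 1 & 2 > 1"
escaped = escape_text(text)
print(escaped)  # 0 &lt; 1 &amp; 2 &gt; 1

# The standard library gives the same result (quote=False leaves quotes alone).
assert escaped == html.escape(text, quote=False)

# The same rule applies to an '&' inside an attribute value such as href.
url = "http://example.com/search.cgi?q=html&lang=en"  # hypothetical URL
print('<a href="%s">search</a>' % html.escape(url))
# <a href="http://example.com/search.cgi?q=html&amp;lang=en">search</a>
```

Doing the & replacement last instead of first would turn every &lt; into &amp;lt;, which is exactly the double-escaping the ordering avoids.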
> 
> > my opinion is that this is an artificially created tempest in a teapot,
> > created by the failure of the validation suite writer to provide a
> > "pedantic" mode, or by the failure of the specification writers to
> > create an exception for this in the transitional type.
> 
> Your opinion is flawed.

You make that statement without supporting it.

> 
> > 
> > If browsers didn't accept this construct, 98% of the web would break. A
> > significant portion of the web would break for the foreseeable future
> > as well, so it is not a simple question of coalescing support to move
> > in the direction of compliance.
> 
> 98% eh?  That's quite a bit of invalid code floating around then, isn't
> it?

Yes, which is why I would say that it is not really invalid, just
not compliant with the strict HTML 4 specification.

> I think your guess is extremely high (and wrong) and that you have
> no data to support your theory.

A recent study cited by CNET.com stated that over 50% of web traffic
was concentrated on 4 sites (Yahoo, AOL, Microsoft, and CNET).  All but
Microsoft have unescaped & characters in their HTML parameters directly
on their home pages; all have links on their home pages that lead to pages
that have the unescaped character. My extensive experience with thousands
of other web sites on a programming level suggests that the vast, vast
majority are no different.

> However, the fix for this would, of course, be for lazy web page
> designers to do it right the first time and use &amp; instead of &.
> Fortunately (or unfortunately, if you want more standards-based, clean
> code to be developed), most browsers will simply understand that a
> standalone & without any known escape sequence following it is just an
> ampersand, and they will display it as such.
> 
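One way to see why a standalone & is not always "just an ampersand" is to hand an unescaped query string to a lenient decoder. The sketch below uses Python's standard `html.unescape` as a stand-in for such a decoder. It follows the HTML5 character-reference rules for text content, not the extra rule browsers apply inside attribute values, so treat it as an illustration rather than a model of any particular browser. The query string is a made-up example whose second parameter happens to be named `copy`:

```python
import html

# Hypothetical CGI query string with a raw '&' in front of a parameter
# that happens to be named "copy".
raw = "search.cgi?x=1&copy=2"

# The lenient decoder treats "&copy" as the copyright entity, which HTML
# permits without a trailing semicolon, so the parameter gets mangled.
decoded = html.unescape(raw)
print(decoded)  # search.cgi?x=1©=2

# Escaping the ampersand first makes the round trip lossless.
safe = html.escape(raw)
print(safe)                 # search.cgi?x=1&amp;copy=2
print(html.unescape(safe))  # search.cgi?x=1&copy=2
```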
> > 
> > In my opinion, that validation check is pedantic, and it should certainly
> > not be flagged in the HTML 4.01 transitional type.
> 
> In my opinion, you should maybe do some more homework on the topic.
> 

I have done plenty of homework. You, on the other hand, have not shown
me much more than a "Mary, Mary quite contrary" imitation.

-- 
Red Hat, Inc., 3005 Nichols Rd., Hamilton, OH  45013
phone +1.513.523.7621      <mheins@redhat.com>

Research is what I'm doing when I don't know what I'm doing.
-- Wernher Von Braun
Received on Monday, 11 June 2001 03:29:52 UTC