RE: XHTML/XML comment from rev-bob@gotc.com on 2000-01-31 (www-html@w3.org from January 2000)

From: <rev-bob@gotc.com>
Date: 31 Jan 2000 10:03:00 -0500
To: www-html@w3.org
Message-Id: <200001311003770.SM01100@Unknown.>
> ** Original Sender: Vidiot <vidiot@vidiot.com>
>
> I just printed the XHTML 1.0 document and am floored by the
> following:
> 
>    4.2 Element and attribute names must be in lower case.

XML is case-sensitive; as a reformulation of HTML in XML syntax, XHTML must be case-
sensitive as well.

>    4.3 For non-empty elements, end tags are required.

XML has no mechanism for specifying "optional" tags - an element is either empty or a container, 
and if it's a container, you have to have a beginning AND an end tag.  Of course, I consider 
that just plain common sense, but I realize I'm in the minority.  (Imagine, having the nerve to 
require that people straighten out their code when using new specs.  Whatever could W3C 
have been thinking?)

>    4.4 Attribute values must always be quoted.

Again, common sense.  Strict SGML rules allow some attribute values to remain unquoted, 
but it's generally much easier to say "quote 'em all" than to memorize the quote-optional rules.  
The biggest place I was affected by this was in my IMG tags, where I left the numeric values 
unquoted because SGML allowed it.  Well, times change.  Quoting 'em doesn't break any 
UAs, not quoting 'em breaks the XHTML DTD, so I quoted 'em.  Whoopee.

>    4.6 Empty Elements.

What about 'em?  Does the little slash annoy you, or something?  We've had empty elements 
in HTML for a long time; we just didn't necessarily call 'em such.  (Or have you never heard 
of HR, BR, IMG, META, INPUT, et al.?)
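
For anyone who wants to see those four rules in action, here is a minimal sketch using the XML parser in Python's standard library.  The sample fragments and the helper name is_well_formed are mine, made up for illustration.  Note that this only checks XML well-formedness, not validity against the XHTML DTD - consistently uppercase tags are still well-formed XML; it's the DTD that insists on lowercase names.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the fragment."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, one per rule discussed above.
samples = {
    "mismatched case (4.2)": "<p>text</P>",
    "missing end tag (4.3)": "<div><p>one<p>two</div>",
    "unquoted attribute (4.4)": '<img src="a.gif" width=10 />',
    "unclosed empty element (4.6)": "<p>line<br>break</p>",
    "clean fragment": '<p>line<br />break <img src="a.gif" width="10" alt="" /></p>',
}

for label, markup in samples.items():
    verdict = "accepted" if is_well_formed(markup) else "rejected"
    print(f"{label}: {verdict}")
```

Only the last fragment gets through; the parser refuses each of the others outright instead of guessing at what was meant.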

>    C.12 Using Ampersands in Attribute Values

Technically, I don't think this is a change.  SGML rules *always* require an ampersand to be 
escaped.  Don't tell me you're upset that W3C actually expects people to follow the rules....

> You've got to be kidding.  There are millions of HTML pages that
> exist that use case-insensitive elements and attributes.  Same
> thing applies to elements like BASE and BR.  How many CGI
> scripts exist that will not parse &amp;?

Look, nobody says you have to change your pages to XHTML.  If you want to, go right on 
using broken code that "most browsers" don't have "much" trouble with - just don't label it as 
XHTML.  The HTML DTDs aren't going away; if you want to, you can make all your 
documents HTML 2.0-compliant and nobody will complain.  Just because XHTML is 
available now does *not* mean you have to use it - and if you do use it, nobody says you 
have to use it exclusively.  Speaking for myself, I whipped up a little dual-compilation scheme 
that generates HTML 4.01 pages on one side and XHTML 1.0 pages on the other side - 
with a tiny little server-side routine to select between 'em based on the user's browser.  This 
way, I can clean some legacy crap out of my XHTML code, because I know that I can 
always reroute older browsers to the HTML pages with no ill effects.
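
To give a rough idea of what that selection routine amounts to, here is a sketch in Python.  It is not my actual routine - the token list, the function name pick_variant, and the file suffixes are all hypothetical, and it assumes both builds of each page sit side by side on the server.

```python
# Hypothetical list: substrings of User-Agent headers for browsers that
# have been checked by hand against the XHTML 1.0 build of the site.
XHTML_OK_TOKENS = ("ExampleBrowser/5", "OtherBrowser/6")


def pick_variant(user_agent: str, base_name: str) -> str:
    """Return the file to serve for this request.

    Known-good browsers get the XHTML build; everything else,
    including unknown or older browsers, falls back to HTML 4.01.
    """
    if any(token in user_agent for token in XHTML_OK_TOKENS):
        return base_name + ".xhtml"
    return base_name + ".html"


print(pick_variant("ExampleBrowser/5.0 (compatible)", "index"))
print(pick_variant("AncientBrowser/2.0", "index"))
```

The point of defaulting to the HTML build is that a browser nobody has tested never gets handed markup it might choke on.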

Incidentally, CGI programs should never be sent a literal "&amp;" string unless the page itself 
escaped that ampersand a second time (as "&amp;amp;").  According to the specs, given the following URL in a page:

http://www.yoursite.com/cgi-bin/program.cgi?x=1&amp;y=2&amp;z=3

the UA should automatically translate that to read:

http://www.yoursite.com/cgi-bin/program.cgi?x=1&y=2&z=3

Entities are there to ensure that the UA gets the data properly.  Once the UA is finished with 
a page, all the entities in its copy of that page should be replaced by the actual values...so if 
you click on a link, the UA is supposed to send the actual characters instead of the entity 
references.
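
The decoding step described above can be sketched with Python's standard library.  The function name url_sent_by_ua is made up for illustration, and html.unescape stands in for whatever entity handling a real UA does when it reads an attribute value.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit


def url_sent_by_ua(href_in_source: str) -> str:
    """Resolve entity references in an attribute value, as a UA would
    before following the link."""
    return unescape(href_in_source)


href = "http://www.yoursite.com/cgi-bin/program.cgi?x=1&amp;y=2&amp;z=3"
requested = url_sent_by_ua(href)
print(requested)

# What the CGI program ends up seeing once the query string is split.
params = parse_qs(urlsplit(requested).query)
print(params)
```

The CGI program gets three ordinary parameters (x, y, and z) and never sees an entity reference at all.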

> Guess I won't be using XHTML/XML any time in the near future.
> I'll be damned if I am going to go through the 59,524 HTML
> pages that I currently have and make them XHTML compliant.

Goody goody gumdrops.  If that's an actual count instead of a random large number, you'd 
probably have a hard time merely changing all your !DOCTYPE statements anyway - so just 
leave everything as it is.  Like I said, nobody's got a gun to your head saying "UPGRADE 
YOUR CODE!"

> I do try and do 4.4, but all of my documents are not perfect
> in that regard.  I prefer CAPS for elements for readability,
> as I HAND EDIT all of my HTML documents using asWedit.

I handcode all my pages, too - but I do use htp (http://www.crl.com/~jnelson/htp) as a handy 
little tool for maintaining templates and reusing code fragments.  (It's a much better option 
than having to change about 300 documents every time I make a change to the basic 
template!)  Yeah, I had to go in and patch a couple of tags here and there - but then, I was 
using lowercase tags anyhow.  (Of course, you could always copy the XHTML DTD and 
make all the tags uppercase, then use that instead of the W3C version - but if you've mixed 
your cases anywhere, you'll still have trouble.)  Then again, there's always HTML Tidy, and if 
it bothers you that much to use lowercase tags, you could always keep your original files, run 
HTML Tidy to generate a "clean" XHTML set for upload, and just keep that extra step in 
your process from here on out.  Save you a bit of trouble, that could....

> I am totally confused as to why the W3C would go the over-the-top
> restrictive route.

Simple - because the major players want it.  Think about how many times you've heard 
people gripe about how huge Browser X has gotten - well, part of that bloat is required by 
the mushy structure of HTML.  "Well, this tag is a container, but we're going to make the 
closing tag optional, so people can leave it out and the browser can infer where it's supposed 
to go."  That absurdly-high level of fault-tolerance in the UA encourages exactly two things - 
sloppy code and bloated browsers - and that's why the Web is in the deplorably sloppy state 
it's in.  People don't have to write well-formed code, so they don't - and the browsers are 
expected to pick up the slack.  Nobody writes C compilers that try to figure out what you 
meant to say - if you leave a semicolon out, the compiler kicks you in the pants and makes 
you fix your syntax.  THAT'S A GOOD THING.  Why should writing any other computer 
language be any different?
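
Here is a small sketch of that difference, using two parsers from Python's standard library as stand-ins.  The class TagLogger and the sample markup are mine, for illustration only.  One caveat: html.parser is just a lenient tokenizer, so it doesn't infer the missing end tags the way a browser does - it simply reports what it sees and never complains.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every start and end tag the lenient parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def forgiving_parse(markup: str) -> list:
    """Feed markup to the lenient HTML parser and return what it saw."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


def strict_parse(markup: str) -> str:
    """Feed markup to the XML parser and say whether it was accepted."""
    try:
        ET.fromstring(markup)
        return "accepted"
    except ET.ParseError as err:
        return f"rejected: {err}"


sloppy = "<P>one<P>two<BR>three"
print(forgiving_parse(sloppy))  # start tags only, no error raised
print(strict_parse(sloppy))     # the XML parser refuses it
```

The lenient parser quietly swallows the unclosed tags; the strict one stops and makes you fix them, which is exactly the compiler behavior I'm talking about.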

Users want smaller browsers - strict parsing requirements make that possible.  Browser 
authors don't want to have to code hugely complex heuristics to try to figure out what code 
means - there again, strict parsing means smaller programs and faster execution.  Web page 
authors want to know that their pages will look right in different browsers - and writing clean 
code gives you a better shot at that.  The bottom line is that the only people who should have 
a beef with XHTML are the people who are writing sloppy code - and they're the problem 
XHTML is out to fix.  Either clean up your code or stick with the old HTML doctypes - it's 
your choice, but the latter choice will probably wind up meaning that after a couple of 
generations, people will have a hard time reading your pages...because the browsers will be 
phasing out their support for fuzzy syntax.

> How are you going to convince people to use XHTML with so many restrictions?

"How are you going to convince people to use C if the compiler chokes on a missing 
semicolon?"

As far as I'm concerned, it is the coder's job - yes, this means you and me - to write clean 
code.  It is the interpreter's job to parse that code CORRECTLY, not FORGIVINGLY - 
because every error it forgives is one that we don't catch.  If you can't be bothered to do 
your job well, why are you doing it at all?

> You certainly haven't convinced me that I should change.

That much is obvious.  It's also obvious - to me, anyhow - that the only changes you care 
about are grabbing new features to shut people out, instead of making substantial changes to 
clean up your code.  Yeah, XHTML is a big step - who claimed otherwise?  If you don't 
want to take it, go right ahead.

> To tell millions of people who create web pages that "by the way, forget everything you
> ever knew about making web pages" is not going to go over very well at all.

In case you haven't noticed, HTML 4.01 *and* XHTML 1.0 are *both* current 
Recommendations.  That means you have a choice as far as "latest and greatest" goes - and 
you can still mark your pages as HTML 4.0, 3.2, 2.0, or plain 1.0 if you want.  (I note that 
you don't mark your code as *any* version of HTML...so exactly what are you complaining 
about?)  The message is hardly "forget everything you ever knew" - in fact, I found the 
changes to be very logical and rather simple to implement.  It all boils down to removing the 
concept of "optional" elements - either something is required, or it is disallowed.  IMG is an 
empty tag, thus it must be specified as one and cannot take a closing tag.  P is a container 
tag, so it must be closed.  Case matters, so use lowercase - because that's what they decided 
on.  If you take a minute to think about it, XHTML should be even easier to learn than 
HTML was - because now you have a concrete syntax set as opposed to a mushy set of 
optional and inferred tags.  This means more reliability all the way around - but then, I guess 
you're too self-righteously angry to even consider that.

> Of course, what I wrote above is personal opinion.  But, I am one
> of millions of web page creators and I am not impressed with
> XHTML/XML at all.

And I'm another of those millions, and I happen to *like* XHTML.  But then, I was raised to 
believe that you should fix errors instead of leaving them in for other folks to stumble over....

> I'll be dead before I go the XHTML route.

At least then you won't be complaining about the world moving on without you.  Heck, go 
straight ASCII for all I care.  I prefer to have XHTML in my arsenal, in case a client wants 
the flexibility it can provide.



 Rev. Robert L. Hood  | http://rev-bob.gotc.com/
  Get Off The Cross!  | http://www.gotc.com/
Received on Monday, 31 January 2000 10:02:42 UTC