Re: CheckHtmlEsis from Peter Flynn on 1998-04-24 (www-html@w3.org from April 1998)

From: Peter Flynn <pflynn@imbolc.ucc.ie>
Date: 24 Apr 1998 13:49:30 +0100
To: roconnor@uwaterloo.ca, www-html@w3.org, Uniform@imbolc.ucc.ie, Resource@imbolc.ucc.ie, Locators@imbolc.ucc.ie, d.cary@ieee.org
Message-id: <199804241249.NAA22805@imbolc.ucc.ie>
> The comment that some kinds of validation 

I'm sorry, I should have been clearer. "Validation" has a very specific
and precise meaning in SGML systems, as defined by ISO 8879. If you want
to perform some other kind of consistency/meaningfulness/usability check
on the _content_ of CDATA attributes, then you really need to call it
by some other word than "validation", or qualify the term, such as
"semantic validation". Otherwise you risk confusing people.

> should be done *only* by the
> browser doesn't make sense to me. It seems to me that a web author would
> like to know if his document is invalid in this or any other way, so he can
> fix it.

Same goes for arbitrary use of the term "invalid". "Invalid" means a file
does not pass through a validating SGML parser without error. If you have
put <IMG SRC="foo.jpg" WIDTH="wombat"> by mistake, this is valid but not
meaningful, for the reasons I previously gave. 

> Here are a few things which I wish my validation tools would check:
> 
> Once I forgot to put the terminating quote on a URI inside a <a></a>
> entity. Since ">" seems to be a valid character inside a string, ... my
> validation tools gave me error messages, but they were misleading. It took
> me a while to figure out the real problem.

If you use a real validating parser it will detect this and tell you.
But you need to know what you're doing:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Draft//EN">
<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <p><img src="foo.gif" alt="A Foo>Me</p>
    <p><img src="bar.gif" alt="A Bar">My dog</p>
  </body>
</html>

gives the message:

nsgmls:test.html:8:24:E: an attribute value literal can occur in an attribute specification list only after a vi delimiter

It's taking the value of the first alt to be "A Foo>Me</p> <p><img src="
and then complaining that there is no space before the next attribute
which it sees as starting with a "b" (of "bar").

> I once had a bunch of URIs similar to <a href="www.ti.com">TI</a>, which
> the DTD would accept. 

Of course: because they are valid SGML. This is like having a program
which says to print "sod off". It's perfectly valid, just not "right".
But no programming language can be expected to have a concept of this
kind of "rightness" -- and SGML cannot be expected to have a concept
of what makes a URL work, because URLs were invented afterwards.

> My link check software kept telling me that this was
> a bad link, but the URI seemed to work fine when I manually typed it into
> my web browser ... color me confused. 

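Nothing to do with color. www.ti.com is a perfectly usable URL if you take
the browser assumption that the default method is http and the default action is
to request the default file from the server (i.e. add the missing trailing
slash). Inside an href, though, it is a relative URL, so a link checker
resolves it against the location of the page itself and fails to find it.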

> I wish I had gotten some warning that
> would suggest "I think you meant to say http://www.ti.com/ ".

This would require the software you used to understand RFC1738 (URL spec).
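
As a rough illustration of that kind of warning, here is a minimal sketch
in Python. It is a heuristic, not an implementation of RFC 1738: the
function name suggest_url and the rule "a scheme-less value starting with
www. was probably meant as an http URL" are assumptions made for this
example only.

```python
from urllib.parse import urlsplit

def suggest_url(href):
    """Return a guessed absolute URL for a bare host name, else None.

    Heuristic (an assumption of this sketch): a value with no scheme
    whose first path segment starts with "www." was probably meant to
    be an http URL.
    """
    if urlsplit(href).scheme or href.startswith(("/", "#", ".", "?")):
        return None  # already absolute, or plainly relative
    first_segment = href.split("/", 1)[0]
    if first_segment.lower().startswith("www."):
        # Keep any path that follows the host; default to "/".
        path = href[len(first_segment):] or "/"
        return "http://" + first_segment + path
    return None

for value in ("www.ti.com", "http://www.ti.com/", "misc.html"):
    guess = suggest_url(value)
    if guess:
        print('href="%s": did you mean %s ?' % (value, guess))
```

A link checker could run something like this over each href before
resolving it, and print the suggestion instead of a bare "bad link".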

> I wish my validators would warn me when "You forgot to put a 'alt'
> attribute inside this <img> tag". (same for the height and width
> attributes).

If you use a decent DTD and an editor which uses it then you get this
kind of thing for free.
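
Without such an editor, a crude check is still possible outside the DTD.
The following is a sketch only, using Python's standard html.parser
module, which is not a validating SGML parser. The class and function
names are made up for this example, and the list of attributes is taken
from the wish above.

```python
from html.parser import HTMLParser

class ImgAttrChecker(HTMLParser):
    """Collect <img> tags that lack any of the wanted attributes."""

    def __init__(self, required=("alt", "height", "width")):
        super().__init__()
        self.required = required
        self.problems = []  # list of (line number, [missing attributes])

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        present = {name for name, _ in attrs}
        missing = [a for a in self.required if a not in present]
        if missing:
            line, _ = self.getpos()
            self.problems.append((line, missing))

def missing_img_attrs(html_text):
    checker = ImgAttrChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems

sample = ('<p><img src="foo.gif" alt="A Foo"></p>\n'
          '<p><img src="bar.gif" alt="A Bar" width="10" height="20"></p>')
for line, missing in missing_img_attrs(sample):
    print("line %d: <img> lacks %s" % (line, ", ".join(missing)))
```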

> Many people intend to make *every* graphic a link, so they would appreciate
> a program that listed which <img> tags were not wrapped in a <a></a> tag.

sgrep or one of the LT NSL programs (and probably a dozen other systems)
let you do this kind of thing. But it needs to know about SGML first.
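
For comparison, here is a minimal sketch of the same query in Python.
It uses the standard html.parser module rather than sgrep or LT NSL, so
it knows nothing about the DTD. The class name is made up for this
example, and any enclosing <a> counts, whether or not it has an href.

```python
from html.parser import HTMLParser

class UnlinkedImgFinder(HTMLParser):
    """Record the src of every <img> that is not inside an <a> element."""

    def __init__(self):
        super().__init__()
        self.open_anchors = 0  # number of <a> elements currently open
        self.unlinked = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.open_anchors += 1
        elif tag == "img" and self.open_anchors == 0:
            self.unlinked.append(dict(attrs).get("src"))

    def handle_endtag(self, tag):
        if tag == "a" and self.open_anchors > 0:
            self.open_anchors -= 1

finder = UnlinkedImgFinder()
finder.feed('<a href="foo.html"><img src="foo.gif"></a>'
            '<p><img src="bar.gif"></p>')
finder.close()
print(finder.unlinked)
```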

> Even though the "&lt" is apparently legal SGML, I intend to always use the
> full "&lt;" and would like some warning when I slip up.

The &lt form without the semicolon is valid only when the next
character cannot be part of a name, such as white-space or another &.
So &eacute&agrave; is fine but &eacutefoobar is silly.
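
A warning for the semicolon-less form needs no SGML knowledge at all.
The following Python sketch flags any named reference not closed by a
semicolon, whether or not SGML would accept it. The pattern is a
simplification (letters and digits only), it does not skip comments or
attribute values, and the function name is made up for this example.

```python
import re

# "&name" that is followed by neither ";" nor a further letter or digit.
UNTERMINATED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(?![A-Za-z0-9;])")

def unterminated_refs(text):
    """Return (line number, entity name) for each reference lacking ';'."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in UNTERMINATED_REF.finditer(line):
            found.append((lineno, match.group(1)))
    return found

sample = "a &lt b, a &lt; b\ncaf&eacute&agrave;"
for lineno, name in unterminated_refs(sample):
    print("line %d: &%s has no closing semicolon" % (lineno, name))
```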

> I intend to wrap every URI in the source text with a link to that URI. I
> would like a validator to check that every string (outside of a tag) of the
> form "http:" or "ftp:" or "mailto:" (what others are there now ?) is not
> merely inside a <a></a> entity, but that the href attribute is actually set
> to the *same* location (rather than some other unrelated location).

Word does this and it's horrible. What if I want to talk about the prefix
"http:"...do you want that string made into a [non-existent] link?

Emacs can do this kind of thing in macros: so should any self-respecting 
editor. Perl hackers can probably do it with five lines of modem noise :-)
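
Rather more than five lines, but here is a sketch of such a check in
Python. It reports instead of rewriting, and it requires "//" after
http: or ftp:, so a bare mention of the prefix "http:" is left alone.
The pattern, the stripping of trailing punctuation and the class name are
all assumptions made for this example.

```python
import re
from html.parser import HTMLParser

# URL-looking strings in running text; a bare "http:" does not match.
URL_IN_TEXT = re.compile(r"(?:(?:http|ftp)://|mailto:)[^\s<>\"]+")

class UrlLinkChecker(HTMLParser):
    """Report URLs in text whose enclosing <a href> is absent or different."""

    def __init__(self):
        super().__init__()
        self.href = None      # href of the <a> we are inside, if any
        self.mismatches = []  # list of (url in text, enclosing href or None)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self.href = None

    def handle_data(self, data):
        for match in URL_IN_TEXT.finditer(data):
            url = match.group(0).rstrip(".,;:)")  # drop sentence punctuation
            if url != self.href:
                self.mismatches.append((url, self.href))

checker = UrlLinkChecker()
checker.feed('See <a href="http://www.ti.com/">http://www.ti.com/</a>, '
             'then http://www.w3.org/ and the prefix "http:" itself.')
checker.close()
print(checker.mismatches)
```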

> I don't think my tools are smart enough to check that (a) for every <a
> href="#misc">misc</a> there is one and only one <a name="misc">misc</a> in
> the document, and (b) that for each <a name="misc">misc</a> there is at
> least one <a href="#misc">misc</a>. When I add a new section to a page,
> something like (b) would remind me to add that section to the table of
> contents I keep at the top of the page.

Buy a big SGML editor and you get this kind of thing.
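
For those without one, checks (a) and (b) can be roughed out in a few
lines. This is a sketch in Python using the standard html.parser module;
it looks only at <a name> and <a href="#..."> within a single document,
and the names AnchorCollector and check_anchors are made up for this
example.

```python
from collections import Counter
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Gather <a name="..."> definitions and <a href="#..."> references."""

    def __init__(self):
        super().__init__()
        self.names = Counter()  # anchor name -> number of definitions
        self.targets = set()    # fragment names referred to by href="#..."

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("name"):
            self.names[attrs["name"]] += 1
        href = attrs.get("href") or ""
        if href.startswith("#") and len(href) > 1:
            self.targets.add(href[1:])

def check_anchors(html_text):
    c = AnchorCollector()
    c.feed(html_text)
    c.close()
    return {
        # (a) every href="#x" needs exactly one name="x"
        "missing": sorted(t for t in c.targets if c.names[t] == 0),
        "duplicated": sorted(n for n, k in c.names.items() if k > 1),
        # (b) every name="x" should be referred to at least once
        "unreferenced": sorted(n for n in c.names if n not in c.targets),
    }

print(check_anchors('<a href="#misc">misc</a> <a href="#gone">gone</a>'
                    '<a name="misc">Misc</a> <a name="extra">Extra</a>'))
```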

> In my opinion, *every* web page needs to have a email address somewhere on
> it, so people viewing it can respond to any questions the author raises.

Good idea. 

> I'm sure there are many other little things that a machine could easily
> check, but that current validators do not check.

Lots, but don't call them validators :-)

///Peter
Received on Friday, 24 April 1998 08:49:22 UTC