- From: Peter Flynn <pflynn@imbolc.ucc.ie>
- Date: 18 Jun 1998 16:32:42 +0100
- To: d.cary@ieee.org
- Cc: roconnor@uwaterloo.ca, www-html@w3.org
David Cary writes:

> Dear "Russell Steven Shawn O'Connor" and Peter Flynn,

> The comment that some kinds of validation should be done *only* by the browser doesn't make sense to me.

I don't think I ever said it should be done only by the browser. I hope not, anyway :-)

> Here are a few things which I wish my validation tools would check:

> Once I forgot to put the terminating quote on a URI inside a <a></a> entity. Since ">" seems to be a valid character inside a string, ... my validation tools gave me error messages, but they were misleading. It took me a while to figure out the real problem.

Yes, you are expected to understand the error messages, having first read the SGML standard :-)

     1 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Draft//EN">
     2 <html>
     3 <head>
     4 <title>Test</title>
     5 </head>
     6 <body>                          |-- here's the missing quote
     7 <p><img src="foo.gif" alt="A Foo>Me</p>
     8 <p><img src="bar.gif" alt="A Bar">My dog</p>
     9 </body>                |-- this is where it gags (line 8, char 24)
    10 </html>

    nsgmls -s -c/usr/local/lib/sgml/CATALOG test.html
    ld.so: warning: /usr/lib/libc.so.1.8 has older revision than expected 9
    nsgmls:test.html:8:24:E: an attribute value literal can occur in an attribute specification list only after a vi delimiter

The parse is taking the whole of this:

    "A Foo>Me</p>
    <p><img src="

as the value of ALT (understandably, since it starts and ends with a quote). It then finds bar.gif, which it takes as an unquoted attribute value. Then it hits a new quote: an attribute value literal with no vi delimiter (=) in front of it, which is out of context. Simple, isn't it? :-)

> I once had a bunch of URIs similar to <a href="www.ti.com">TI</a>, which the DTD would accept. My link check software kept telling me that this was a bad link, but the URI seemed to work fine when I manually typed it into my web browser ... color me confused. I wish I had gotten some warning that would suggest "I think you meant to say http://www.ti.com/".
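For the runaway quote in the nsgmls example above, a separate lint can point at the line where the problem starts rather than where the parser finally gags. The following is a minimal Python sketch, not part of any tool mentioned in this thread; the function name find_unterminated_quotes and its rule (inside a tag, a quoted value that reaches a "<" before its closing quote has probably lost that quote) are illustrative assumptions. A "<" is legal inside CDATA, so this is a warning heuristic, not validation.

```python
def find_unterminated_quotes(html: str) -> list[int]:
    """Return line numbers where a quoted attribute value appears to run on.

    Heuristic (an assumption, not an SGML rule): inside a tag, a quoted value
    that reaches a '<' before its closing quote has probably lost that quote.
    Limitation: quotes inside <!-- comments --> are not treated specially.
    """
    suspects = []
    line = 1
    in_tag = False
    quote = None       # the quote character currently open, if any
    opened_at = 0      # line on which the open quote started
    for ch in html:
        if quote is not None:
            if ch == quote:
                quote = None
            elif ch == "<":
                # Report where the runaway value began, then resynchronise
                # by treating this '<' as the start of a new tag.
                suspects.append(opened_at)
                quote = None
                in_tag = True
        elif in_tag:
            if ch in "\"'":
                quote, opened_at = ch, line
            elif ch == ">":
                in_tag = False
        elif ch == "<":
            in_tag = True
        if ch == "\n":
            line += 1
    if quote is not None:
        suspects.append(opened_at)   # value still open at end of input
    return suspects


# The test file from the nsgmls example, starting on line 1.
SAMPLE = """<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Draft//EN">
<html>
<head>
<title>Test</title>
</head>
<body>
<p><img src="foo.gif" alt="A Foo>Me</p>
<p><img src="bar.gif" alt="A Bar">My dog</p>
</body>
</html>
"""

print(find_unterminated_quotes(SAMPLE))  # prints [7]
```

Unlike the parser's report at line 8, this names line 7, where the closing quote is actually missing.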
The problem is that SGML does not (and cannot) provide any syntax-checking INSIDE an attribute apart from testing whether it's a valid ID, for example, or a valid NUMBER. Once it's specified as CDATA, anything will go in there... and it's up to the application to check it. So your validator was perfectly correct in saying that www.ti.com was valid character data, and your link-checker was right to throw it out as missing the scheme.

> I wish my validators would warn me when "You forgot to put an 'alt' attribute inside this <img> tag" (same for the height and width attributes).

Easy to fix: edit your DTD and change the ATTLIST for IMG from

    ALT CDATA #IMPLIED

to

    ALT CDATA #REQUIRED

Oh... you're not using a DTD?

> Many people intend to make *every* graphic a link, so they would appreciate a program that listed which <img> tags were not wrapped in a <a></a> tag.

Here's a 5-line Omnimark program to do this. Snip this into a file called soloimg.xom and run Omnimark LE over your file with a batch file, a shell script, or even a command line like this:

    omle sgmlhtml.dec %1.htm -s soloimg.xom

--------------------- soloimg.xom -------------------------
down-translate
element IMG when ancestor isnt A
   output "Image for %v(src) is not inside an <A>%n%c"
element #IMPLIED
   put #suppress "%c"
-----------------------------------------------------------

You do need to make sure your copy of the relevant HTML DTD is in the same directory (if you use a SYSTEM identifier in your DOCTYPE declaration) or referenced in a catalog (if you use PUBLIC).

> Even though the "&lt" is apparently legal SGML, I intend to always use the full "&lt;" and would like some warning when I slip up.

It's not a slip, and it's not "apparent". You can use &lt with no semicolon any time that the &lt is followed by a space or other punctuation. It's only when you follow it with another letter that it's an error, e.g. &ltH2> will cause a complaint that entity "ltH2" is not defined -- reasonably enough, I think.
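For anyone without a DTD, the application-level checks asked for above (a missing alt, an <img> outside an <a>, and an href such as www.ti.com that lacks a scheme) can also be made with Python's standard html.parser. This is a minimal sketch; the class name LinkLint, the warning texts, and the rule that a scheme-less href beginning with "www." probably wanted "http://" are assumptions for illustration. html.parser does not validate against a DTD, so this complements the DTD and Omnimark approaches rather than replacing them.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkLint(HTMLParser):
    """Collect warnings for checks that a CDATA attribute cannot express.

    Illustrative checks (assumptions, not rules from any DTD):
      * an <img> with no alt attribute
      * an <img> that is not inside an <a> element
      * an href with no scheme that starts like a host name ("www.")
    """

    def __init__(self):
        super().__init__()
        self.open_anchors = 0   # depth of <a> elements currently open
        self.warnings = []      # list of (line, message) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)     # html.parser lowercases tag and attribute names
        line, _ = self.getpos()
        if tag == "a":
            self.open_anchors += 1
            href = attrs.get("href") or ""
            parts = urlsplit(href)
            if not parts.scheme and not parts.netloc and href.startswith("www."):
                guess = "http://" + href + ("" if "/" in href else "/")
                self.warnings.append(
                    (line, f"href {href!r} has no scheme; did you mean {guess!r}?"))
        elif tag == "img":
            src = attrs.get("src")
            if "alt" not in attrs:
                self.warnings.append((line, f"image {src!r} has no alt attribute"))
            if self.open_anchors == 0:
                self.warnings.append((line, f"image {src!r} is not inside an <a>"))

    def handle_endtag(self, tag):
        if tag == "a" and self.open_anchors > 0:
            self.open_anchors -= 1


if __name__ == "__main__":
    lint = LinkLint()
    lint.feed('<p><a href="www.ti.com">TI</a> <img src="logo.gif"></p>')
    lint.close()
    for line, message in lint.warnings:
        print(f"line {line}: {message}")
```

Counting open <a> elements, rather than checking only the immediate parent, mirrors the "ancestor isnt A" test in the Omnimark program.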
> I intend to wrap every URI in the source text with a link to that URI. I would like a validator to check that every string (outside of a tag) of the form "http:" or "ftp:" or "mailto:" (what others are there now?) is not merely inside a <a></a> entity, but that the href attribute is actually set to the *same* location (rather than some other unrelated location).

It would be nice if editors could do this (actually ADEPT and Author/Editor can if you use their scripting languages). Omnimark can do this as a standalone program like the one above, with something like:

    translate pcdata
       ( ("http:" or "ftp:") "//" [ letter or "." or "-" ]+ ( ":" digit+)? [ "/" ]?
         -- I won't go on, you get the idea, it's a pattern-match --
       ) =url
       when name of element isnt A
       output "<a href=%"%x(url)%">%x(url)</a>"

You could do another one in the same file for occasions when the name of the element is A, and check that %v(href) is equal to %x(url).

This is sounding like an advert for Omnimark [disclaimer: I have no connection except as a satisfied user], but what I'm trying to say is that all the tools to do these things already exist... but they assume you are creating valid HTML to start with: then checking this stuff becomes trivial.

> I don't think my tools are smart enough to check that (a) for every <a href="#misc">misc</a> there is one and only one <a name="misc">misc</a> in the document, and (b) that for each <a name="misc">misc</a> there is at least one <a href="#misc">misc</a>.

Make them ID/IDREFs without the # and any SGML parser will automatically check them. Then flip 'em back to NAME and HREF. Oh... you're not using a DTD?

> When I add a new section to a page, something like (b) would remind me to add that section to the table of contents I keep at the top of the page.

Any decent editor macro should be able to do this. But will it delete a ToC section when you remove a section? Or change its name?
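Without a DTD, checks (a) and (b) can be sketched with Python's html.parser. The ID/IDREF conversion covers (a): an SGML parser reports an IDREF with no matching ID and a duplicated ID. It does not report (b), an ID that nothing refers to, so this sketch checks that too. FragmentCollector and check_fragments are hypothetical names, and only <a name="..."> targets are considered, not id attributes on other elements.

```python
from collections import Counter
from html.parser import HTMLParser


class FragmentCollector(HTMLParser):
    """Record same-document links (href="#x") and anchor names (name="x")."""

    def __init__(self):
        super().__init__()
        self.targets = Counter()   # name -> how many <a name=...> define it
        self.links = Counter()     # name -> how many href="#name" point at it

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        href = attrs.get("href") or ""
        if name:
            self.targets[name] += 1
        if href.startswith("#") and len(href) > 1:
            self.links[href[1:]] += 1


def check_fragments(html):
    """Return (dangling, duplicated, unused) lists of anchor names."""
    c = FragmentCollector()
    c.feed(html)
    c.close()
    dangling = sorted(n for n in c.links if n not in c.targets)    # (a) no target
    duplicated = sorted(n for n, k in c.targets.items() if k > 1)   # (a) not unique
    unused = sorted(n for n in c.targets if n not in c.links)       # (b) never linked
    return dangling, duplicated, unused


PAGE = ('<a href="#intro">Intro</a> <a href="#misc">Misc</a>'
        '<h2><a name="intro">Intro</a></h2>'
        '<h2><a name="faq">FAQ</a></h2>'
        '<h2><a name="faq">FAQ again</a></h2>')

print(check_fragments(PAGE))  # prints (['misc'], ['faq'], ['faq'])
```

The "unused" list is the reminder asked for in (b): a section named in the body that the table of contents never links to.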
> In my opinion, *every* web page needs to have an email address somewhere on it, so people viewing it can respond to any questions the author raises.

This is _content_; SGML can't do anything about that. But you could add a compulsory <ADDRESS> to the end of the content model for <BODY> in your DTD.

> I'm sure there are many other little things that a machine could easily check, but that current validators do not check.

Yep.

///Peter
Received on Thursday, 18 June 1998 11:31:30 UTC