Re: HTML Pro questions

Peter Flynn (pflynn@curia.ucc.ie)
05 Nov 1996 16:02:21 +0000 (GMT)


Date: 05 Nov 1996 16:02:21 +0000 (GMT)
From: Peter Flynn <pflynn@curia.ucc.ie>
Subject: Re: HTML Pro questions
In-reply-to: <199611051457.PAA23882@jagor.srce.hr> (Drazen.Kacar@public.srce.hr)
To: Drazen.Kacar@public.srce.hr
Cc: www-html@w3.org
Message-id: <199611051602.QAA03921@curia.ucc.ie>

   It seems I'll be in charge of a search service, and I thought I might
   run each page through an SGML validator and display the number of errors
   to the innocent user of the service. HTML Pro is just what I need, but
   I'll have to make it HTML 2.0 compliant. I suppose I can do it myself.
   This is a specific project, so there's no need to cripple the DTD in
   general. And I must say I like the name Silmaril very much; I can see the
   validator saying "Tears unnumbered ye shall shed..." :)

Elen sila lumenn' omentielvo. Go right ahead and make the changes: I'm
happy to do the same if people feel it is important to make it parse
HTML 2.0 in this way.

   I don't know exactly; I was just checking which tag has the most
   attributes. Since I'll have to parse pages before the validator sees
   them, I wanted to see if I can store information about the presence of
   attributes in 32 bits. In Lynx, INPUT has 30 or 31; HTML Pro has far fewer.

I think I'll need to cook up a little tool for this...
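
The 32-bit idea in the quoted paragraph can be sketched as a bitmask, one
bit per attribute name. This is only an illustration: the attribute list
is the HTML 2.0 set for INPUT, not the Lynx or HTML Pro set, and the names
BIT, encode and has_attr are made up for the example.

```python
# Illustrative attribute list (HTML 2.0 INPUT); a real tool would read
# the names from the DTD's ATTLIST declarations instead.
INPUT_ATTRS = ["ALIGN", "CHECKED", "MAXLENGTH", "NAME",
               "SIZE", "SRC", "TYPE", "VALUE"]

# Give each attribute name its own bit position.
BIT = {name: 1 << i for i, name in enumerate(INPUT_ATTRS)}
assert len(BIT) <= 32, "more than 32 attributes will not fit in 32 bits"


def encode(present):
    """Return an integer with one bit set per attribute present on a tag."""
    mask = 0
    for name in present:
        mask |= BIT[name.upper()]
    return mask


def has_attr(mask, name):
    """Test whether the bit for `name` is set in `mask`."""
    return bool(mask & BIT[name.upper()])


mask = encode(["type", "name", "value"])
```

With 30 or 31 attributes the mask still fits in one 32-bit word, but only
just, so the length check is worth keeping.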

   Parsing before the validator is needed because I've seen a lot of pages
   that use --!> to terminate comments, and SGML validators don't generate
   many errors for them. Most of the document is read as a comment, and
   you get just one error about an unterminated comment. Besides, it
   would be nice to count BLINKs, IMGs without ALT and some other things.

http://www.cast.org/bobby/ is not a parser but it picks up a LOT of 
these errors.
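
A pre-scan of the kind described in the quoted paragraph might look
roughly like this. It is a sketch, not a parser: it uses plain regular
expressions on the raw text, ignores comment boundaries altogether, and
will be fooled by a ">" inside an attribute value. The function name
prescan and the report keys are invented for the example.

```python
import re

BAD_COMMENT_END = "--!>"
# A start tag: "<", a name, then everything up to the next ">".
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b([^>]*)>")
# An ALT attribute, not preceded by a name character or hyphen.
ALT_ATTR = re.compile(r"(?<![\w-])alt\s*=", re.IGNORECASE)


def prescan(text):
    """Count --!> terminators, BLINK start tags, and IMGs lacking ALT."""
    counts = {
        "bad_comment_ends": text.count(BAD_COMMENT_END),
        "blink": 0,
        "img_without_alt": 0,
    }
    for match in START_TAG.finditer(text):
        name, attrs = match.group(1).upper(), match.group(2)
        if name == "BLINK":
            counts["blink"] += 1
        elif name == "IMG" and not ALT_ATTR.search(attrs):
            counts["img_without_alt"] += 1
    return counts


sample = ('<!-- menu --!><P><BLINK>New!</BLINK> '
          '<IMG SRC="a.gif"> <IMG SRC="b.gif" ALT="logo">')
report = prescan(sample)
```

Because it never tries to work out where a comment ends, the scan still
sees the tags that an SGML parser would swallow after a --!> terminator.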

   Back to HTML Pro DTD. I think that DTD allows multiple TITLE elements
   and, if memory serves me well, I think some time ago I've seen a hack
   posted that would enable only one TITLE in HEAD. I call it a hack
   because my understanding of SGML was not enough to see what was going
   on there. :) But then, my SGML knowledge is very close to zero. The
   author was, I believe, Joe English. Perhaps you could incorporate it
   into HTML Pro DTD.

<!ELEMENT HEAD - O (TITLE & ISINDEX? & BASE? & META* & LINK* & NEXTID? & 
 BGSOUND? & SCRIPT? & NOSCRIPT? & STYLE? & RANGE*) 
 --<Title>Documentation header-->

This defines exactly one TITLE plus optional everything else: ? means
zero or one of them; * means zero or more of them; & means the members
may come in any order. I think that's right, so there shouldn't be any
need for a hack.
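
To see what that content model accepts, here is a rough check written
outside SGML. It is a sketch under assumptions: it looks only at the list
of child element names, ignores tag omission and any inclusions or
exclusions declared elsewhere in the DTD, and the function name
head_errors is made up. Each member of an & group is matched as one unit,
so repeated META, LINK or RANGE elements have to sit next to each other.

```python
from itertools import groupby

REQUIRED_ONCE = {"TITLE"}
OPTIONAL_ONCE = {"ISINDEX", "BASE", "NEXTID", "BGSOUND",
                 "SCRIPT", "NOSCRIPT", "STYLE"}
REPEATABLE = {"META", "LINK", "RANGE"}
ALLOWED = REQUIRED_ONCE | OPTIONAL_ONCE | REPEATABLE


def head_errors(children):
    """Return a list of problems with the child elements of HEAD."""
    errors = []
    seen = set()
    # groupby collapses consecutive identical names into one run.
    for name, run in groupby(n.upper() for n in children):
        length = len(list(run))
        if name not in ALLOWED:
            errors.append(f"{name} is not allowed in HEAD")
            continue
        if name in REPEATABLE:
            if name in seen:
                errors.append(f"{name} occurrences must be contiguous")
        elif name in seen or length > 1:
            errors.append(f"more than one {name}")
        seen.add(name)
    for name in sorted(REQUIRED_ONCE - seen):
        errors.append(f"missing {name}")
    return errors
```

For example, head_errors(["META", "TITLE", "BASE"]) returns an empty list,
while head_errors(["TITLE", "TITLE"]) reports the second TITLE.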

///Peter