- From: Peter Flynn <pflynn@curia.ucc.ie>
- Date: 05 Nov 1996 16:02:21 +0000 (GMT)
- To: Drazen.Kacar@public.srce.hr
- Cc: www-html@w3.org
It seems I'll be in charge of a search service and I thought I just might run each page through an SGML validator and display the number of errors to the innocent user of the service. HTML Pro is just what I need, but I'll have to make it HTML 2.0 compliant. I suppose I can do it myself. This is a specific project, there's no need for crippling the DTD in general. And I must say I like the name Silmaril very much, I can see the validator saying "Tears unnumbered ye shall shed..." :) Elen sila lumenn' omentielvo.

Go right ahead and make the changes: I'm happy to do the same if people feel it is important to make it parse HTML 2.0 in this way.

I don't know exactly, I was just checking which tag has the most attributes. Since I'll have to parse pages before the validator sees them, I wanted to see if I can store information about the presence of attributes in 32 bits. In Lynx INPUT has 30 or 31, HTML Pro has far fewer.

I think I'll need to cook up a little tool for this...

Parsing before the validator is needed because I've seen a lot of pages with the --!> thing intended for comment termination, and SGML validators don't generate many errors for them. Most of the document appears as a comment and you'll get just one error about an unterminated comment. Besides, it would be nice to count BLINKs, IMGs without ALT and some other things.

http://www.cast.org/bobby/ is not a parser but it picks up a LOT of these errors.

Back to the HTML Pro DTD. I think that DTD allows multiple TITLE elements and, if memory serves me well, I think some time ago I've seen a hack posted that would enable only one TITLE in HEAD. I call it a hack because my understanding of SGML was not enough to see what was going on there. :) But then, my SGML knowledge is very close to zero. The author was, I believe, Joe English. Perhaps you could incorporate it into the HTML Pro DTD.

    <!ELEMENT HEAD - O (TITLE & ISINDEX? & BASE? & META* & LINK* & NEXTID? &
                        BGSOUND? & SCRIPT? & NOSCRIPT? & STYLE? & RANGE*)
            --<Title>Documentation header-->

This defines exactly one TITLE plus optional everything else: ? means zero or one of them; * means zero or more of them. I think that's right, there shouldn't be any need for a hack.

///Peter
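For what it's worth, the "presence of attributes in 32 bits" idea from the message could look roughly like the sketch below. The attribute table is a made-up example, not the INPUT attribute list from Lynx or HTML Pro, and the names `ATTRS`, `pack`, `has` and `unpack` are hypothetical. It only shows the bit-per-attribute technique, which works as long as no single tag has more than 32 attributes.

```python
# Hypothetical attribute table for illustration only; it is not taken
# from the Lynx or HTML Pro DTDs.
ATTRS = ["NAME", "TYPE", "VALUE", "SIZE", "MAXLENGTH", "CHECKED", "SRC", "ALIGN"]

# One bit position per attribute; a 32-bit word holds at most 32 of them.
assert len(ATTRS) <= 32
BIT = {name: 1 << i for i, name in enumerate(ATTRS)}


def pack(present):
    """Return an integer whose set bits mark which attributes are present."""
    mask = 0
    for name in present:
        mask |= BIT[name.upper()]
    return mask


def has(mask, name):
    """Test whether the bit for one attribute is set."""
    return bool(mask & BIT[name.upper()])


def unpack(mask):
    """List the attribute names whose bits are set, in table order."""
    return [name for name in ATTRS if mask & BIT[name]]
```

A pre-parser could call `pack` once per tag and keep only the resulting integer, testing individual attributes later with `has`.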
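A very rough sketch of the pre-validation pass described in the message: count the bogus `--!>` comment terminators, BLINK start tags, and IMG tags with no ALT attribute. This is regex-based guesswork rather than real SGML parsing (it will be fooled by a `>` inside a quoted attribute value, for instance), and the function name `prescan` and the result keys are assumptions made up for this example.

```python
import re

# Crude patterns, good enough for counting but not for real parsing.
BAD_COMMENT_END = re.compile(r"--!>")
BLINK_OPEN = re.compile(r"<blink\b", re.IGNORECASE)
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
ALT_ATTR = re.compile(r"\balt\s*=", re.IGNORECASE)


def prescan(html):
    """Count a few problems that an SGML validator reports poorly or not at all."""
    img_tags = IMG_TAG.findall(html)
    return {
        # "--!>" written where "-->" was meant
        "bad_comment_end": len(BAD_COMMENT_END.findall(html)),
        # BLINK start tags (end tags begin with "</" and do not match)
        "blink": len(BLINK_OPEN.findall(html)),
        # IMG tags whose text contains no ALT= attribute
        "img_without_alt": sum(1 for tag in img_tags if not ALT_ATTR.search(tag)),
    }
```

The counts could be shown next to the validator's error total, since the validator itself reports a runaway comment as a single error.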
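To illustrate how the occurrence indicators in the HEAD content model quoted above work, here is a small sketch that checks only how many times each element appears: exactly one TITLE, at most one of each `?` element, any number of each `*` element. It is a simplification, not a validator: a real SGML parser also constrains how the members of an `&` group may be ordered, which is not checked here. The names `HEAD_MODEL` and `check_head` are hypothetical.

```python
from collections import Counter

# Occurrence indicators copied from the quoted HEAD content model:
# "1" = exactly one, "?" = zero or one, "*" = zero or more.
HEAD_MODEL = {
    "TITLE": "1", "ISINDEX": "?", "BASE": "?", "META": "*", "LINK": "*",
    "NEXTID": "?", "BGSOUND": "?", "SCRIPT": "?", "NOSCRIPT": "?",
    "STYLE": "?", "RANGE": "*",
}


def check_head(children):
    """Return a list of occurrence problems for the element names found in HEAD."""
    counts = Counter(name.upper() for name in children)
    problems = []
    # Elements that the content model does not mention at all.
    for name in counts:
        if name not in HEAD_MODEL:
            problems.append(f"{name}: not allowed in HEAD")
    # Elements that occur too often, or a TITLE that is missing.
    for name, indicator in HEAD_MODEL.items():
        n = counts[name]
        if indicator == "1" and n != 1:
            problems.append(f"{name}: expected exactly 1, found {n}")
        elif indicator == "?" and n > 1:
            problems.append(f"{name}: expected at most 1, found {n}")
    return problems
```

With this model a second TITLE is reported as an error, which matches the reading that the declaration needs no extra hack to forbid it.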
Received on Tuesday, 5 November 1996 11:01:12 UTC