Message-Id: <9211191037.AA16353@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Subject: HTML DTD issues
Date: Thu, 19 Nov 92 04:37:23 CST
From: Dan Connolly <connolly@pixel.convex.com>

The thrust to register HTML with the authorities has spurred me to
look over the DTD again. I've found some problems.

1. Currently the NAME attribute of an anchor is declared as CDATA,
   i.e. just about anything. There's an SGML thingy called an ID.
   SGML parsers enforce uniqueness among the IDs of a document. Seems
   like that's what we want for ID names. But an SGML ID has to start
   with a letter, so all the HTML files that use numbers as anchor
   names will break.

2. I introduced two tag names when I drafted the DTD:

   HTML contains the whole document. I defined it so you can omit
   both the start and the end tags, so it's inferred by SGML parsers.
   I don't think I can avoid some top-level tag.

   DOCUMENT contains most of the "body" -- all the headings and
   paragraphs. I did this to avoid something called mixed content,
   which causes complications. I could rename this element as BODY,
   and introduce an omissible HEADING tag to surround the TITLE,
   NEXTID, and ISINDEX tags.

3. I stuck anchors in as an inclusion, meaning they could be used
   just about anywhere. I thought stuff like

       <a name=foo><h1>Foo</h1></a>

   was legal, but neither linemode nor the Midas browser groks it.
   I'm editing the DTD to restrict anchors so that they can contain
   only text strings.

4. The OL tag is disappearing. It's no longer documented in the web,
   and it's not supported by MidasWWW. Should I delete it from the
   DTD?

5. What about <HP1> thru <HP5>... should we include them? I'd prefer
   <em>, <tt>, <cite>, a la TeX. Or we could go with the
   O'Reilly/Hal DocBook tags: <Emphasis>, <OopsChar>, <wordasword>,
   <CiteBook>, <Subscript>, <Superscript>.

6. Any more thoughts on the BaseAddress tag?

7. The HTML tags documentation says Listing sections can contain any
   ISO Latin 1 characters. The SGML standard mentions ISO 646, i.e.
   ASCII, as the default, but the sgmls parser, the linemode browser,
   and MidasWWW all seem to grok Latin1 just fine.

Dan
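
A rough sketch of the check behind point 1, not part of the DTD or of
any browser: if anchor NAMEs were declared as SGML IDs, each value
would have to be a valid name and unique within the document. The
function check_anchor_names and the class AnchorNames are hypothetical
helpers. The name pattern assumes the reference concrete syntax (a
letter, then letters, digits, "." or "-") and ignores the NAMELEN
limit; the upper-case folding assumes general names are case-folded.
An actual SGML declaration may differ on all three.

```python
import re
from html.parser import HTMLParser

# Assumption: names follow the SGML reference concrete syntax
# (a letter, then letters, digits, "." or "-"); NAMELEN is ignored.
SGML_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*\Z")


class AnchorNames(HTMLParser):
    """Collect the NAME attribute of every <a> start tag, in order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "a":
            for key, value in attrs:
                if key == "name" and value is not None:
                    self.names.append(value)


def check_anchor_names(html_text):
    """Return (invalid, duplicates) for NAME values treated as SGML IDs."""
    parser = AnchorNames()
    parser.feed(html_text)
    parser.close()
    invalid, duplicates, seen = [], [], set()
    for name in parser.names:
        if not SGML_NAME.match(name):
            invalid.append(name)  # e.g. a purely numeric anchor name
            continue
        folded = name.upper()  # assumption: general names are case-folded
        if folded in seen:
            duplicates.append(name)
        seen.add(folded)
    return invalid, duplicates
```

Running it over existing files would give a quick count of how many
anchors a switch from CDATA to ID would break, for example the
numeric names mentioned in point 1.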