Message-Id: <9211301327.AA05269@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Subject: An HTML specification and Implementors' Guide
Date: Mon, 30 Nov 92 07:27:01 CST
From: Dan Connolly <connolly@pixel.convex.com>

I just uploaded html_spec-0.3.tar.Z to info.cern.ch in pub/incoming.
It's hypertext including

* MarkUp.html -- the root node
* Text.html -- an introduction to SGML syntax
* html.dtd -- the spec expressed in HTML
* several example files that form a validation suite
* libHTML.tar -- some code that implements the low-level SGML
  reading state machine (with a test driver)

Tim: please link this into the web somehow.

Implementors: please grab the whole thing and validate your
implementation against it.

Tony: I've got some patches for the MidasWWW browser. I'm not quite
done cleaning them up.

Linemode fans: I haven't started messing with linemode yet.

Issues Closed Pending Review:

Long Names

I included an SGML declaration that increases NAMELEN to 34, and
LITLEN to 1024. I got these numbers from the DocBook DTD.

SGML IDs for Anchor Names

The NAME attribute of the A element is an ID. It must start with a
name, and it must be unique among all the IDs in the document.
[Note that there is no way to validate the #anchor part of the HREF
attribute. I'm working on that...]

Multimedia Links

I included a content-type attribute for links so that you can tell
the browser what type of data you're pointing to, and it can decide
what to do with it (at a minimum, use this attribute and pass the
data to metamail).

I added a content-description attribute in case you want the reader
to be able to get some information about the data without
transferring it, but now I'm not sure it's a good idea. The
description should go in the content of the A element.

Formatted Text with Anchors

I took the semantics of the PRE tag, added the WIDTH attribute, and
called it TYPEWRITER (inspired by the nroff man page).
It's parsed like most other elements, but displayed like XMP or
LISTING or PLAINTEXT.

Newline handling isn't a parsing issue -- it's a display issue. I
think it will be more straightforward to define newlines in
TYPEWRITER content to be significant. That way, once the data is
parsed, XMP and TYPEWRITER work just the same.

Lines may get real long. That's life. If you want to mail it, use
MIME or uuencode or something.

XMP and LISTING elements are CDATA: they have no markup in their
content. There's no way to put </TITLE> inside an XMP element.

PLAINTEXT is an empty element that signals the end of a text/html
entity and begins a text/plain entity.

Ordered Lists

I included them in the DTD. Any objections?

ISO Latin 1 Characters:

I included a reference to
"ISO 8879:1986//ENTITIES Added Latin 1//EN" in the HTML DTD. This
defines entities for all ISO Latin 1 characters. Clients will need a
table of the names and local translations.

Open Issues:

Highlighting:

Whose tags should we use? LaTeX seems to be an adequate markup
system for lots of folks. Its tags are

    em | it | bf | sf | sl | tt

The DocBook folks use only semantic tags: they don't have bold or
italic tags.

The MIME richtext stuff has only typographic tags: no <emphasis> or
<booktitle> or any such thing.

Dan
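
A note on the "SGML IDs for Anchor Names" item above: the rule that
each A NAME value is an ID, unique within the document, can be checked
mechanically. What follows is only a rough sketch in Python, not code
from libHTML. The function name check_anchor_names is made up for
illustration, and two things are assumed rather than taken from the
spec: that a name is a letter followed by letters, digits, '.' or '-'
(as in the SGML reference concrete syntax), and that names are
case-folded before comparison. The NAMELEN limit of 34 is the one
quoted above.

```python
import re

# Assumption: SGML reference concrete syntax for names, i.e. a letter
# followed by letters, digits, '.' or '-'.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]*\Z")

# NAMELEN as raised by the SGML declaration described in the message.
NAMELEN = 34


def check_anchor_names(names, namelen=NAMELEN):
    """Return (name, problem) pairs for A NAME values that break the ID rules.

    Hypothetical helper for illustration; `names` is the list of NAME
    attribute values in document order.
    """
    problems = []
    seen = set()
    for name in names:
        if not NAME_RE.match(name):
            problems.append((name, "not an SGML name"))
        elif len(name) > namelen:
            problems.append((name, "longer than NAMELEN"))
        # Assumption: IDs are compared after case folding.
        key = name.upper()
        if key in seen:
            problems.append((name, "duplicate ID"))
        seen.add(key)
    return problems


if __name__ == "__main__":
    for name, problem in check_anchor_names(["intro", "2nd", "Intro"]):
        print(name, "->", problem)
```

This covers only the NAME side. As noted above, the #anchor part of an
HREF is not checked by anything here.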