Message-Id: <9212041911.AA23347@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Subject: The spec evolves...
Date: Fri, 04 Dec 92 13:11:32 CST
From: Dan Connolly <connolly@pixel.convex.com>

I just uploaded to ftp://info.cern.ch/pub/incoming:

libHTML-921202.tar.Z -- HTML parsing library with demo program.
                        (includes current DTD)
html_spec-921202.tar.Z -- HTML.html and related files, moving
                        toward a spec.

I've made some significant changes to the DTD.

1. I put SHORTTAG NO in the SGML declaration. This means
   a) _all_ attributes have to be quoted (numbers, names, ids,
      CDATA -- everything), but
   b) it makes parsing cleaner: minimization isn't allowed.
      The NET feature is disabled (that's for writing
      <bold/foo bar/ instead of <bold>foo bar</bold>, which is
      tricky to parse).

2. I figured out a way to support HEAD/BODY tags without breaking
   things. We lose some structure, but we gain some too. And this
   time I stuck by the mixed content rule-of-thumb.

3. I got rid of the TYPE attribute on anchor tags. What's that
   thing for, anyway? Does anybody use it?

4. I changed TYPEWRITER to PRE. My new motto is: just describe it;
   don't prescribe it.

5. I added a TEXT attribute to anchor tags. The idea is that all
   <A HREF=...> point to MIME text/* objects. The TEXT parameter
   tells you the subtype, so you don't have to zen it from the
   filename (or so you can override the filename.) For example:

   <A HREF="TheProject.html" TEXT="PLAIN">This is a link to that
   file treated as a plain text file.</A>

   <A HREF="abcdef" TEXT="HTML">abcdef is an HTML entity, even
   though it doesn't have an extension.</A>

   It's a little prescriptive, but those semantics are mostly
   implemented already.

6. I changed anchor names from SGML IDs to NMTOKENS, so you can
   use numbers or whatever you want. Since we don't have any IDREFs
   pointing to them, there's no reason to use IDs. In other words,
   I've moved this feature into the realm of application
   conventions rather than SGML features.

7. I changed XMP and LISTING back to RCDATA.
   I was messing with the MidasWWW browser, and I couldn't figure
   out how, when I'm dumping the SGML out of the data structures
   into a file, to tell whether I should change '<'s to "&lt;" or
   not. If we avoid CDATA, we can use entities everywhere, and
   processing is simpler. How's that sound?

Now some ideas to kick around...

* Somebody mentioned a <VAR>...</VAR> tag for stuff that shouldn't
  be cached. I'm thinking it should be a node-wide empty tag, like
  ISINDEX. Maybe <VOLATILE> is a good word. Then I think to myself,
  why not make it an attribute on the HEAD element:

  <!ATTLIST HEAD STATUS (OK, ERROR, VOLATILE) OK>

More later...

Dan
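
A minimal sketch, in Python, of the escaping step discussed above for
XMP and LISTING content. It assumes the element is declared RCDATA and
that the DTD declares the lt, gt and amp entities, so a writer can
apply the same rule everywhere. The function name escape_rcdata is
hypothetical; it is not part of libHTML or MidasWWW.

```python
def escape_rcdata(text: str) -> str:
    """Replace markup-significant characters with entity references.

    Assumption: the target element is RCDATA, where entity references
    are recognized, so no per-element decision is needed when dumping.
    """
    # Escape '&' first so the ampersands introduced by the later
    # replacements are not escaped a second time.
    return (
        text.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
    )


if __name__ == "__main__":
    print(escape_rcdata("if (a < b && c > d)"))
```

With CDATA content the writer would instead have to emit the text
untouched, which is the per-element decision the RCDATA change avoids.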