- From: Sunil Mishra <smishra@cc.gatech.edu>
- Date: Thu, 19 Sep 1996 17:30:25 -0400 (EDT)
- To: www-html@w3.org
\\ A question came up at my site about whether white space is acceptable
\\ in tags, and I was unable to figure out from the stuff I could find at
\\ the W3.org web site whether this is valid or not.
\\
\\ It's extremely unfortunate that HTML is based on a proprietary spec
\\ that we can't distribute online. I hope W3C is trying to remedy this
\\ situation. How much money would it take to pry loose the SGML spec
\\ from ISO for public distribution without restriction? I can attempt
\\ to provide or raise this money, if they have a price. If they refuse
\\ to permit public use at any price, I think the HTML community should
\\ duplicate the work (to the extent that we need it) and separate from
\\ the SGML community.

I believe SGML is an ISO standard, and there is nothing proprietary about it. I have found more information about correctly parsing SGML out there than I could handle, so much so that I had to give up on it and fall back on the flex spec at W3C while writing a parser.

\\ I tried reading the HTML lexical analyzer to answer the question, but
\\ it uses features of flex that I've never seen before and don't
\\ understand.
\\
\\ Here's the specific issue:
\\
\\ When doing HTML anchors (links), the closing ">" on the <A HREF...>
\\ element needs to be in contact with the rest of it:
\\
\\ <A HREF="/pub/join/index.html">Join EFF today</A>!
\\
\\ not:
\\
\\ <A HREF="/pub/join/index.html"
\\ >Join EFF today</A>!
\\
\\ Netscape is smart enough to parse the 2nd example, but many other
\\ browsers aren't.

The way it works is that the opening < cannot be followed by whitespace. Other than that there are no restrictions on whitespace inside the tag, so both of your examples are valid. Any parser that can't handle the second one is, well, broken.

\\ I think this is incorrect; I hope the spec allows arbitrary white-space
\\ inside the < ... > delimiters. But, it's sad but true, I can't find
\\ a spec for this.
\\
\\ Besides answering the question, can someone on this list put the
\\ answer where other people can find it? It would be nice if a
\\ human-readable and definitive lexical standard for HTML was available,
\\ and w3.org seems like a good place to put it.

A definitive HTML parser would be good to have. It's not entirely clear which SGML constructs are valid HTML and which are not, implemented or otherwise. The lexer is easy enough to understand, but the information from the lexer then has to be fed into the parser, which is not documented at all. This is what I would like to know more about. The W3C has standard libraries available that make this task somewhat easier, but when parsing HTML for less standard tasks (and languages) a library is simply not enough.

Sunil
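
For anyone who wants to experiment with the whitespace rule described in the reply, here is a minimal Python sketch of a start-tag matcher. It is not the W3C flex lexer and not a conforming SGML parser. The name `match_start_tag` and the simplified patterns for names and attribute values are assumptions made for illustration only. It rejects whitespace right after `<` and accepts it anywhere else inside the tag, including a line break before the closing `>`.

```python
import re

# Assumed simplification: a name is an ASCII letter followed by
# letters, digits, '.' or '-'. Real SGML name rules are broader.
NAME = r"[A-Za-z][A-Za-z0-9.-]*"

# Assumed simplification: a value is double-quoted, single-quoted,
# or an unquoted run with no whitespace, quotes, or '>'.
VALUE = r'''"[^"]*"|'[^']*'|[^\s>"']+'''

# An attribute must be preceded by whitespace; whitespace around '=' is allowed.
ATTR = r"\s+" + NAME + r"(?:\s*=\s*(?:" + VALUE + r"))?"

# '<' must be followed immediately by the element name.
# Whitespace (including newlines) may precede the closing '>'.
START_TAG = re.compile(r"<(" + NAME + r")(?:" + ATTR + r")*\s*>")


def match_start_tag(text):
    """Return the element name if text begins with a start tag, else None."""
    m = START_TAG.match(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(match_start_tag('<A HREF="/pub/join/index.html">Join EFF today</A>!'))
    print(match_start_tag('<A HREF="/pub/join/index.html"\n>Join EFF today</A>!'))
    print(match_start_tag('< A HREF="/pub/join/index.html">'))
```

Both anchor examples from the quoted message match under this sketch, while the variant with a space directly after `<` does not.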
Received on Thursday, 19 September 1996 17:30:43 UTC