- From: Joe Wells <jbw@cs.bu.edu>
- Date: Tue, 10 Oct 1995 22:40:50 -0400
- To: www-html@w3.org
Hi, HTML and SGML gurus,

I have some questions that an expert can probably answer without even thinking, but whose answers I haven't been able to find in my WWW browsing. Some are about HTML, some are about SGML, and some are about HTML as an SGML document type.

Q: (("text/html" Internet Media Type))

Does text/html forbid including the SGML declaration (<!SGML ...>)? If a document type declaration (<!DOCTYPE ...>) is included, must it contain a PUBLIC external identifier (e.g. PUBLIC "-//IETF//DTD HTML Level 2//EN")? Does it forbid including an internal DTD subset? I am not asking how many WWW browsers can handle this; I am asking whether the standard specifies it. The version of the HTML 2.0 standard that I read (which includes the definition of the text/html media type) seemed vague on these questions, but perhaps I am missing something.

Q: ((SGML Marked Sections))

The syntax for marked sections is not clear to me. I would like to know precisely how to determine when the end of a marked section has been reached. I've seen two grammars for this: one from TEI (which is clearly wrong and disagrees with what "sgmls" does) and one based on the standard, which merely says that the content of a marked section is "SGML characters" (which is not helpful). What is the precise syntax for marked sections? (Pointers to *net* resources are greatly preferred to paper resources. Pointers to source code should be to well-commented, clear source code; I've already tried to figure this out by reading the source of sgmls.)

Q: ((HTML 3.0 with HTML.Recommended vs. Legacy Documents))

In HTML 3.0 with HTML.Recommended enabled in the DTD, it is illegal to put text directly inside an LI element, like this:

    <LI>Here are some words.</LI>

This is legal:

    <LI><P>Here are some words.</P></LI>

Is an HTML browser supposed to interpret (render) the first fragment like the second one? I've noticed that many browsers (e.g. Netscape 1.1N) treat them very differently. Netscape in particular renders the second, legal version in a truly horrible fashion. Other HTML elements have the same problem as LI.

Q: ((HTML 3.0 TEXTAREA vs. Inclusion Exceptions))

The HTML 3.0 proposed draft says that the content of the TEXTAREA element should be used as follows:

    "The text up to the end tag is used to initialize the field's value. The initialization text can contain SGML entities, e.g. for accented characters, but is otherwise treated as literal text."

This presumes that the content of a TEXTAREA element can only be data characters. However, under the proposed DTD the following HTML is valid:

    <FORM ACTION="http://dev.null.dom">
    <P>
    <MATH>
    <TEXTAREA NAME="foo" ROWS=1 COLS=1>
    <SPOT ID="bar">
    <BOX>
    yyy<SUP> zzz </SUP>
    </BOX>
    </TEXTAREA>
    </MATH>
    </P>
    </FORM>

Thus, the TEXTAREA element can contain subelements. How should a browser handle this? In particular, what precisely should the browser send to the server if the user submits the form without changing anything in the text area?

Q: ((SGML Unclosed Start and End Tags))

Under what circumstances are unclosed start and end tags allowed?

Q: ((HTML 3.0 Dummy Elements))

In HTML 3.0, what is the purpose of having the BODYTEXT and FIGTEXT elements at all? Both their start and end tags can be omitted, and they are never intended to be used as explicit markup. Neither of them seems to be documented in the proposed draft of the standard.

Q: ((My SGML Confusion))

What is "SDATA"?

Q: ((SGML vs. Carriage Returns))

The documentation for the program "sgmls" says that it does this:

  1. each carriage return character is turned into a non-SGML character;
  2. each newline character is turned into a record end character, and at the same time a record start character is inserted at the beginning of each line.

Is this part of the standard?
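For concreteness, the input transformation that the sgmls documentation describes can be sketched in Python. This is a minimal illustration under stated assumptions, not the sgmls implementation: a "line" is taken to be any run of characters ended by a newline or by the end of the input, and RS, RE and NONSGML are hypothetical placeholder tokens rather than the character codes sgmls actually uses internally.

```python
from typing import List

# Hypothetical placeholder tokens; sgmls itself uses its own internal codes.
RS = "<RS>"            # record start
RE = "<RE>"            # record end
NONSGML = "<NONSGML>"  # a non-SGML character


def to_records(text: str) -> List[str]:
    """Apply the two rules quoted from the sgmls documentation.

    Assumption: every line, including an empty one or a final line with
    no trailing newline, gets a record start before its first character.
    """
    out: List[str] = []
    at_line_start = True
    for ch in text:
        if at_line_start:
            # Rule 2 (second half): insert RS at the beginning of each line.
            out.append(RS)
            at_line_start = False
        if ch == "\r":
            # Rule 1: a carriage return becomes a non-SGML character.
            out.append(NONSGML)
        elif ch == "\n":
            # Rule 2 (first half): a newline becomes a record end.
            out.append(RE)
            at_line_start = True
        else:
            out.append(ch)
    return out
```

Under this reading, "ab\ncd\r\n" becomes RS a b RE RS c d NONSGML RE: the carriage return of a DOS-style line ending survives only as a non-SGML character just before the record end.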
Is this an appropriate thing to do for Unix compatibility, given that the Unix convention is that lines are not started by anything and are ended by newlines?

Q: ((SGML Grammar Confusion))

The grammar of SGML that I have seen says that one alternative for an "attribute value" is "character data". This seems very open-ended and unspecified. What does this mean?

Thanks for any help you can give me.

-- 
Joe Wells <jbw@cs.bu.edu>
Received on Tuesday, 10 October 1995 22:42:01 UTC