assorted HTML and SGML questions

Hi, HTML and SGML gurus,

I've got some questions that can probably be answered by an expert without
even thinking but which I haven't been able to find the answers to in my
WWW browsing.  Some of these questions are about HTML, some are about
SGML, and some are about HTML as an SGML document type.

Q: (("text/html" Internet Media Type)) Does text/html forbid including the
   SGML declaration (<!SGML ...>)?  Does it require that a PUBLIC external
   identifier (i.e. PUBLIC "-//IETF//DTD HTML Level 2//EN") be included in
   the document type declaration (<!DOCTYPE ...>), if one is present?  Does
   it forbid including a DTD subset?  I am not asking how many WWW browsers
   can handle this; I am asking whether the standard specifies it.  The
   version of the HTML 2.0 standard (which includes the definition of the
   text/html media type) that I read seemed vague on these questions, but
   perhaps I am missing something.
   
Q: ((SGML Marked Sections)) The syntax for marked sections is not clear to
   me.  I would like to know precisely how to determine when the end of a
   marked section has been reached.  I've seen two grammars for this, one
   from TEI (which is clearly wrong and disagrees with what "sgmls" does)
   and one based on the standard which merely says the content of the
   marked section is "SGML characters" (which is not helpful).  What is
   the precise syntax for marked sections?  (Pointers to *net* resources
   are greatly preferred to paper resources.  Pointers to source code
   should be to well-commented and clear source code; I've already tried
   to figure this out by reading the source of sgmls.)

Q: ((HTML 3.0 with HTML.Recommended vs. Legacy Documents)) In HTML 3.0
   with HTML.Recommended enabled in the DTD, it is illegal to put text
   directly inside an LI element, like this:

     <LI>Here are some words.</LI>

   This is legal:

     <LI><P>Here are some words.</P></LI>

   Is the first fragment supposed to be interpreted (rendered) like the
   second one by an HTML browser?  I've noticed that many browsers
   (e.g. Netscape 1.1N) treat them very differently.  Netscape in
   particular renders the second, legal version in a truly horrible
   fashion.  There are other HTML elements with problems like the one I
   describe here for the LI element.

Q: ((HTML 3.0 TEXTAREA vs. Inclusion Exceptions)) The HTML 3.0 proposed
   draft says that the content of the TEXTAREA element should be used as
   follows:

     "The text up to the end tag is used to initialize the field's value.
     The initialization text can contain SGML entities, e.g. for accented
     characters, but is otherwise treated as literal text."

   This presumes that the TEXTAREA element's content can only be data
   characters.  However, under the proposed DTD the following HTML is valid:

     <FORM ACTION="http://dev.null.dom">
       <P>
         <MATH>
           <TEXTAREA NAME="foo" ROWS=1 COLS=1>
             <SPOT ID="bar">
             <BOX>
               yyy<SUP>
                    zzz
                  </SUP>
             </BOX>
           </TEXTAREA>
         </MATH>
       </P>
     </FORM>

   Thus, the TEXTAREA element can contain subelements.  How should a
   browser handle this?  In particular, what precisely should the browser
   send to the server if the user submits the form without changing
   anything in the text area?

Q: ((SGML Unclosed Start and End Tags)) Under what circumstances are
   unclosed start and end tags allowed?

Q: ((HTML 3.0 Dummy Elements)) In HTML 3.0, what is the purpose of having
   the BODYTEXT and FIGTEXT elements at all?  They allow both start and
   end tags to be omitted and are not intended ever to be used as
   markup.  Neither of them seems to be documented in the proposed draft
   of the standard.

Q: ((My SGML Confusion)) What is "SDATA"?

Q: ((SGML vs. Carriage Returns)) The documentation for the program "sgmls"
   says that it does this:
   
       1.     each carriage return character is turned into a
              non-SGML character;

       2.     each newline character is turned into a record end
              character, and at the same time a record start
              character is inserted at the beginning of each
              line;

   Is this part of the standard?  Is it an appropriate thing to do for
   Unix compatibility, given that the Unix convention is that lines are
   not started by anything and are ended by newlines?
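
   To make my reading of the two quoted rules concrete, here is a minimal
   Python sketch.  It is not taken from the sgmls source.  The function
   name to_records and the symbolic tokens "RS", "RE" and "NONSGML" are
   my own inventions for illustration, and I am assuming that no record
   start is emitted after the final newline, which the quoted text does
   not settle.

```python
def to_records(raw):
    """Map raw Unix text to symbolic tokens, following the two quoted rules.

    Hypothetical illustration only: "RS", "RE" and "NONSGML" are made-up
    token names, not anything defined by sgmls.
    """
    out = []
    at_line_start = True
    for ch in raw:
        if at_line_start:
            # Rule 2, second half: a record start begins each line.
            out.append("RS")
            at_line_start = False
        if ch == "\r":
            # Rule 1: a carriage return becomes a non-SGML character.
            out.append("NONSGML")
        elif ch == "\n":
            # Rule 2, first half: a newline becomes a record end.
            out.append("RE")
            at_line_start = True
        else:
            out.append(ch)
    return out


if __name__ == "__main__":
    print(to_records("ab\ncd\n"))
    print(to_records("a\r\n"))
```

   If this reading is right, a file with DOS-style line endings would
   end every line with a non-SGML character just before the record end,
   which is part of why I am asking whether the standard requires this.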

Q: ((SGML Grammar Confusion)) The grammar of SGML that I have seen says
   one alternative for an "attribute value" is "character data".  This
   seems very open-ended and unspecified.  What does this mean?
   
Thanks for any help you can give me.

-- 
Joe Wells <jbw@cs.bu.edu>

Received on Tuesday, 10 October 1995 22:42:01 UTC