Comments on HTML 3.0 Draft RFC

Ian Graham (igraham@utirc.utoronto.ca)
Tue, 11 Apr 1995 18:49:29 -0400 (EDT)


Message-Id: <9504112249.AA17027@www10.w3.org>
To: www-html@www10.w3.org

Hi --

Here are some comments I made after reading Dave Raggett's most recent
draft of the HTML 3.0 RFC (the plain text one, that is):

            <draft-ietf-html-specv3-00.txt>

dated 28 March 1995.  This is indeed a wonderful piece of work.  My
contributions to this effort are comparatively minuscule (even
if this letter appears to be long). Mostly I point out some
inconsistencies in the document, make some suggestions for
clarification (I wasn't sure about these points myself, so others
might well be confused too) and finally suggest some changes in
recommended usage.  I hope you find them useful, and not frivolous.


Sincerely,

Ian
--
Ian Graham .................................. igraham@utirc.utoronto.ca
Instructional and Research Computing
University of Toronto

-------------------------------------------------------------------

Attributes 							(Page 11)

 a) string literals
    The text states that string literals should replace characters
    that might be misinterpreted (e.g. ", ' or >) with HTML character
    references. This of course should *not* be done when the
    string literal is a URL -- in that case the string literal
    should contain URL encodings of the questionable characters. I
    believe this should be mentioned here.

    As far as I know this applies only to HREF and SRC attributes.
    What about ID and NAME?  Should fragment identifiers be URL
    encoded? I am guessing not, as they are NAME tokens, but I
    just don't know, and don't recall anything in the URL RFC about
    this (the draft I have is old....). Either way, it might be a
    good idea to state which is the case here.
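    To illustrate the difference, here is a minimal sketch in Python
    (a modern language, used only for illustration). The helper names
    attr_literal and url_literal are hypothetical; html.escape and
    urllib.parse.quote are standard library calls. A general attribute
    value gets character references, while a URL gets percent-encoding
    of the same questionable characters:

```python
import html
from urllib.parse import quote


def attr_literal(text):
    """Escape a general attribute value using HTML character references."""
    # quote=True also escapes " and ' so the value cannot end the literal early
    return html.escape(text, quote=True)


def url_literal(path):
    """Percent-encode questionable characters in a URL path instead."""
    # quote() leaves '/' alone by default and encodes space and '"'
    return quote(path)


alt = attr_literal('Click "here" > now')
src = url_literal('/images/my "logo".gif')
print(f'<IMG SRC="{src}" ALT="{alt}">')
# <IMG SRC="/images/my%20%22logo%22.gif" ALT="Click &quot;here&quot; &gt; now">
```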

 b) name tokens
    Are name tokens case-sensitive?  I always thought they were,
    but in practice browsers treat many attributes as case
    insensitive (and let us not forget Mosaic, which does a bit
    of both....).  This has always been confusing, and is not clearly
    spelled out in the RFC -- should this situation be clarified here?

Document Structure - the HEAD Element 				(Page 17)
   The HEAD element can be safely omitted only if the document
   writer remembers to place the head elements at the top of the
   document. I've certainly seen many examples where this was
   not done. IMO having HEAD and BODY tags helps to enforce
   proper placement of head and body elements, and for this reason
   I suggest that the RFC strongly recommend, or even require, the
   use of HEAD and BODY tags in valid HTML 3.0 documents.

BR Element       						(Page 36)
   What is the recommended formatting for consecutive BR
   elements? For example, should <BR><BR><BR> be treated as three 
   (line) breaks, or as a single break?  I prefer three line 
   breaks, as this seems to me more in keeping with the idea 
   of a <BR>.

P Element							(Page 33)
   The recommendations state that consecutive empty paragraphs
   (i.e. <P> <P> <P>) are discouraged. Perhaps the RFC should
   also *recommend* browser behaviour for this case; I suggest
   recommending that browsers ignore empty paragraphs.

   As an aside, I assume that "empty" means an element that
   contains only whitespace; how do entities like &nbsp; fit
   into this definition? Should the RFC formally define
   text-"empty" elements?

Horizontal Tab -- DP attribute  				(Page 39)
   The text says that the designated decimal point character
   can be altered by the language context, as set by the lang
   attribute on enclosing elements.  Is this the best choice?
   I would prefer that DP override the existing default.
   When writing a piece of text in a particular language with
   a given DP separator, I may very well want to override this
   separator for tab-aligned information, for example:

      * a period . for scientific numbers (overriding the language
        specific separator)
      * perhaps a special symbol for other data, for example
        an h (used to separate hours/min in sidereal notation,
        e.g. 12h30), or " for seconds
      * the dash for phone numbers....

   Also -- what happens if you specify align=decimal, but there
   is no decimal in the text to be aligned?  Perhaps you should
   be able to specify a default ordering in alignment, for
   example:

        align=decimal,right

   which would align to the decimal symbol, and in the absence of
   a decimal, align to the right.

   This also applies to the DP attribute used in TABLE elements
   (see Pages 83, 86 and 89).
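   One possible reading of align=decimal,right combined with an
   explicit DP character, sketched in Python: cells containing the DP
   character line up on it, and cells without it fall back to right
   alignment within the same column width. The function align_column
   and its parameters are hypothetical, not part of the draft.

```python
def align_column(cells, dp=".", fallback="right"):
    """Hypothetical 'align=decimal,<fallback>' for one column of text.

    Cells containing dp are aligned on its first occurrence; cells
    without it are right- (or left-) aligned to the column width.
    """
    with_dp = [c for c in cells if dp in c]
    left = max((len(c.split(dp, 1)[0]) for c in with_dp), default=0)
    right = max((len(c.split(dp, 1)[1]) for c in with_dp), default=0)
    # Every dp-aligned cell has exactly this width: left part + dp + right part
    dp_width = left + len(dp) + right if with_dp else 0
    width = max([dp_width] + [len(c) for c in cells])

    out = []
    for c in cells:
        if dp in c:
            head, tail = c.split(dp, 1)
            line = head.rjust(left) + dp + tail.ljust(right)
            out.append(line.rjust(width))
        elif fallback == "right":
            out.append(c.rjust(width))
        else:
            out.append(c.ljust(width))
    return out


print(align_column(["3.14", "12.5", "100", "0.001"]))
# [' 3.14 ', '12.5  ', '   100', ' 0.001']
print(align_column(["12h30", "9h05", "noon"], dp="h"))
# ['12h30', ' 9h05', ' noon']
```

   The second call shows DP overriding the language default, as in
   the sidereal-time case above.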

Hypertext Links 						(Page 40)
  This reflects my ignorance of the formal definition of #PCDATA
  (part of %text's contents), but does this definition allow for
  anchor elements that contain no text (or only a string of
  whitespace? Recall my confusion over the use of the word "empty"
  for this type of problem).  I don't think this should be allowed.
  Either way, I think the RFC should explain this case.

  As far as I can tell there is nowhere in the RFC a definition of
  an "empty" text string.  For example, does an empty string consist
  of any combination of ASCII whitespace characters? And how would
  this be generalized to other character sets? And what about
  &nbsp;?
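  To show why a definition matters, here is one candidate test for
  "empty" text, sketched in Python. It expands entity references
  first, then treats any Unicode whitespace as empty, which is one
  way to generalize beyond ASCII. Whether &nbsp; counts as content is
  left as an explicit choice, since Python's str.isspace() classes
  U+00A0 as whitespace. The name is_empty_text and the
  nbsp_is_content flag are hypothetical.

```python
import html

NBSP = "\u00a0"  # the character that &nbsp; expands to


def is_empty_text(pcdata, nbsp_is_content=True):
    """Hypothetical test: is this #PCDATA 'empty'?

    Entity and character references are expanded first. Any Unicode
    whitespace then counts as empty, except U+00A0, which counts as
    content unless nbsp_is_content is False.
    """
    text = html.unescape(pcdata)
    for ch in text:
        if ch == NBSP:
            if nbsp_is_content:
                return False
        elif not ch.isspace():
            return False
    return True


print(is_empty_text(" \n\t "))                         # True
print(is_empty_text("&nbsp;"))                         # False
print(is_empty_text("&nbsp;", nbsp_is_content=False))  # True
```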


Character Level Elements 					(Page 44)
  The RFC states "implementations are not required to render 
  these nested highlightings distinctly from non-nested elements".  
  Why this recommendation?

  At least for physical formatting tags the opposite seems more
  sensible -- things like <b><i>bla bla </i></b> so obviously
  suggest bold-face italics, and many people already write
  with that expectation.  It seems to me reasonable, therefore,
  to encourage that usage
  (e.g. "implementations are encouraged to render ...").

  I note that later on (page 48) this is the recommended behaviour
  for physical tags.

  In this regard there appears to be a difference between informational
  and physical elements - informational tags do not always logically
  inherit the characteristics of surrounding informational tags.
  Yikes, what a mess. Should this point be discussed further in the RFC?

  Another question is what to do with possible on-the-fly inclusion
  of text documents, where a block of text containing character
  formatting tags may be inserted between other such tags -- 
  should this case be handled differently? (I suppose not).


SAMP element							(Page 46)
  What is SAMP for?  The phrase "a sequence of literal characters"
  is only meaningful to those who've used texinfo. Perhaps a
  usage example for this element would be helpful.


IMG Element							(Page 51)
  Is there any interest in ALIGN=center (to align the image in
  the center of the page)? This would be useful for inserting
  images as page decorations, flowed-around trademarks, etc. This
  is distinct from the image attribute of the HR element, since
  the IMG tag does not imply a separator.

  This would require attributes to control text flow around the
  image: should text flow on the left only (the right is clear),
  on the right only (the left is clear), on both sides flowing
  through the image, or on both sides as two columns, one on either
  side of the image?  E.g.:

         ALIGN=center,leftonly    (text flow on left only)
         ALIGN=center,rightonly   (text flow on right only)
         ALIGN=center,noflow      (no text surrounding image)
         ALIGN=center,flowthrough (see below)
         ALIGN=center,twocols     (see below)
   
   Here are examples of the last two cases:

         0000000000000000000000000
                  ______
         1111111 |      | 22222222
         3333333 |      | 44444444      ALIGN=center,flowthrough
         5555555 |      | 66666666
                  ------
         7777777777777777777777777

                    or

         0000000000000000000000000
                  ______
         1111111 |      | 44444444
         2222222 |      | 55555555      ALIGN=center,twocols
         3333333 |      | 66666666
                  ------
         7777777777777777777777777


   How would this affect the CLEAR attribute?  I think not much...

   What about the Netscape HSPACE and VSPACE attributes?  I hate
   to admit it, but they do help enormously when floating images
   with surrounding text...


UL/OL Element -- SKIP attribute					(Page 58)
   How does the SKIP attribute affect sequence numbers for 
   unordered lists?  Do unordered lists even have sequence 
   numbers?


NEEDS Attribute							(Page 71)
   References to this obsolete attribute still appear on Pages 71
   and 82.


FIG Element -- ALIGN attribute					(Page 72)
   This returns to the idea of centering with text flow around
   the figure.  Should we not allow centered figures with
   surrounding text flow? For the IMG element I suggested the
   attributes:

         ALIGN=center,leftonly    (text flow on left only)
         ALIGN=center,rightonly   (text flow on right only)
         ALIGN=center,noflow      (no text surrounding figure)
         ALIGN=center,flowthrough (see IMG Element above)
         ALIGN=center,twocols     (see IMG Element above)

   Some of the details of this should be left to the stylesheet.
   For example, centering need not be specifically the center
   of the page, and flow could be overridden by stylesheet
   preferences.


CAPTION Element 						(Page 75)
    Should we allow for justification, centering, etc. of the
    caption within its specified placement?  E.g.

         ALIGN=top,center???

    which would put the caption, centered, at the top of the
    figure. Perhaps this is better left to the style sheet...

TABLE Element -- ALIGN Attribute				(Page 82)
   This returns to the idea of centering with text flow around a
   FIG.  Should we not allow centered tables with surrounding
   text flow?

         ALIGN=center,leftonly    (text flow on left only)
         ALIGN=center,rightonly   (text flow on right only)
         ALIGN=center,noflow      (no text surrounding table)
         ALIGN=center,flowthrough (see IMG Element above)
         ALIGN=center,twocols     (see IMG Element above)

   Some of the details of this should be left to the stylesheet.
   For example, centering need not be specifically the center
   of the page, and flow could be overridden by stylesheet
   preferences.

MATH 
   Looks fantastic, and much too much for me.


PRE Element							(Page 113)
   Is it really necessary to cater to obsolete usages, such
   as having <P>, <IMG> or <FIG> tags inside a PRE?  Would it
   not be better to just state that these tags are not
   permitted inside a PRE? Again, I am thinking of the fact
   that many HTML authors use the RFC as a guide to writing,
   and will be better warned away from inappropriate usage by
   stronger cautions -- something like

     <P>, <IMG> and <FIG> are not permitted inside a
     PRE element.

   would do it.  Also, if we use the .htm3 suffix to
   denote HTML 3.0 documents as distinct from HTML 2.0,
   then we don't really need to preserve, in the HTML 3.0
   RFC, all the legacy problems.

   I'm not really suggesting that browsers should not support
   these bad structures, but rather that the RFC should more
   strongly urge good design over bad.

   I vote for dumping the WIDTH attribute (Page 115).  This is
   something a browser can easily decide for itself.

   What about newline characters? I note that the generic algorithm
   described on pages 144-145 won't work for text files created on a
   Macintosh, since the Macintosh denotes newlines by CR alone.  What
   to do? Perhaps treat them as follows:

   a) If there are CRLF pairs, treat each pair as
         col := 0; row := row+1
      and treat isolated instances of either character as
      appropriate.
   b) If there are only lone LFs, treat each as
         col := 0; row := row+1
   c) If there are only lone CRs, treat each as
         col := 0; row := row+1
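   Rules (a) to (c) can be sketched in Python as below, tracking
   (row, col) as in the pseudocode. The sketch reads "as appropriate"
   in rule (a) as treating a lone CR or LF as a newline too, which is
   an assumption; with that reading all three cases reduce to one
   loop. The function name place_chars is hypothetical, and tab
   handling is left out.

```python
def place_chars(text):
    """Return (row, col, char) for each non-newline character in text.

    A CRLF pair, a lone CR and a lone LF each act as one newline:
    col := 0; row := row + 1.
    """
    row, col = 0, 0
    placed = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "\r\n":
            # Consume the LF of a CRLF pair so the pair counts only once.
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1
            row, col = row + 1, 0
        else:
            placed.append((row, col, ch))
            col += 1
        i += 1
    return placed


# Unix (LF), DOS (CRLF) and Macintosh (CR) text give the same layout.
for sample in ("ab\ncd", "ab\r\ncd", "ab\rcd"):
    print(place_chars(sample))
# each prints [(0, 0, 'a'), (0, 1, 'b'), (1, 0, 'c'), (1, 1, 'd')]
```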


FN (Footnotes)							(Page 118)
   The example uses footnotes that are part of the document
   from which they are referenced -- I assume this is *not* meant
   as a restriction, and that FNs can be in any document?


FORMs								(Page 124)
   Someone suggested to me that it might be nice if a single
   FORM could have multiple ACTIONs, so that the form data
   could be sent to different URLs depending on which submit
   button was pressed. This would be useful, for example, if
   you wanted the choice of submitting the same data directly
   to a server or indirectly by mail.  This would also be useful
   if you wanted to give the user the choice of submitting data 
   to a secure (e.g. RSA) or non-secure server. It seemed like
   a reasonable idea to me -- what do you think?

INPUT TYPE=scribble  and INPUT TYPE=file 			(Page 129)
   How will these non-character type data be sent to the server?
   As far as I know there is nothing in the
   application/x-www-form-urlencoded MIME type that allows for
   multipart messages.  How should the TYPE=scribble data be
   encoded for transmission?
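   To make the problem concrete, here is what happens when a binary
   field is forced through form URL encoding, using Python's standard
   urllib.parse.urlencode. The field names and byte values are made
   up for illustration. Every field becomes a flat name=value pair,
   so there is no place to record a content type or file name per
   field, and each byte outside the safe set costs three characters.

```python
from urllib.parse import urlencode

# Hypothetical form fields: a text title and some raw binary "scribble" bytes
fields = [
    ("title", "My sketch"),
    ("scribble", b"\x00\x01\xff"),
]

body = urlencode(fields)  # spaces become '+', other unsafe bytes become %XX
print(body)
# title=My+sketch&scribble=%00%01%FF
```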

INPUT -- ERROR Attribute					(Page 131)
   Currently the ERROR attribute would have to be a value 
   returned from the server.  Is this something that could 
   be overridden by client-side scripts?  If so, perhaps this
   possibility should be mentioned here.


Carriage Return/Line Feed					(Page 144)
   What should a browser do with documents created with
   text processors that do not use LF as part of the
   newline string (e.g. on the Macintosh, which uses CR alone)?
   This could sure mess up creating TEXTAREA or PRE sections.


 --------- and that's all folks! ----------