- From: Ian Graham <igraham@utirc.utoronto.ca>
- Date: Tue, 11 Apr 1995 18:49:29 -0400 (EDT)
- To: www-html@www10.w3.org
Hi --
Here are some comments I made after reading Dave Raggett's most recent
draft of the HTML 3.0 RFC (the plain text one, that is):
<draft-ietf-html-specv3-00.txt>
dated 28 March 1995. This is indeed a wonderful piece of work. My
contributions to this effort are comparatively minuscule (even
if this letter appears to be long). Mostly I talk about some
inconsistencies in the document, some suggestions for clarification
(well, I wasn't sure, so probably others might be confused too) and
finally some suggested changes in recommended usages. I hope you
find them useful, and not frivolous.
Sincerely,
Ian
--
Ian Graham .................................. igraham@utirc.utoronto.ca
Instructional and Research Computing
University of Toronto
-------------------------------------------------------------------
Attributes (Page 11)
a) string literals
... states that string literals should replace characters that
might be misinterpreted (e.g. ", ' or >) by HTML character
references. This of course should *not* be done when the
string literal is a URL -- in this case the string literal
should contain URL encodings of questionable characters. I
believe this should be mentioned here.
As far as I know this applies only to HREF and SRC attributes.
What about ID and NAME? Should fragment identifiers be URL
encoded? I am guessing not, as they are NAME tokens, but I
just don't know, and don't recall anything in the URL RFC about
this (the draft I have is old....). Whichever, it might be a
good idea to state which is the case here.
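A minimal sketch of the distinction, in Python and using only the standard
library (html.escape and urllib.parse.quote). The sample strings and variable
names are invented for illustration and are not taken from the draft:

```python
from html import escape
from urllib.parse import quote

# Ordinary attribute text: troublesome characters become character references.
title = 'He said "hi" > bye'
title_attr = escape(title, quote=True)

# A URL-valued attribute (HREF, SRC): questionable characters are URL-encoded.
path = '/docs/my file "draft".html'
href_attr = quote(path)

print(title_attr)  # He said &quot;hi&quot; &gt; bye
print(href_attr)   # /docs/my%20file%20%22draft%22.html
```

The same double quote ends up as &quot; in one case and %22 in the other,
which is why the text should say which rule applies to which attribute.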
b) name tokens
Are name tokens case-sensitive? I always thought they were,
but in practice browsers treat many attributes as
case insensitive (and let us not forget Mosaic, which does a bit
of both....). This has always been confusing, and not clearly
spelled out in the RFC -- should this situation be clarified here?
Document Structure - the HEAD Element (Page 17)
The HEAD element can be safely omitted only if the document
writer remembers to place the HEAD elements at the top of the
document. I've certainly seen many examples where this was
not done. IMO having HEAD and BODY tags helps to enforce
proper placement of head and body elements, and for this reason
I suggest that the RFC strongly recommend, or even require, the
use of HEAD and BODY tags in valid HTML 3.0 documents.
BR Element (Page 36)
What is the recommended formatting for subsequent BR
elements? For example, should <BR><BR><BR> be treated as three
(line) breaks, or as a single break? I prefer three line
breaks, as this seems to me more in keeping with the idea
of a <BR>.
P Element (Page 33)
The recommendations state that subsequent empty paragraphs
are discouraged (i.e. <P> <P> <P> ). Perhaps the RFC should
recommend browser behaviour for this case - I suggest
recommending that browsers ignore empty paragraphs.
As an aside, I assume that "empty" means an element that
contains only whitespace - how do entities like &nbsp; fit
in this definition? Should the RFC formally define
text-"empty" elements ?
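One possible reading of "text-empty", sketched in Python. This is an
assumption for discussion, not a definition from the draft: the function name
is_text_empty and the nbsp_is_space flag are hypothetical, and the flag exists
only because the treatment of a non-breaking space entity is the open question.

```python
def is_text_empty(content, nbsp_is_space=False):
    """Return True if content holds nothing but whitespace.

    Assumption: whitespace means space, tab, CR and LF. Whether a
    non-breaking space also counts is left to the caller via the flag.
    """
    if nbsp_is_space:
        # Treat both the entity and the literal character as a plain space.
        content = content.replace("&nbsp;", " ").replace("\u00a0", " ")
    # Strip only the four ASCII whitespace characters named above.
    return content.strip(" \t\r\n") == ""


print(is_text_empty("  \n\t"))                       # True
print(is_text_empty("&nbsp;"))                       # False
print(is_text_empty("&nbsp;", nbsp_is_space=True))   # True
```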
Horizontal Tab -- DP attribute (Page 39)
The text says that the designated decimal point character
can be altered by the language content, as set by the lang
attribute on enclosing elements. Is this the best choice --
I would prefer that DP override the existing default.
For example, when writing a piece of text in a particular
language with a given DP separator, I may very well want to
override this separator for tab-aligned information,
for example:
* a period . for scientific numbers (overriding the language
specific separator)
* perhaps a special symbol for other data, for example
a h (used to separate hours/min in sidereal notation --
e.g. 12h30), " for seconds,
* the dash for phone numbers....
Also -- what happens if you specify align=decimal, but there
is no decimal in the text to be aligned? Perhaps you should
be able to specify a default ordering in alignment, for
example:
align=decimal,right
which would align to the decimal symbol, and in the absence of
a decimal, align to the right.
This also applies to the DP attribute used in TABLE elements.
(see Pages 83, 86 and 89).
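The fallback idea above (align on the DP character, otherwise align right)
can be sketched in Python as follows. The function align_column and its
parameters dp and fallback are hypothetical names chosen for this sketch; the
draft defines no such algorithm.

```python
def align_column(values, dp=".", fallback="right"):
    """Pad strings so the dp character lines up; values lacking dp use fallback."""
    # Widest stretch of text to the left of dp, and from dp to the end.
    left = max((v.index(dp) for v in values if dp in v), default=0)
    right = max((len(v) - v.index(dp) for v in values if dp in v), default=0)
    width = max([left + right] + [len(v) for v in values])
    out = []
    for v in values:
        if dp in v:
            # Shift the value so that its dp sits in column `left`.
            out.append((" " * (left - v.index(dp)) + v).ljust(width))
        elif fallback == "right":
            out.append(v.rjust(width))
        else:
            out.append(v.ljust(width))
    return out


for line in align_column(["3.14", "12h30", "100.5", "42"]):
    print("|" + line + "|")
```

Calling align_column(["12h30", "1h05"], dp="h") lines the values up on the h
instead, which is the kind of DP override suggested above.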
Hypertext Links (Page 40)
This reflects my ignorance of the formal definition of #PCDATA
(part of %text's contents) but -- does this definition allow for
anchor elements that contain no text (or a string of whitespace?,
recall my confusion over the use of the phrase "empty" for this
type of problem). I don't think this should be allowed. Whatever,
I think the RFC should explain this case.
As far as I can tell there is nowhere in the RFC a definition of
an "empty" text string. For example, does an empty string consist
of any combination of whitespace ASCII characters??? And how would
this be generalized to other character sets? And what about
&nbsp;?
Character Level Elements (Page 44)
The RFC states "implementations are not required to render
these nested highlightings distinctly from non-nested elements".
Why this recommendation?
At least for physical formatting tags the opposite seems more
sensible -- things like <b><i>bla bla </i></b> so obviously
suggests bold-face-italics, and many people already write
with that expectation. It seems to me reasonable, therefore,
to encourage that usage:
(e.g. "implementations are encouraged to render ....").
I note that later on (page 48) this is the recommended behaviour
for physical tags.
In this regard there appears to be a difference between informational
and physical elements - informational tags do not always logically
inherit the characteristics of surrounding informational tags.
Yikes, what a mess. Should this point be
discussed further in the RFC?
Another question is - what to do with possible on-the-fly inclusion
of text documents, where a block of text containing character
formatting tags may be inserted between other such tags --
should this case be handled differently? (I suppose not).
SAMP element (Page 46)
What is SAMP for? The phrase "a sequence of literal characters"
is only meaningful to those who've used texinfo. Perhaps a
usage example for this element would be helpful.
IMG Element (Page 51)
Is there any interest in ALIGN=center (to align the image in
the center of the page?) - this would be useful for inserting
images as page decorations, flowed-around trademarks, etc. This
is distinct from the image attribute to the HR element, since
the IMG tag does not imply a separator.
This would require attributes to control text flow around the
image - should text flow on the left only (the right is clear),
the right only (the left is clear), should it flow on both sides
through the image, or on both sides as two columns on either side
of the image. E.g:
ALIGN=center,leftonly (text flow on left only)
ALIGN=center,rightonly (text flow on right only)
ALIGN=center,noflow (no text surrounding image)
ALIGN=center,flowthrough (see below)
ALIGN=center,twocols (see below)
Here are examples of the two latter cases:
0000000000000000000000000
______
1111111 | | 22222222
3333333 | | 44444444 ALIGN=center,flowthrough
5555555 | | 66666666
------
7777777777777777777777777
or
0000000000000000000000000
______
1111111 | | 44444444
2222222 | | 55555555 ALIGN=center,twocols
3333333 | | 66666666
------
7777777777777777777777777
How would this affect the CLEAR attribute? I think not much...
What about the Netscape HSPACE and VSPACE attributes? I hate
to admit it, but they do help enormously when floating images
with surrounding text...
UL/OL Element -- SKIP attribute (Page 58)
How does the SKIP attribute affect sequence numbers for
unordered lists? Do unordered lists even have sequence
numbers?
NEEDS Attribute (Page 71)
Reference to this obsoleted attribute appears on Pages 71
and 82.
FIG Element -- ALIGN attribute (Page 72)
This returns to the idea of centering with text flow around
the Figure. Should we not allow centered figures with
surrounding text flow? For the IMG element I suggested the
attributes:
ALIGN=center,leftonly (text flow on left only)
ALIGN=center,rightonly (text flow on right only)
ALIGN=center,noflow (no text surrounding image)
ALIGN=center,flowthrough (see the IMG examples above)
ALIGN=center,twocols (see the IMG examples above)
Some of the details of this should be left to the stylesheet.
For example, centering need not be specifically the center
of the page, and flow could be overridden by stylesheet
preferences.
CAPTION Element (Page 75)
Should we allow for justification, centering, etc. of the
caption within its specified placement? E.g.
ALIGN=top,center???
Which would put the caption, centered, at the top of the
figure. Perhaps this is better left to the style sheet...
TABLE Element -- ALIGN Attribute (Page 82)
This returns to the idea of centering with text flow around a
FIG. Should we not allow centered tables with surrounding
text flow?
ALIGN=center,leftonly (text flow on left only)
ALIGN=center,rightonly (text flow on right only)
ALIGN=center,noflow (no text surrounding table)
ALIGN=center,flowthrough (see the IMG examples above)
ALIGN=center,twocols (see the IMG examples above)
Some of the details of this should be left to the stylesheet.
For example, centering need not be specifically the center
of the page, and flow could be overridden by stylesheet
preferences.
MATH
Looks fantastic, and much too much for me.
PRE Element (Page 113)
Is it really necessary to cater to obsolete usages, such
as having <P>, <IMG> or <FIG> tags inside a PRE? Would it
not be better to just state that these tags are not
permitted inside a PRE. Again, I am thinking of the fact
that many HTML authors use the RFC as a guide to writing,
and will be better warned away from inappropriate usage by
stronger cautions -- something like
<P>, <IMG> and <FIG> are not permitted inside a
PRE element.
would do it. Also, if we use the .htm3 suffix to
denote HTML 3.0 documents as distinct from HTML 2.0
then we don't really need to preserve, in the HTML 3.0
RFC, all the legacy problems.
I'm not really suggesting that browsers should not support
these bad structures, but rather that the RFC should more
strongly urge good design over bad.
I vote for dumping the WIDTH attribute (Page 115). This is
something a browser can easily decide for itself.
What about newline characters? I note that the generic algorithm
(described on pages 144-145) won't work for text files created on a
Macintosh, since the Macintosh denotes newlines by CR. What to do --
Perhaps treat it as follows:
a) If there are CRLF pairs, treat each pair as
col:= 0 row := row+1, and treat individual instances of the
characters as appropriate
b) If there are only LF's alone treat them as
col:= 0 row := row+1
c) If there are only CR's alone treat them as
col:= 0 row := row+1
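The three cases above can be sketched in Python with a single regular
expression. This is a simplification under stated assumptions, not the
algorithm from the draft: the name split_lines is hypothetical, and every
CRLF pair, lone LF or lone CR is treated uniformly as one line break.

```python
import re

# CRLF is listed first so that a pair counts as one break, not two.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text into rows, accepting CRLF, LF-only and CR-only newlines."""
    return _NEWLINE.split(text)


print(split_lines("unix\nline"))   # ['unix', 'line']
print(split_lines("dos\r\nline"))  # ['dos', 'line']
print(split_lines("mac\rline"))    # ['mac', 'line']
```

Each element of the result corresponds to one "col := 0, row := row+1" step,
regardless of which platform produced the file.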
FN (Footnotes) (Page 118)
The example uses footnotes that are part of the document
from which they are referenced -- I assume this is *not* the
intention, and that FN's can be in any document?
FORMs (Page 124)
Someone suggested to me that it might be nice if a single
FORM could have multiple ACTIONs, so that the form data
could be sent to different URLs depending on which submit
button was pressed. This would be useful, for example, if
you wanted the choice of submitting the same data directly
to a server or indirectly by mail. This would also be useful
if you wanted to give the user the choice of submitting data
to a secure (e.g. RSA) or non-secure server. It seemed like
a reasonable idea to me -- what do you think?
INPUT TYPE=scribble and INPUT TYPE=file (Page 129)
How will these non-character type data be sent to the server?
As far as I know there is nothing in the
application/x-www-form-urlencoded MIME type that allows for
multipart messages. How should
the TYPE=scribble data be encoded for transmission?
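To show why this is a problem, here is a small Python sketch of what a
URL-encoded form submission looks like, using urllib.parse.urlencode from the
standard library on the assumption that it matches the encoding meant here.
The field names and values are invented.

```python
from urllib.parse import urlencode, parse_qs

# A URL-encoded form body is one flat string of name=value text pairs.
fields = {"name": "Ian Graham", "comment": "a & b = c"}
body = urlencode(fields)
print(body)  # name=Ian+Graham&comment=a+%26+b+%3D+c

# Decoding gives back text values only; there is no per-field content type
# and no part boundary that a scribble image or an uploaded file could use.
decoded = parse_qs(body)
print(decoded)
```

Binary data could be forced through this scheme byte by byte, but nothing in
the format says what kind of data each field carries.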
INPUT -- ERROR Attribute (Page 131)
Currently the ERROR attribute would have to be a value
returned from the server. Is this something that could
be overridden by client-side scripts? If so, perhaps this
possibility should be mentioned here.
Carriage Return/Line Feed (Page 144)
What should a browser do with documents created with
text processors that do not use LF as part of the
newline string (Macintosh.......)? This could sure mess up
creating TEXTAREA or PRE sections.
--------- and that's all folks! ----------
Received on Tuesday, 11 April 1995 18:49:48 UTC