Re: <IMAGE>? <TT> == <I>? toHell(NS)

Scott E. Preece (preece@predator.urbana.mcd.mot.com)
Wed, 30 Oct 1996 09:07:10 -0600


Date: Wed, 30 Oct 1996 09:07:10 -0600
Message-Id: <199610301507.JAA28559@predator.urbana.mcd.mot.com>
From: "Scott E. Preece" <preece@predator.urbana.mcd.mot.com>
To: davidp@earthlink.net
CC: www-html@w3.org
In-reply-to: "David Perrell"'s message of Tue, 29 Oct 1996 12:06:57 -0800
Subject: Re: <IMAGE>? <TT> == <I>? toHell(NS)

 From: "David Perrell" <davidp@earthlink.net>
| 
| Scott E. Preece wrote:
| > Similarly, I believe your example in the second paragraph to be
| totally
| > broken - non-nesting tags simply aren't allowed, ever, and all the
| > browser can do is try to guess what you really meant.
| 
| Broken or no, the sequence is clear and guessing is unwarranted. I was
| pleased to find that IE rendered this -- IMO -- logically.
---

Well, you can say that all you like, but in fact the sequence is *not*
clear; it is inherently ambiguous, and there is no "right" way to render
it.  You're saying "Oh, well, the author must have meant to have those
two tags intertwined and not nesting, so let's render it that way."  Not
only is this a guess (I can't say why you seem to think it isn't), but
any parser with any notion of SGML (and I've never before heard anyone
accuse Netscape of taking SGML parsing too strictly) is never going to
make that guess, because hierarchical nesting of markup is way too
essential a part of SGML to ignore.

You cannot get to your "logical" interpretation without totally ignoring
SGML parsing.

In another note you write...

| Carl Morris wrote:
| > ERROR... :)  I would hope that instead of guessing, logic would be
| > applied, and the simplest way out would be taken, since the <TT> is
| > inside the <I>, and the <I> has now been closed, logic says "there is
| > no way to leave the <TT> open...  but this would not be the first
| thing
| > that followed logic...
| 
| Logic says there is no way to leave TT open? Is it written in a DTD
| that "TT must look precisely like output from a teletypewriter, which
| has no italic, bold or underline capabilities"? Which monospaced font
| to use with TT is left to the browser; if that font is Courier and has
| an italic form, is it logical to make unneeded dictates about whether
| or not an author is *allowed* to specify it?
---

Again, when he says "there is no way to leave the <TT> open" he means
exactly that.  SGML does not allow for non-hierarchical markup.  It is
*impossible* to have an element start inside another element and end
outside it.  SGML simply does not allow you to represent that concept as
elements (you *could* represent it using a DTD that included elements
that signalled the beginning and end of regions, but the HTML DTD
doesn't do that - it wraps regions as elements).  Yes, this does make it
hard to represent certain real-world situations, including the one you
used in your example, but that doesn't change the fact that SGML simply
doesn't work that way, and HTML is SGML.
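
To make the nesting rule concrete, here is a minimal sketch of a
stack-based check.  It is not an SGML parser: it assumes every element
has explicit start and end tags (true of the phrase elements discussed
here) and ignores tag omission, empty elements, comments, and the DTD.
The function name first_nesting_error and the regular expression are
hypothetical, chosen only for illustration.

```python
import re

# Matches a start or end tag and captures the slash (if any) and the name.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def first_nesting_error(markup):
    """Return None if every end tag closes the innermost open element,
    otherwise a short description of the first violation."""
    open_elements = []  # stack of element names, innermost last
    for match in TAG.finditer(markup):
        closing = match.group(1) == "/"
        name = match.group(2).upper()
        if not closing:
            open_elements.append(name)
        elif not open_elements:
            return "</%s> with no open element" % name
        elif open_elements[-1] != name:
            # The end tag names an element other than the innermost one.
            return "</%s> while <%s> is still open" % (name, open_elements[-1])
        else:
            open_elements.pop()
    if open_elements:
        return "<%s> never closed" % open_elements[-1]
    return None

print(first_nesting_error("<I>italic <TT>mono</TT></I>"))
print(first_nesting_error("<TT>hello <I>good-bye</TT> maybe?</I>"))
```

The first call reports no error; the second reports that </TT> arrives
while <I> is still open, which is the point at which a browser has to
start guessing.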

Most of your paragraph, however, seems off the point.  It *is* possible
to nest an I element inside a TT element or vice versa.  Morris's point
is just that the inner element must, logically, end at the end of the
containing element.  Section 5.7 of RFC 1866 specifically leaves
ambiguous the rendering when you nest one "phrase" element inside
another: the browser may apply both fonts or only the inner one.  That
is, if you have <I>italic <TT>mono</TT></I>, the "italic" has to be in
italic and the "mono" has to be monospaced; the browser *may* use an
italic monospaced font for "mono", but is not required to do so.

And in another note:

| I'm attempting to apply logic to the treatment of cases where
| the rules are broken and yet there is logic in the construct that
| breaks them. There is no logic in <I>italic text</B>, but there is in
| 
| > <TT>hello <I>good-bye</TT> maybe?</I>
---

My point is that while a human may be able to guess what the author
meant by that markup (because it is reasonably easy to imagine a markup
language in which that would be a legal expression), there is *no* SGML
logic in it.  It is not a valid logical expression in SGML.  A browser
presented with that markup must either guess or throw up its hands and
display an error message.  There are no other choices.

---
| if the text can be both monospaced and italic. Just because <TT>hello
| <I>good-bye</TT> maybe?</I> is invalid HTML, logic does not dictate
| that </TT> must always indicate the end of an italicized section. The
| 3.2 ref spec calls for start and end tags for all text and phrase
| markup.
---

No, </TT> does not always end an italicized phrase, only when the start
of the I markup was inside the TT phrase.  That is, if you have the
markup:  <I>This is italicized <TT>and this is monospaced</TT> and this
is still italicized</I>, the </TT> does not close the I element.
However, the "and this is monospaced" may or may not be italicized as
well as monospaced, at the browser's discretion.
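
As a rough illustration of the same point, the sketch below walks
well-nested markup and records which elements are open for each run of
text.  It assumes attribute-free tags with explicit end tags and does
no DTD checking; text_runs and TOKEN are hypothetical names used only
for this example.

```python
import re

# A token is either a tag (optional slash plus name) or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")

def text_runs(markup):
    """Return a list of (text, open_elements) pairs, outermost element first.

    Raises ValueError if an end tag does not close the innermost element.
    """
    open_elements = []
    runs = []
    for slash, name, text in TOKEN.findall(markup):
        if text:
            runs.append((text, tuple(open_elements)))
        elif slash:
            if not open_elements or open_elements[-1] != name.upper():
                raise ValueError("</%s> does not close the innermost element" % name)
            open_elements.pop()
        else:
            open_elements.append(name.upper())
    return runs

example = ("<I>This is italicized <TT>and this is monospaced</TT>"
           " and this is still italicized</I>")
for text, elements in text_runs(example):
    print(elements, repr(text))
```

The middle run comes back inside both I and TT, and the last run inside
I alone: the </TT> closed only the TT element.  Whether the middle run
is drawn in an italic monospaced font is the part left to the browser.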

---
| > ...  I want the browser I use
| > to help me find errors...
| 
| Terminating one tag with a different end tag is bad guessing that is
| unlikely to help find errors. It was IE's logical (IMO) refusal to
| terminate <TT> with </I> that started this thread.
---

I want my authoring tools to help me find errors.  I want my reading
tools to faithfully present the markup they are given, but I don't want
to have to read error messages when the author has misused HTML.

scott

--
scott preece
motorola/mcg urbana design center	1101 e. university, urbana, il   61801
phone:	217-384-8589			  fax:	217-384-8550
internet mail:	preece@urbana.mcd.mot.com