Re: BASE and IMG and [X]HTML

From: <kynn@idyllmtn.com>
Date: Mon, 23 Sep 2002 14:01:20 -0700 (PDT)
Message-Id: <200209232101.OAA08885@garth.idyllmtn.com>
To: creiman@kefta.com (Charlie Reiman)
Cc: www-validator@w3.org

Charlie Reiman asked:
> We've been having a discussion on the zope mailing list regarding the
> validator's behavior with <base ... />. In particular, an HTML 4.01
> transitional document is not allowed to use <base ... />. Instead, it is
> expected to use <base ...>.
> 
> Well, okay. I don't like it but I accept the reasoning. But why does it not
> complain about <img ... />? Isn't this the same situation?

Hi, Charlie, it goes like this.  You and I both know that HTML 4.01 is
HTML written according to SGML rules, and XHTML 1.0 is HTML written to
XML rules.

In the SGML-HTML rules, the closing > on a tag is actually optional in many
cases, and when it's not there, the assumption is that it's meant to be
there, and anything else is just character data.

Why is that important?  Well, it makes sense when you combine it with
something else -- a slash / can't appear inside the tag in SGML-HTML.

So when an SGML-HTML application (such as the validator) sees the 
following:

     <img src="blah.jpg" alt="Blah!" />

It reads it as:

<
     Okay, the start of a tag.
img
     Aha, this is the image element
src="blah.jpg"
     Okay, this is an attribute
alt="Blah!"
     This is another attribute
/
     Wait, what the heck is this?  This can't appear inside this tag.
     Oh, I get it.  The tag actually closed after the last valid
     attribute, they just didn't include the >.  Okay, so the / is
     some character text data after the tag.
>
     Hmm, I guess this is still character text data.

So really it reads it as:

     <img src="blah.jpg" alt="Blah!">/&gt;
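
If it helps to see that walkthrough as code, here is a rough Python
sketch of the simplified reading I just described.  It is not the
validator's actual code and not a real SGML parser; the function name
read_start_tag and the name="value" attribute pattern are assumptions
made up for this illustration.

```python
import html
import re

# Only the quoted name="value" attribute form is handled in this sketch.
ATTR = re.compile(r'\s*([A-Za-z][A-Za-z0-9-]*)="([^"]*)"')

def read_start_tag(text):
    """Read one start tag the way the walkthrough describes.

    Returns (name, attrs, leftover). leftover is whatever this simplified
    reader ends up treating as character data after the tag.
    """
    m = re.match(r'<([A-Za-z][A-Za-z0-9]*)', text)
    if m is None:
        raise ValueError("text does not begin with a start tag")
    name = m.group(1)
    pos = m.end()
    attrs = []
    while True:
        a = ATTR.match(text, pos)
        if a is None:
            break  # next thing is not an attribute, so the tag ends here
        attrs.append((a.group(1), a.group(2)))
        pos = a.end()
    rest = text[pos:].lstrip()
    if rest.startswith('>'):
        rest = rest[1:]  # an ordinary closing > was present
    # Otherwise the > is assumed to be implied, and rest is character data.
    return name, attrs, rest

name, attrs, leftover = read_start_tag('<img src="blah.jpg" alt="Blah!" />')
print(name, attrs, html.escape(leftover))
```

With the XHTML-style tag, the leftover comes back as the two characters
/ and >, which html.escape writes out as /&gt;.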

Now, the browsers out there aren't really SGML applications.  So they
don't follow the SGML rules properly, and won't read it that way.
Instead they'll read it as:

     <img src="blah.jpg" alt="Blah!" [SOMETHING I DON'T KNOW SO I WILL
       IGNORE]>

...which means it will display as you'd expect.
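
For comparison, a tolerant parser behaves roughly like that.  The sketch
below uses Python's standard html.parser module purely as a stand-in for
tag-soup parsing; it is an assumption that this resembles what any given
browser does, and the class name TolerantReader is made up for the
example.

```python
from html.parser import HTMLParser

class TolerantReader(HTMLParser):
    """Collects the start tags and the character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))

    def handle_data(self, data):
        self.text.append(data)

reader = TolerantReader()
reader.feed('<img src="blah.jpg" alt="Blah!" />')
reader.close()

print(reader.tags)  # the img tag with its two attributes
print(reader.text)  # no stray character data is reported
```

The trailing slash is simply absorbed as part of the tag, so nothing is
left over to show up as text.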

Okay, so why is <base /> not allowed when <img /> is?

Simple:  The HTML specs don't allow "raw" character data text to 
         appear inside the <head>, but they do allow it to appear
         inside the <body>.

When you write <img />, that extra /&gt; -- as the validator reads it --
follows the <img> and is within the <body> text, where character text
is perfectly valid.  When you write <base />, the /&gt; appears in
the <head> element, and that's NOT allowed, so the validator throws an
error atcha.
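
The head-versus-body difference can be sketched in a few lines of
Python.  This is only a toy model under stated assumptions: the table
ALLOWS_CHARACTER_DATA and the functions stray_text and validate are
hypothetical names for this illustration, the body rule reflects HTML
4.01 Transitional, and the real validator works from the DTD rather
than a lookup table.

```python
# Toy content model: may raw character data appear directly inside?
ALLOWS_CHARACTER_DATA = {"head": False, "body": True}

def stray_text(markup):
    """Return the "/>" that the simplified SGML reading leaves behind."""
    return "/>" if markup.rstrip().endswith("/>") else ""

def validate(parent, markup):
    """Report whether the leftover text is acceptable inside parent."""
    leftover = stray_text(markup)
    if leftover and not ALLOWS_CHARACTER_DATA[parent]:
        return f"error: character data {leftover!r} not allowed in <{parent}>"
    return "ok"

print(validate("body", '<img src="blah.jpg" alt="Blah!" />'))
print(validate("head", '<base href="http://example.com/" />'))
print(validate("head", '<base href="http://example.com/">'))
```

The same leftover text passes in one place and fails in the other, and
the plain <base ...> form leaves nothing behind to complain about.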

--Kynn
Received on Monday, 23 September 2002 17:00:36 GMT