RE: HTML or XHTML - why do you use it? from Peter Foti (PeterF) on 2003-01-08 (www-html@w3.org from January 2003)

From: Peter Foti (PeterF) <PeterF@SystolicNetworks.com>
Date: Wed, 8 Jan 2003 12:38:17 -0500
To: "'Ian Hickson'" <ian@hixie.ch>
Cc: "'www-html@w3.org'" <www-html@w3.org>
Message-ID: <A10A983C9DFBD4119F0300104B2EA6B725FF40@ZIPPY>
> > Are there any browsers today that would reject non-valid HTML 4.01?
> > I don't believe there is.
> 
> I know of one (the validator).


True, but generally a user will not be browsing the web with the validator. :)


> > > It's not -- XML doesn't have any content model which allows
> > > comment-like markup to be ignored. Don't forget in XML parsers should
> > > get the same result whether or not they parse the DTD (with a few
> > > exceptions related to attributes and entities).
> > 
> > I see.  However, since we are talking about parsing XHTML as HTML, I
> > don't think this matters because the agent will still treat it as an
> > HTML comment.
> 
> Eh?
> 
> The problem is that the following string:
> 
>    <script> <!--
>      work();
>    // --> </script>
> 
> ...will be treated differently depending on whether it is supposed to be
> HTML or whether it is supposed to be XHTML.
> 
> So when a document containing the above has its MIME type changed from
> text/html to application/xhtml+xml, it'll break.


Once again, this revolves around users switching from text/html to
application/xhtml+xml.  Yes, you are correct, this will cause problems.  Of
course, if those pages are truly meant to be viewed as HTML, then changing
the MIME type would technically be the wrong thing to do. :)
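
To make the difference concrete, here is a minimal Python sketch (an
illustration only, not something posted in the thread).  It feeds the same
script snippet to a lenient HTML tokenizer and to an XML parser.  It assumes
that Python's standard html.parser and xml.etree.ElementTree modules are
reasonable stand-ins for text/html and application/xhtml+xml handling; real
browsers differ in detail, and ScriptText is a hypothetical helper name.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The snippet quoted above, unchanged apart from whitespace.
SNIPPET = "<script> <!--\n  work();\n// --> </script>"


class ScriptText(HTMLParser):
    """Hypothetical helper: collects character data seen inside <script>."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def script_as_html(markup):
    """HTML view: script content is raw text, so '<!--' is just characters."""
    parser = ScriptText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def script_as_xml(markup):
    """XML view: '<!-- ... -->' is a real comment and is dropped by default."""
    element = ET.fromstring(markup)
    return element.text or ""


html_text = script_as_html(SNIPPET)
xml_text = script_as_xml(SNIPPET)
print("as HTML:", repr(html_text))  # still contains work();
print("as XML: ", repr(xml_text))   # only whitespace is left
```

Under these assumptions, the HTML side still sees the call to work(), while
the XML side sees an empty script element, which is the breakage described
above.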


> > But do any agents support the SHORTTAG feature?
> 
> Emacs/W3, I think. There was talk of implementing it in Mozilla, too.


Interesting.  I have used the Emacs editor before... never heard of an Emacs
web browser.  Nevertheless, someone else suggested removing SHORTTAG
functionality in the spec by way of the Errata.  I tend to agree with that,
and would like to see this happen since there are so few agents that
actually support it, and because it is an obvious shot in the foot regarding
XHTML.  Die SHORTTAG, Die!  :)


> > I'll take that as a compliment then. :)  But don't you think the focus
> > should be on improving the quality of the existing developers rather
> > than to say "Existing developers are too stupid to use XHTML, so they
> > shouldn't?"
> 
> I think everyone should use XHTML. But ONLY if they use the
> application/xhtml+xml MIME type.


And if the page served as application/xhtml+xml contains HTML elements,
should the browser treat it as tag soup?  For example, if I send an XHTML
document that contains elements like <h1>,<p>,<strong>, etc., do I need to
define CSS styles for each of those objects, or will agents use their
default HTML styles?


> > > Why not just use HTML?
> > 
> > Because I want the benefits of using XML tools and validators.  Not to
> > mention the experience of writing valid XML.
> 
> What about the benefits of SGML tools and validators, not to mention the
> experience of writing valid SGML?


Well, considering that XML is the hot new technology, I see more benefit in
using XML than some other SGML language (my opinion).


> I agree, on the long run, XHTML-as-XML is better. On the short run,
> though, we're simply not there. (Largely because of the IEs.)


I agree.


> > > > If they did, then the XML tool would have to guess where elements
> > > > ended if they re-opened the generated HTML file.
> > > 
> > > So why not use the SGML tools that have existed since before XML was
> > > even an inkling in anyone's eye?
> > 
> > Because they are not as strict as XML tools and can produce sloppy code?
> 
> No such thing as sloppy SGML code. It's either valid or it isn't.


By sloppy, I mean that this:

<sgmlcode><foo>Hello <bar>World</bar></foo></sgmlcode>

is easier to work with and read than something like this:

<sgmlcode><foo>Hello <bar>bar
or
<sgmlcode><foo>Hello <bar>bar</foo></bar></sgmlcode>
<sgmlcode /<foo /Hello <bar /World///

(Did I get that last one right?  :-)  


The strictness of XML provides a better structure to the documents *as far
as readability is concerned*.


> In fact, XML _introduced_ the idea of sloppy code (well formed but not
> valid markup is purely an XML concept).
> 
> 
> > > What other advantages are there?
> > 
> > Besides being able to use XML tools, it also gives authors experience
> > writing *better* documents that are more structured.
> 
> There is a direct one to one mapping of canonical valid XML documents to
> canonical valid SGML documents, so they can't be more structured.
> 
> 
> > I guess my argument is that developers should be trained to use XHTML
> > *correctly*, and your argument seems to be that not enough people use
> > XHTML correctly so therefore those people should not use it at all.
> 
> Who are you proposing do this training?


The same people who are currently teaching HTML (the Educational system).
Currently, a lot of schools will teach HTML instead of XHTML.  They should
update their focus.


> The only way I can see of training people to use XHTML is to make the UAs
> _require_ it to be well formed. That is what I've been personally working
> on making happen with, e.g., my QA work on Mozilla. However, in the mean
> time, until we get decent support for XML in the market, there's no-one
> doing the training.


Ah, yes, solid UAs will certainly help the cause.  But my point was geared
more towards the Educational system.


> > > If the document validates, there is no ambiguity about where the
> > > elements end. It is fully defined.
> > > 
> > > For example:
> > > 
> > >    <p>Test<ol><li></ol>
> > > 
> > > ...is _exactly_ equivalent to:
> > > 
> > >    <p>Test</p><ol><li></li></ol>
> > > 
> > > ...and all UAs support this correctly as far as my testing has shown.
> > 
> > That would be nice... but Netscape 4 has proven you wrong. :) 
> 
> Netscape 4 gets perfectly well formed markup wrong as well, so it really
> isn't a good example.


Ok, then how about this:
<p> My lousy paragraph <br> stinks!

Should that be interpreted as:
<p> My lousy paragraph </p><br> stinks!

or as:
<p> My lousy paragraph <br> stinks!</p>

Where in the spec is this ambiguity resolved?
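
For what it's worth, the HTML 4.01 DTD declares P with an optional end tag
and inline-only content, and declares BR as an EMPTY element that is itself
inline.  On that reading, a <br> does not end the paragraph, but a
block-level element such as <ol> does.  Below is a rough Python sketch of
that rule.  It is a toy under stated assumptions, not a real SGML parser:
BLOCK_LEVEL is a small hand-picked subset of the DTD's block elements,
ImpliedPCloser and normalize are hypothetical names, only the missing </p>
is inferred (not </li>), and attributes are dropped.

```python
from html.parser import HTMLParser

# Assumption: an illustrative subset of HTML 4.01 block-level elements.
BLOCK_LEVEL = {"p", "ol", "ul", "dl", "div", "table", "pre", "blockquote",
               "h1", "h2", "h3", "h4", "h5", "h6", "hr", "form"}


class ImpliedPCloser(HTMLParser):
    """Toy serializer that writes out the </p> tags the DTD implies."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.p_open = False

    def handle_starttag(self, tag, attrs):
        # P may only contain inline content, so a block-level start tag
        # implicitly ends any open paragraph.
        if tag in BLOCK_LEVEL and self.p_open:
            self.out.append("</p>")
            self.p_open = False
        self.out.append(f"<{tag}>")
        if tag == "p":
            self.p_open = True

    def handle_endtag(self, tag):
        if tag == "p":
            if not self.p_open:
                return  # ignore a stray </p>
            self.p_open = False
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def close(self):
        super().close()  # flushes any buffered trailing text first
        if self.p_open:
            self.out.append("</p>")
            self.p_open = False


def normalize(markup):
    parser = ImpliedPCloser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(normalize("<p> My lousy paragraph <br> stinks!"))
# <p> My lousy paragraph <br> stinks!</p>
print(normalize("<p>Test<ol><li></ol>"))
# <p>Test</p><ol><li></ol>
```

So, at least in this sketch, the second interpretation is the one the DTD
implies: the <br> stays inside the paragraph, and the paragraph ends only at
the next block-level element or at the end of its container.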


> > > Basically, my argument is that if you know what you're doing, then
> > > sure, go ahead, but that most people don't, and that for them it would
> > > be a lot easier if they used HTML 4.01 now and thus were never tempted
> > > to convert these documents to an XML MIME type.
> > 
> > You don't think it would be better for those people to simply learn XHTML?
> 
> Get real, who is going to teach them?


The same people who teach them the wrong way to do things. The Education
system (schools, colleges, etc.).


> It's like crime -- sure, I would rather teach everyone to be nice to each
> other, but in the meantime, we still need car alarms.


I wish I had more time to devote to this conversation, but I will say that I
understand the point you are making.  I can't say that I necessarily agree
with it, but I certainly think each side of the coin has valid arguments.
Your primary argument seems to be that switching from text/html to
application/xhtml+xml will cause problems, so authors should not bother to
author documents in XHTML (unless they know what they are doing and produce
valid code, with the understanding that it would probably not display
correctly if served as application/xhtml+xml).  My primary argument is that
users developing pages intended to be served as text/html should write those
documents in XHTML so they can take advantage of the HTML handling of
existing agents, while moving one step closer to being able to produce XML
documents for the web.  I view my argument as more of a stepping stone to
something bigger down the road.  You've gotta crawl before you walk, and I
think it would be better to get developers to take some baby steps now vs.
hoping they will take giant leaps later on.

Ian, thanks for the debate.  :)  I'll try to keep up if I can.

Regards,
Peter Foti
Received on Wednesday, 8 January 2003 12:28:05 UTC