[whatwg] Allow trailing slash in always-empty HTML5 elements? from Sam Ruby on 2006-11-29 (public-whatwg-archive@w3.org from November 2006)

From: Sam Ruby <rubys@intertwingly.net>
Date: Wed, 29 Nov 2006 16:42:38 -0500
Message-ID: <3d4032300611291342h5b90ff78y275affc43d0c7a42@mail.gmail.com>
On 11/29/06, Ian Hickson <ian at hixie.ch> wrote:
>
> On Wed, 29 Nov 2006, Leons Petrazickis wrote:
> >
> > This rigmarole is going to repeat on every site that has converted to
> > XHTML sent as text/html. People are emotionally invested in the idea of
> > trailing slashes. Websites have complex codebases, and going through
> > them removing trailing slashes on singleton elements would be very hard.
>
> Various things are worth noting here:
>
> XHTML is a minority on the Web. Looking at just which elements specify the
> XHTML namespace on their <html> element, XHTML has at most 15%
> penetration, for example.


I am of the belief that that particular statistic is meaningless.  Even if
it were 15%, most aren't well formed.  Of those that are well formed, most
don't have the cojones to serve such documents with the appropriate MIME
type as they know that to do so would cause compliant UAs to be rather
unforgiving.  And of the few insane enough to do so, it is rare that the
page in question is actually valid.

> Nothing is going to stop people from continuing to use XHTML1, HTML4,
> HTML3.2, HTML2, or whatever their existing content uses. HTML5 is a new
> language, that happens to be backwards-compatible with all of those. There
> are probably near zero documents on the Web today that are
> HTML5-compliant, simply because the DOCTYPE is new. That's fine. Just
> getting new documents to be compliant would be fine. WordPress, for
> example, will eventually create new templates, and those could be based on
> HTML5 (though of course WordPress would have a harder job there due to its
> hardcoding of markup, but that's another story).


... on the other hand, I am not of the belief that version numbers mean what
they are supposed to.  You will see HTTP 1.1 headers in HTTP 1.0 requests,
RSS 2.0 elements in RSS 0.91 feeds, and HTML4 elements in XHTML documents.

We live in a cut and paste world.  The fact that I could find an XHTMLism in
the front page of Microsoft.com will likely surprise few.  Lachlan is free
to call the authors of WordPress bozos if he likes, but frankly the bozos
outnumber you.  What should be the most damning of all is that I found an
example on the most prominent page on the mozilla.org site.  No one can say
that the authors of that page didn't make a conscious choice in the DOCTYPE
for that page.  No one can say that the authors of that page are ignorant.
No one can say that mozilla has a(n entirely) cavalier attitude towards
standards.

My theory is that we live in a cut and paste world, one based on partial
understanding.  Few understand DOCTYPEs and xmlns attributes; mostly, people
crib from something that works.

> If people want to make HTML5 syntactically compatible with XHTML1, such
> that XHTML1 documents don't cause syntax errors in HTML5, we'll have to do
> a whole lot more than just allowing trailing /s. I don't really see why
> that would be a goal, though. Going further, if we want to make documents
> in general compliant with HTML5, then we've got our work cut out for us --
> at least 78% of documents are syntactically incorrect today (not counting
> things like trailing /s in attributes, or missing DOCTYPEs -- if you
> include those, the number is more like 93%).


At the present time, being valid is an ideal that is virtually unattainable.
For most people, if your web page is broken, a validator is probably the
last place you want to go as it will require you to fix a number of things
that frankly nobody cares about before you can see the real errors.

The situation is not perfect, but perhaps a bit better for feeds.  For the
overwhelming majority of errors that the feed validator reports, there is
somebody that cares.  Example: try viewing a feed that isn't well formed
using IE7.

> In general, people don't migrate to new versions of HTML. They only use
> new versions for new documents. Which is fine, since HTML5 UAs are going
> to be backwards-compatible (by design).


Now we are getting to the real question:  backwards compatible with what?
Only with compliant documents (i.e., at most 22% of the web) or with pages
like the one at mozilla.org?

> > They've already reaped all the benefits of XHTML -- cleaner, more
> > readable, more maintainable code.
>
> It's a myth that XHTML gives you those benefits, by the way, especially if
> you don't actually use an XML pipeline (which WordPress doesn't).


I have no interest in that discussion.

> > The very idea of HTML5 is to not demand that the Web be scrapped and
> > rewritten. We need the people who have rewritten all their pages so that
> > they validate on the W3C validator -- they have the fire and the zeal
> > and the will to spread our format. We need to make the migration from
> > invalid XHTML to valid HTML5 very, very easy for them. We can't require
> > them to dig through PHP spaghetti. And that means that, no matter how
> > it's achieved, <br/> needs to be valid HTML5.
>
> I don't really understand this argument. Those who use XHTML1 because it's
> "the latest thing", are as likely to use HTML5 because it's "the latest
> thing", regardless of how complex that is. After all, they made the
> transition to XHTML, why wouldn't they make the transition to HTML5?


More likely, those that chose XHTML1 because it was the latest thing are now
jaded by the promises made - and largely unkept - and will take a pass on
HTML5.

Unless, of course, HTML5 compliance is simultaneously both more meaningful
and easier to achieve than XHTML1 compliance.

Drawing lines in the sand and maintaining that "<br />" is invalid is only
going to make more busy work for a lot of people.  If you try to explain why
this decision was made, most won't understand, and eventually most will
decide that compliance isn't worth the bother.

However, drawing lines in the sand that "<p /> doesn't mean what you think
it means" will affect few, and the reason for that particular line is both
sound and educational.
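
To make the distinction between those two lines concrete, here is a minimal
sketch in Python.  It is an illustration only, not the parser described in
the draft: the function name, the regular expression, and the verdict strings
are all hypothetical, and the set of always-empty elements is assumed from
the current HTML specification rather than taken from this thread.

```python
import re

# Assumption: the always-empty ("void") elements, taken from the current
# HTML specification; the 2006 draft's list may have differed.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}

# Matches a start tag written with a trailing slash: <br/>, <br />, <p />.
# Deliberately simplistic: it does not handle ">" inside attribute values.
SELF_CLOSING_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>]*/>")


def classify_trailing_slashes(markup):
    """Return (tag_name, verdict) for each start tag ending in '/>'."""
    results = []
    for match in SELF_CLOSING_TAG.finditer(markup):
        name = match.group(1).lower()
        if name in VOID_ELEMENTS:
            # The element can never have content, so the slash changes nothing.
            verdict = "harmless: element is always empty"
        else:
            # A text/html parser ignores the slash; the element stays open.
            verdict = "misleading: slash is ignored, element stays open"
        results.append((name, verdict))
    return results


if __name__ == "__main__":
    for name, verdict in classify_trailing_slashes("<p />one<br />two"):
        print(name, "->", verdict)
```

Under these assumptions, "<br />" lands in the harmless bucket, while "<p />"
is flagged because the paragraph is not actually closed when the page is
parsed as text/html.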


> I'm being devil's advocate here. As I noted earlier, I don't have an
> opinion on this yet; I'm interested in what people are saying.


I'm impressed that you are keeping an open mind.


> What would be most helpful is if it could be clearly stated what the
> proposal is exactly (trailing /s are already handled by the parser, is
> the proposal just to make them not raise an error in some cases? What
> cases, exactly? How would this change the parser spec?), what the reason
> for this proposal is, and what the pros and cons are.


Just FYI, I'm in no rush here.  What I said about living in a world where
mostly what exists out there consists of partial understanding applies to me
too.  Without running code and test cases, I don't yet fully 'grok' what the
parser described in the document is supposed to do.  But I will get there.


> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>

- Sam Ruby
Received on Wednesday, 29 November 2006 13:42:38 UTC