Re: Why HTML should be taught as HTML without pretending it is XML

From: Thomas Broyer <t.broyer@gmail.com>
Date: Fri, 20 Jul 2007 00:40:26 +0200
Message-ID: <a9699fd20707191540v1597beddo7e397d32c40434d5@mail.gmail.com>
To: public-html@w3.org

2007/7/19, Maurice Carey:
>
> Part of this recent surge of public interest in css and web standards is due
> to certain "web pros" making a name for themselves and actually becoming a
> little bit "famous", and then everyone followed their lead, listening to what
> _they_ had to say about web standards and _then_ discovering and
> understanding the w3c. Visit some of these people's sites. Many are in xhtml
> (although there are plenty of reasons not to be using xhtml) and the
> majority of them explicitly close _every_ tag on their pages.
> (I haven't actually checked myself but my gut feeling is that I'm right :)
>
> It just feels like the right way to do it to me. The "pro" way of doing it.
> The "don't cut corners you lazy bastard" way of doing it.

Technically speaking, I know people no longer really care about it
these days but...

I ran html5lib on the current draft (whatwg.org version), parsing then
re-serializing, with the following results:
 1. Outputting as XHTML w/ Appendix 4 (optimization: choosing between
' and " depending on the attribute value when it really needs quoting)
(these are the default options for XHTMLSerializer): 1 928 232 bytes
 2. Omitting optional tags, using minimized form for boolean
attributes and trying hard not to quote attribute values (these are
the default options for HTMLSerializer): 1 855 298 bytes

HTML5 syntax saves 72 934 bytes (around 3.78%).
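
To see where a saving of this kind comes from, here is a rough sketch that compares the byte size of one small fragment written in both styles. The fragment is hypothetical and written by hand; it is not html5lib output, and the helper names are made up for the illustration.

```python
# Hypothetical fragment written by hand in both styles (not html5lib output).
# XHTML style: every element closed, every attribute value quoted,
# boolean attributes spelled out in full.
XHTML_STYLE = (
    '<ul class="toc">\n'
    '<li><a href="#intro">Introduction</a></li>\n'
    '<li><input type="checkbox" checked="checked" disabled="disabled"/></li>\n'
    '</ul>\n'
)

# HTML style: optional </li> end tags omitted, attribute values left
# unquoted where that is safe, boolean attributes minimized.
HTML_STYLE = (
    '<ul class=toc>\n'
    '<li><a href=#intro>Introduction</a>\n'
    '<li><input type=checkbox checked disabled>\n'
    '</ul>\n'
)


def byte_size(markup):
    """Size of the markup once encoded as UTF-8."""
    return len(markup.encode("utf-8"))


def saving(before, after):
    """Bytes saved, and the saving as a percentage of the 'before' size."""
    saved = byte_size(before) - byte_size(after)
    return saved, 100.0 * saved / byte_size(before)


if __name__ == "__main__":
    saved, percent = saving(XHTML_STYLE, HTML_STYLE)
    print(f"{byte_size(XHTML_STYLE)} -> {byte_size(HTML_STYLE)} bytes, "
          f"saved {saved} ({percent:.1f}%)")
```

A fragment this dense in attributes and list items will show a much larger relative saving than a long, prose-heavy document such as the draft, so the percentage it prints says nothing about real pages.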

Just FYI (those numbers are not really relevant here; the draft has
a whole lot of comments):
--sanitize (briefly: displays style and script elements as text and
strips comments and some "unsafe" attributes): 1 711 854 bytes (gain:
11.2%)
--strip-whitespace (briefly: collapses spaces except in script, style
and pre, but it is buggy and strips a bit too much whitespace):
1 676 203 bytes (gain: 13%)
--sanitize --strip-whitespace: 1 531 491 bytes (gain: 20.5%)
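
The gains can be re-derived from the byte counts with a few lines of Python. This sketch assumes every gain is measured against the XHTMLSerializer output (1 928 232 bytes), which is consistent with the 11.2% figure; the labels and the function name are only for illustration.

```python
# Byte counts copied from the message above.
# Assumption: all gains are relative to the XHTMLSerializer output.
XHTML_BASELINE = 1_928_232

RESULTS = {
    "HTMLSerializer defaults": 1_855_298,
    "--sanitize": 1_711_854,
    "--strip-whitespace": 1_676_203,
    "--sanitize --strip-whitespace": 1_531_491,
}


def gain_percent(size, baseline=XHTML_BASELINE):
    """Percentage of the baseline size that was saved."""
    return 100.0 * (baseline - size) / baseline


if __name__ == "__main__":
    for label, size in RESULTS.items():
        saved = XHTML_BASELINE - size
        print(f"{label}: {saved} bytes saved ({gain_percent(size):.2f}%)")
```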

-- 
Thomas Broyer
Received on Thursday, 19 July 2007 22:40:29 UTC
