Re: HTML syntax from Thomas Broyer on 2007-11-29 (public-html@w3.org from November 2007)

From: Thomas Broyer <t.broyer@gmail.com>
Date: Thu, 29 Nov 2007 09:45:10 +0100
To: public-html@w3.org
Message-ID: <a9699fd20711290045jca7d3efs4520985fc321aa05@mail.gmail.com>
2007/11/29, Dean Edridge:
> Thomas Broyer wrote:
> >
> > So you'd want HTML documents to look like XHTML ones yet not being
> > "XML-wellformed", and with some differences on the parsing side,
> > leading to incompatibilities wrt scripting and styling (think TBODY,
> > TABLE/OL/UL inside P, etc.)
>
> The other differences between HTML and XHTML is not what I am talking
> about here. That can be addressed separately. The proposal is not to
> solve all the differences between the two serialisations, but to
> increase the usability/compatibility of HTML and XHTML in the future, as
> the two serialisations will both be used on the web in the future, just
> like they are today.
>
> I don't think it's wise or logical to ask schools and universities to
> teach students HTML like this:
>
>     <img id=logo src=logo.png alt=something>
>
>     <p class=intro>Readable Markup
>
> and when the next semester rolls around, and the students come back to
> learn XHTML or SVG etc, they'll need to learn a whole new syntax:
>
>     <img id="logo" src="logo.png" alt="something" />
>
>     <p class="intro">Readable Markup</p>

Because they are different "languages".

But allowing unquoted attribute values (i.e. they are said to be
"valid" by a conformance checker) doesn't mean HTML has to be taught
using them rather than single-quoted or double-quoted ones.
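
A minimal sketch of that point, assuming Python's standard-library
html.parser as a stand-in for a lenient HTML tokenizer (it is not the
HTML5 parsing algorithm, and the AttrCollector and start_tags names are
made up for this illustration). The three quoting styles come out as
the same tag and attributes:

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record (tag, attributes) for every start tag the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs, quotes already removed
        self.seen.append((tag, dict(attrs)))


def start_tags(markup):
    """Hypothetical helper: feed markup and return the recorded start tags."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


unquoted = start_tags('<img id=logo src=logo.png alt=something>')
double_quoted = start_tags('<img id="logo" src="logo.png" alt="something">')
single_quoted = start_tags("<img id='logo' src='logo.png' alt='something'>")
```

So which style gets taught is a choice for the teacher; the parser
side treats them alike.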

> Problems like this can easily be avoided by only allowing the stricter
> syntax in the spec like:
>
>     <img id="logo" src="logo.png" alt="something" />
>
>     <p class="intro">Readable Markup</p>

No, if it's a "problem" in your eyes, teach the above syntax for HTML
(while saying that it's one possible syntax, the one you prefer, because
it's similar to XML, which you'll teach the next semester with XHTML
and SVG).

> > The primary goal of the "parsing" section of the spec is to define a
> > parser compatible with what browsers do today (this cannot be entirely
> > true given that browsers have incompatible behaviours in some cases),
> > so that a browser that implements this section can be used with HTML
> > 4, HTML3, tag soup, etc. pages found in the wild.
> > If you want to build a stricter parser, new browsers will have to
> > implement another "tag soup" parser and a switching mechanism between
> > its two parsers.
>
> Nonsense, and again, irrelevant, todays browsers already accept the
> stricter syntax that I have suggested. You have used Wordpress before
> haven't you?
> Are you suggesting that the average Wordpress Blog (parsed as text/html)
> is not supported by Firefox2, IE7 or Safari2.x ?
> More people use:
> <img id="logo" src="logo.png" alt="something" />
> than this:
> <img id=logo src=logo.png alt=something>
>
> So to say that a stricter syntax would require a new browser or parser
> is not true.

Restricting HTML5 to a stricter syntax and thus defining a parsing
algorithm for this syntax only would make the HTML5 parsing algorithm
incompatible with the Web as it is today. So someone wanting to build
a new browser would have to build an HTML5 parser *plus* a "tag-soup"
parser.
The goal of the "parsing" section of the spec is to have one parser
for "the Web", as it is today plus HTML5.
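
To make the contrast concrete, here is a rough sketch using only the
Python standard library: xml.etree.ElementTree stands in for a "strict"
parser and html.parser for a lenient one. Neither is the HTML5
algorithm, and xml_accepts, TagCounter and html_start_tag_count are
names invented for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def xml_accepts(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCounter(HTMLParser):
    """Count the start tags a lenient tokenizer manages to recognise."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Also reached for "<img ... />": the default handle_startendtag
        # forwards to handle_starttag.
        self.count += 1


def html_start_tag_count(markup):
    parser = TagCounter()
    parser.feed(markup)
    parser.close()
    return parser.count


strict = '<img id="logo" src="logo.png" alt="something" />'
loose = '<img id=logo src=logo.png alt=something>'

strict_ok_in_xml = xml_accepts(strict)            # well-formed
loose_ok_in_xml = xml_accepts(loose)              # unquoted values: rejected
strict_tags_in_html = html_start_tag_count(strict)
loose_tags_in_html = html_start_tag_count(loose)
```

The lenient side handles both inputs; a parser defined for the strict
syntax alone would leave the second one, and the existing pages written
that way, to some other parser.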

Nonetheless, given that most authors don't even try to validate their
code and just look at the result in one or two browsers (one reason
being that because they use "compatibility hacks", they know their
code won't validate), and given that browsers will continue to parse
unquoted attribute values, people using unquoted attribute values
today have no reason to change their mind in the future.

> > If you only want to set "best practices" and still have a parsing
> > algorithm like the one already in the HTML5 spec, be sure that most
> > people won't follow your syntax rules: if the parser accomodates with
> > unquoted attribute values, why should I bother quoting them? (among
> > other things).
>
> You don't seem to have read or understood the thread. The point of a
> stricter syntax is that it is interoperable with XML based languages.
> Which authors are going to need to use at some stage.

I'm not convinced, and I still don't see your point. What do you mean by
"stricter" syntax?

1. Conformance checkers flagging unquoted attribute values (among
other things) as invalid?
That might happen, but how many people will care? And those who care
will just not use HTML5 and will continue using HTML4 and validating
against an HTML4 parser.

2. Browsers showing error pages rather than recovering from "errors"?
That won't ever happen. Browser vendors want a clearly defined error
recovery algorithm. That's also one of the reasons XHTML has failed on
the Web (the main reason still being that IE doesn't support it, but
there are content negotiation configuration tricks).

If you just want /> to be valid HTML5, it's already the case.


[OFFTOPIC]
> > I was heartedly promoting XHTML a few years ago, that's no longer the
> > case: I've learned to be more pragmatic.
>
> What is the relevance of this? Sorry, but I think you are still missing
> the point. This thread is not about HTML vs XHTML.
>
> If people look at this discussion with an open mind, and without the oh
> so common "I hate XHTML, let's deprecate it" attitude,

I don't hate XHTML, I think it has a real value. I just don't think it
will ever "compete" with HTML on the Web (again: on the Web).

The fact that I changed my mind about "XHTML vs. HTML" proves that I'm
more open-minded today than before.
[/OFFTOPIC]

> The (possible) beauty of this (X)HTML5 specification is that the world
> doesn't have to just use HTML, or just use XHTML, they can have the
> choice of both.
>
> Let's not take this choice away from them please.

But there'll always be differences between those serializations (how
they're parsed -> what the DOM will look like –e.g. there'll always be
a TBODY in HTML, whether there's a <TBODY> tag in the input document
or not– -> how scripts and CSS have to be written), and trying to make
them look the same doesn't seem to me to be the best choice: we really
should have people educated about the differences between XHTML and HTML.
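
A small sketch of the XML half of the TBODY example, assuming Python's
xml.etree.ElementTree as the XML parser. The HTML half is not shown,
because the standard library has no HTML5 tree builder; that an HTML
parser inserts the implied TBODY is taken from the paragraph above.

```python
import xml.etree.ElementTree as ET

# Parsed as XML, the tree contains exactly the elements that were written:
# no TBODY is implied between TABLE and TR.
table = ET.fromstring('<table><tr><td>cell</td></tr></table>')

rows_direct = table.findall('tr')            # TR as a direct child of TABLE
rows_via_tbody = table.findall('tbody/tr')   # path that assumes a TBODY
```

A script or selector written against the HTML DOM (table > tbody > tr)
finds nothing here, and one written against this tree (table > tr)
would miss the rows in the HTML DOM.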

XHTML is not harder to learn or use than HTML (everything's explicit
in XHTML: quoted attribute values, no optional tags, void elements
still have end tags –or start-end tags–; but on the other hand,
everything has to be lowercase), nor is the opposite true (HTML5
has clearly defined error recovery rules and you're not forced to
write some tags because they are optional; but on the other hand those
rules might look hard to learn).
That's just a matter of taste; and I believe it should remain "just a
matter of taste".

-- 
Thomas Broyer
Received on Thursday, 29 November 2007 08:45:18 UTC