Re: An HTML language specification vs. a browser specification

On Nov 18, 2008, at 2:10 AM, Ian Hickson wrote:
> On Mon, 17 Nov 2008, Roy T. Fielding wrote:
>> On Nov 17, 2008, at 4:45 PM, Ian Hickson wrote:
>>> Your e-mail seemed to imply a set of fundamental assumptions that I am
>>> not sure we share. In order to help get a better common understanding,
>>> I'd like to see if you can explain to me whether I am correct that you
>>> have those assumptions and if so, why you hold them to be true.
>>> 1. Browsers, in particular HTML parsers in browsers, change with
>>>    regularity in ways that change the rendering of existing pages.
>> I made no such claim.
> Ok.
> Could you elaborate on what you meant by the following, then?:
> | In order for me to design my content for testing on current browsers,
> | I'd have to regenerate it every six months (more frequently during the
> | cycles when competition between browser vendors is relevant).
>  --

The definition of what is a "current browser" changes every six
months.  So, in order for me to design my content by testing in a
current browser, I would have to redesign my content every six months.
Obviously, I don't do that.  I design my content using tools that
have been developed based on specifications and experience over
the past 15 years.  I have the luxury of being able to fix those
tools when they do something that violates the specifications.
Most authors just use the tools that are given to them.

>>> 2. The vast majority of "non-program" content was written long before
>>>    MSIE6 existed (2001).
>>> My understanding is that content on the Web has been growing exponentially
>>> year over year, which makes this assumption seem implausible. Could you
>>> provide us with data that backs up your assertion?
>>> (Data that backs up the opposite assertion would be, for example, that
>>> search engines around 2000 knew of about a billion pages, whereas search
>>> engines today know of about a trillion pages, suggesting that the majority
>>> of content is newer than 2000. [1])
>>> [1]
>> Written programmatically includes everything rendered by blogging
>> software, content management systems, Google, Yahoo, Facebook,
>> YouTube, Word's "save as HTML", and regurgitated by sites that
>> transclude other sites.
>> The billion or so pages of information growth is almost entirely
>> placed into HTML form by authoring tools that do not give authors
>> control over the HTML form.  Those tools are developed based on
>> language specifications and generic templates, not by "designing
>> for current browsers" which didn't exist at the time and not by
>> "testing what works in current browsers" that have a lifetime far
>> shorter than the information produced by authoring tools.
> To clarify, is it therefore the case that you subscribe to the following
> statements?:
> 2a. What works in contemporary browsers at a time t1 is not necessarily a
>     subset of what works in browsers at a significantly later time t2.

Of course not.  There are plenty of things that worked in one browser
four years ago that were removed from (most) browsers because of  
concerns.  I am hoping for more of the same over time.

> 2b. Authoring tools and templates are written according to the standards
>     and are not tested against browsers contemporary to their creation.

They are written according to working standards and tested against
the most popular browsers contemporary to their creation.

>>> 3. Authors when writing Web pages do not attempt to make their pages look
>>>    like they want in the browser they use.
>>> Based on the feedback one sees in authoring community discussion
>>> groups, it appears that authors do in fact check that their new
>>> content renders as they desire in contemporary browsers. If you
>>> disagree, could you demonstrate why you believe this is not the case?
>> Of course they do.  They also think that adding an entry using Wordpress
>> is authoring HTML.  What's your point?
> I have no point here, I'm trying to understand where you are coming from.
> So to clarify, you disagree with the above statement that authors do not
> attempt to make their pages look like what they want in the browser they
> use? That is, do you subscribe to the following?:

When did you stop beating your wife, Ian?  Stop rephrasing questions that
have nothing to do with my comments and just deal with my comments.

> 3'. Authors when writing Web pages attempt to make their pages look like
>     they want in the browser they use.

What authors are you talking about here?  The ones that put words in a
readable order, the ones that build tools that translate from paragraphs
to HTML documents, or the ones that design templates for paragraphs to
be inserted within?

When I write HTML content for the Web, I use one of:

  o a CMS with a paragraph-oriented editing tool that produces a safe
    HTML subset;
  o keynote/pages or Word with save as HTML and post-processing w/vim;
  o browser-embedded editor that sends paragraphs to a blog;
  o browser-based form that uses wiki syntax for server-side conversion;
  o browser-based form that uses plain-text with character elements
    for server-side conversion;
  o an email client that sends plain text to a server-side conversion;
  o textmate of XML documents that will be translated to HTML via XSLT;
  o vim of an old HTML document.

I then use my current browser(s) to check the rendered HTML for errors
(to the extent that my browser will reveal them) and fix the tools if
there are any visible.  But I am not a normal author and I don't create
the vast majority of content on the Web, which was the premise being
used in this discussion of where HTML content comes from.

Normal authors use the tools that we create.  Most of those tools
do not, in fact, have a DOM or anything like a browser engine engaged
in the authoring of HTML.

>>> 4. A browser that doesn't implement the APIs, vocabulary, and error
>>>    handling that major browsers implement could effectively compete in the
>>>    marketspace.
>>> Could you provide an example of a competitive browser that doesn't
>>> implement, as you put it, "all that crap"?
>> No.  The Web would be better off without that crap, but I have no
>> objection to you putting all that crap into a browser spec if you think
>> all browsers need to implement it.
> So do you believe the following statement?:
> 4'. A browser, to effectively compete in the marketspace, has to implement
>     the APIs, vocabulary, and error handling that major browsers
>     implement.
> That is, do you believe in statement 4, or statement 4'?

I don't believe either statement is relevant.  I am not interested in
creating a browser that competes in the market, nor are the vast majority
of implementations of HTML that would need to conform to a new HTML
standard.  That is why I said the title is important.

>> That is in contrast to HTML, the language, which is something that my
>> software does generate and needs to remain compliant with, and thus it
>> does cause a great deal of harm for you to add a bunch of procedural
>> nonsense to the declarative language definition.
> I don't understand. Could you elaborate on how DOM APIs cause harm?

They make it impossible to understand the language syntax and semantics
without plugging in a browsing engine and observing its operation over
time.  They are not declarative definitions.

>>> 5. A specification that defines how to implement a Web browser would
>>>    remove competition in the browser space.
>>> Reports from browser vendors suggest that a considerable amount of time is
>>> spent reverse-engineering other browsers in order to be competitive. HTML5
>>> attempts to reduce this by doing all this work for them, thus reducing the
>>> amount of work that it would take to make a competitive browser.
>>> Why do you think that defining these features in detail reduces the
>>> ability for new competitors to enter the market?
>> Because defining error behavior as the standard makes it very
>> difficult for applications that are error-free to be approved
>> for use within the environments that require adherence to standards
>> (including the stupid ones).
> I don't understand. Could you provide an example?

See FIPS and WCAG.

>>> 6. Most people don't want a specification that covers the features that
>>>    HTML5 covers.
>>> I understand that you might not want it, but what evidence do you have
>>> that the majority of the Web standards community doesn't want it?
>> Because not a single expert in the Web standards community that I have
>> talked to in the past two years has supported the current work in HTML5.
>> The single most common reaction to the features that you have wedged
>> into HTML5 is abject laughter and disdain for this process.
> Hm, this is in stark contrast to the feedback I have received (from
> literally hundreds of people).
> It is obviously of critical importance to me that HTML5 addresses the
> needs of the wide Web community. Clearly, we have received different
> feedback from different parts of this community. I would like to receive
> feedback from the people to which you have been talking. Would it be
> possible for you to point me in the right direction to obtain this
> feedback? Are there mailing lists where it would be appropriate to request
> constructive feedback from these people?

Why?  You don't deal with the constructive feedback received from me.
Why should I subject our customers and my friends to this process?
They've given up on this process.

> Do you have any suggestions for how we could obtain a representative
> sample of people to determine once and for all what fraction of experts in
> the Web standards community are in favour of the current direction of
> HTML5 and what fraction are opposed to it?

Yes.  Stop treating the ideas that sprang out of the WHATWG as
proven by their very existence.  I don't care how long ping has been
under consideration by WHATWG mailing lists, nor do I care how many
fanboys have thought in the past that it is worth implementing.  It
represents a change to HTML (a harmful one at that).  Place it on
the block and let it fight for itself in terms of implementation.
It should be a separate proposal until it has been successfully
implemented by two independent implementations.  Likewise for all
of the other new additions.

>>> (There is counter-evidence, for example the size of the HTML and
>>> WHATWG working groups and the level of support that HTML5 has had in
>>> working group votes.)
>> You aren't listening to the objections.
> I'm trying, but you're not making it easy. For example, in this e-mail
> alone you have referred to parts of HTML5 as "crap", "nonsense", and have
> said that these features receive nothing but "abject laughter" and
> "disdain" and that they "trash" previous HTML work. You have not made a
> single constructive statement in this entire thread as far as I can tell.

That's because all of my previous constructive criticism has been
ignored by the editor of this specification based on his own theory
that the Web architecture is academic and shouldn't apply to browsers,
which are apparently all that matter when it comes to HTML.

> I have feedback saying that HTML5 should have these features, giving use
> cases, reasons, technical arguments, supporting data, etc. Your feedback
> consists of "you should remove these features because they are crap". I
> hope it is clear that from a purely logical point of view, your arguments
> hold less weight when put forward in this manner.

There are no use cases, reasons, technical arguments, supporting data,
or any other form of logical thought that supports the addition of
ping if it is viewed with even the slightest understanding of how the
Web measurement community (and particularly the referral tracking
companies) work.  I've already explained that numerous times.  That
doesn't seem to bother you at all.

Likewise, this is a discussion about whether HTML should be defined
using a declarative language specification or not.  I think it must
be defined as a declarative language specification because that is
what my tools need in order to understand and implement HTML.  None
of my tools have a DOM.  Nor do any of the tools developed by our
competitors in the same marketplace.  We outnumber the browser
manufacturers 100 to 1.

>> Nothing I have said here hasn't been repeated several times already and
>> reinforced by a dozen others, and yet you have not made a single change
>> to the document that represents those WG opinions.
> I have had the opposite feedback from hundreds of other people, far more
> than a dozen. Why should I ignore the majority in favour of the minority,
> especially when the majority has a more technically sound and more
> logically argued position?

Because I have more experience with the Web and its protocols and
the wide variety of implementations than just about anyone else you
will talk to, and I can tell the difference between a technically
sound argument and wishful thinking.  I don't have as much time
available to spend on HTML5 as others and I don't have access to
a magical pot of statistics that lets me claim things about spider
traces as if they were relevant to actual decisions, but I do have
a real job that actually depends on a Web that works according to
real software engineering principles and an extensive set of
implementations that shows I am not talking out of my ass.

I am sure the wave of fanboys will start crying about my use
of argument by authority, but quite frankly I don't care any more.
Let them demonstrate by deploying implementations, not opinions.

>>> 7. Only browsers need to deal with error handling in parsing.
>>> Why do you think that, for example, search engines, validators,
>>> authoring tools, data mining tools, and so forth, would benefit from
>>> _not_ handling errors in HTML documents in the same way as browsers
>>> do?
>> They all handle errors in different ways.
> But why is this a good thing?

Because they are different contexts.  The fact that I want my
authoring tool to spellcheck my content does not imply that I want
all browsers to display squiggles under every word not found in
their own dictionaries.  The fact that I want my browser to check
for errors in content-type charset encoding and display the text
anyway does not imply that I want my XSLT tool to do the same.

>> It would be utterly stupid for an authoring tool to generate
>> error-filled HTML just because the browser spec says that it must
>> auto-adjust the DOM (which it doesn't even have) rather than simply fix
>> the HTML or print an error.
> Would it be correct to say that you assume the following?:
> 7a. The HTML5 spec requires authoring tools to generate error-filled HTML
>     and not correct markup errors.
> 7b. The HTML5 spec requires authoring tools to have a DOM.

The HTML5 spec can't be understood without an implemented DOM.

>> Error handling is entirely dependent on context.
> So would it be correct to say then that you believe that:
> 7'. Search engines and data mining tools should use different error
>     handling rules than browsers.

No, but again that isn't relevant to the vast majority of

... I have no time for this endless stream of pointless interrogatives,
none of which address the objection raised by me that led to this
thread on this mailing list.  If you would just address the issues,
we might actually be able to resolve something.

> Obviously, there are parts of the spec that are specific to browsers, just
> like there are parts specific to data mining tools, search engines,
> validators, authors, etc. Is that all you are referring to?

Yes.  The parts of the spec that are only relevant to specific
implementations belong in specifications about those implementations,
not in the specification of HTML.  Moreover, the specification of
HTML should be in terms of a declarative language that is produced
by generators and consumed by browsers, not in terms of how it
impacts the internal memory structure of some browser implementations.
Finally, the parts of the spec that have nothing to do with HTML,
such as SQL storage for web applications, should be kicked out.

The rationale for all of that is that HTML is a declarative
language that has been designed to be portable across a very wide
range of platforms and accessibility constraints, and for the most
part the compliant implementations of HTML are not browsers and
do not behave like browsers.

That is my opinion.  If you think it would be helpful for nearly
2000 other committers on Apache projects to weigh in on this issue,
then I suggest a poll somewhere that doesn't require W3C accounts.
I don't think that popular opinion should be the basis of rational
protocol design, but I am getting tired of the peanut gallery
being used as an excuse for bad decisions.  Let the real developers
that have to read and implement the standard decide what they
want in a specification.


Received on Thursday, 20 November 2008 01:58:23 UTC