Re: Bug 7034 from Sam Ruby on 2010-03-20 (public-html@w3.org from March 2010)

From: Sam Ruby <rubys@intertwingly.net>
Date: Sat, 20 Mar 2010 08:14:58 -0400
To: Maciej Stachowiak <mjs@apple.com>
CC: Shelley Powers <shelley.just@gmail.com>, "Tab Atkins Jr." <jackalmage@gmail.com>, "Ennals, Robert" <robert.ennals@intel.com>, HTMLwg <public-html@w3.org>
Message-ID: <4BA4BC42.2010409@intertwingly.net>
On 03/19/2010 05:58 PM, Maciej Stachowiak wrote:
>
> On Mar 19, 2010, at 2:33 PM, Sam Ruby wrote:
>
>>
>> [co-chair hat off]
>>
>> My request is for rationale. I assume there is a coherent strategy
>> behind this, but I don't see it. Each time I take a closer look, I
>> find what appears to me to be glaring inconsistencies.
>>
>>> Incidentally, I think I would personally agree with both of the two
>>> specific points above.
>>
>> My request is for rationale. If there is a good rationale for these
>> points that fits with a larger strategy, then I would disagree with
>> both of those specific points above.
>>
>> What you are asking me to do is to take guesses as to what the intent is
>> for the authoring requirements, take pot shots at the spec without
>> this necessary understanding, see what falls over when I do, and then
>> repeat the process until either nothing is left standing or what is
>> left standing does have consensus.
>>
>> That does not seem like a sane alternative to me.
>
> What I'm asking is that you follow the Decision Policy guidelines for
> what should go in a bug:
> <http://dev.w3.org/html5/decision-policy/decision-policy.html#bugzilla-bug>.
> I don't think bug 7034 satisfies any of those four bullet points in its
> current state.
>
> And I'm letting you know that if the bug report doesn't meet those
> guidelines, the likely result is NEEDSINFO, and that I at least would
> agree with that resolution. If you're not interested in doing anything
> further to avoid that outcome, then I am satisfied to leave the bug alone.

You have now seen what a mere evening's worth of work can produce:

http://intertwingly.net/blog/2010/03/20/Authoring-Conformance-Requirements

I don't even know how to begin to reasonably categorize all this data. 
And given that that evening's worth of work included the production of a 
script to help analyze the data, imagine what I could do in one more 
evening, or a week.

One simple example to show how this relates to issue-41.  Suppose a
person authors a page for iPhone users.  This page is to be served using
PHP.  This person uses Emacs.  During the course of development, at one
point some portion of the page is commented out.  That portion happens
to contain consecutive dashes.  Per the current draft, this is a
conformance error.  Per Validator.nu, the reason given is that this data
can't be serialized as XML 1.0.  I don't know if that matches the
editor's reasoning; I can't read his mind.  But it is all I have to go
on at the moment.
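
To make the comment case concrete, here is a minimal sketch in Python of
the kind of check involved.  It relies on the standard library's
html.parser, which is not a conforming HTML5 parser, and the names
CommentCollector and comments_not_xml_safe are made up for this
illustration.  It is not a description of how Validator.nu implements
its check.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every comment in a document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between "<!--" and the closing "-->"
        self.comments.append(data)


def comments_not_xml_safe(markup):
    """Return the comments that an XML 1.0 serializer could not emit as-is.

    The XML 1.0 comment grammar forbids "--" inside a comment and also
    forbids a comment whose text ends with "-".
    """
    collector = CommentCollector()
    collector.feed(markup)
    collector.close()
    return [c for c in collector.comments if "--" in c or c.endswith("-")]


# A commented-out fragment that happens to contain consecutive dashes.
page = "<p>kept</p><!-- <p>old -- text</p> --><!-- fine -->"
print(comments_not_xml_safe(page))
```

Only the first comment is reported; the second contains no consecutive
dashes and would pass.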

As a user, my reaction would be along the lines of "thanks for sharing".
At no point in any scenario that this user cares about is an XML 1.0
serializer involved.  At best, this requirement is a SHOULD, but given
the number of pages that exist on the Internet and the relative
frequency with which any of them are ever serialized as XML 1.0, I think
that even a SHOULD requirement is a bit much.

Now consider www.sina.com.cn, a site that I wouldn't tend to frequent 
for obvious reasons, but one that Alexa reports as #12 on the whole 
Internet.  Given the size of the Internet, I'm sure you would agree that 
being #12 is no small feat.  On that page there are a number of 
conformance errors, two of which involve consecutive dashes inside a 
comment.  I personally doubt that page was produced using Emacs, but the 
principle involved is the same.  However it is produced, the page is 
served as text/html, and I would assert that that is evidence enough 
that the expectation is that this page is to be processed as HTML, and 
that furthermore the expectation is rather low that the content will 
ever be serialized as XML 1.0.

Now consider site #5 on the Internet: live.com.  I'm also pretty sure
that this site was not authored using Emacs.  It, too, is served as
text/html.  It contains an attribute that validator.nu asserts can't be
serialized as XML 1.0.  The statement that validator.nu makes is
somewhat incomplete and arguably misleading.  The page is well-formed
XML, to the point of containing XML-style <![CDATA[ ]]> blocks inside
JavaScript comments and being parseable using expat.  What is true is
that the DOM that is produced when that page is parsed as HTML could not
be produced by parsing an XML document -- a scenario that
understandably might not be all that important to the authors of live.com.

I'll also note that the xml:lang attribute that is also present in this 
same page does not meet the criteria of producing a DOM when parsed 
using an HTML parser that can also be produced using an XML parser.
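
The xml:lang mismatch can be illustrated with a rough sketch in Python.
The assumptions are mine: html.parser is only a tokenizer standing in
for an HTML parser (it builds no DOM), AttrCollector is a hypothetical
helper, and the one-line markup is invented rather than taken from
live.com.  The point is only that the HTML side sees an attribute
literally named "xml:lang", while the XML side sees "lang" in the XML
namespace.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Records attribute names as the HTML tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.extend(name for name, _ in attrs)


markup = '<p xml:lang="en">hello</p>'

# HTML side: the colon is just another character in the attribute name.
html_side = AttrCollector()
html_side.feed(markup)
html_side.close()
html_names = html_side.names

# XML side: the predefined "xml" prefix is resolved to a namespace.
xml_names = list(ET.fromstring(markup).attrib)

print(html_names)
print(xml_names)
```

The two lists differ, which is the sense in which the same bytes do not
yield the same attribute under the two parsers.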

Given all the evidence I have available to me, I would say that 
producing a page that is well-formed XML is something that is important 
to the authors of live.com, for reasons that I can only guess.  This 
site is not alone in that regard, but among the sites for which this is 
true, it currently is the one that gets the most traffic according to Alexa.

Maciej, you've personally argued against this exact syntax for reasons 
involving fallback stories.  In this specific case involving live.com, I 
would assert that argument again falls into the category of "thanks for
sharing", i.e., involving a scenario which is entirely irrelevant to the 
expected use of this specific page.

Given this, I believe a case could be made that the live.com page, along
with google.com, should both be considered conforming HTML5 as is.  It
is true that both pages contain elements that some people may recommend
not be emulated in other contexts, but given the substantial and
expected use of these two pages, there is nothing categorically /wrong/
with these two pages.

I don't know if this line of reasoning is something that you would 
consider compelling or something the group would find consensus on. 
That's not the point of this (now somewhat lengthy) email.  If you look 
at the line of reasoning, it actually is a house of cards.  It takes an 
extended form of "if this then if that then if this other thing then: 
conclusion".  And why does it take such a form?  The answer is simple: 
the reasoning is necessarily built on guesses, and the reason why those
guesses are necessary is that absolutely zilch has been provided in
terms of rationale for why these restrictions are in the document in the
first place.

This is but one simple example.  In that evening's worth of work I
produced several dozen such examples, each of which could reasonably
be opened as a bug.  This could be done, but doing so would be entirely
unproductive and unnecessary.  What is in order here is to ask for a bit
of rationale for the current set of conformance criteria.  I'll note
that this is not like a parsing rule for which the answer could be
"three out of the four browsers agree"; this is a topic which is
clearly a matter of judgment, and so asking those that formulated this
set of opinions to explain their rationale is in order.

Failing compelling rationale, the alternative is to start over.  We 
should rip out all of the conformance requirements and put back in new 
ones that have a solid rationale.  As an example, even if your page is
100% "well-formed", if your page triggers the adoption agency algorithm 
with any content that actually turns out to be visible, then I believe I 
personally could be persuaded to agree that such a page should be marked 
as non-conforming.

I'll go further: rip out does not necessarily mean throw away.  Escaping 
one's ampersands can be argued to be a best practice.  Some could make a 
similar case for explicitly closing all open non-void elements as a best 
practice.  And even avoiding double dashes in comments.  People can, and 
will, disagree on what is or is not a best practice.  That's OK too.  I 
don't object to such being captured and collected into a document, one 
perhaps to be published as a Note or even a Rec.  I just don't believe 
that the case has been made that such opinion has any place in the one 
document entitled "HTML5. A vocabulary and associated APIs for HTML and 
XHTML".

> Regards,
> Maciej

- Sam Ruby
Received on Saturday, 20 March 2010 12:15:30 UTC