Re: Bug 7034

On Mar 20, 2010, at 5:14 AM, Sam Ruby wrote:

> On 03/19/2010 05:58 PM, Maciej Stachowiak wrote:
>>
>>
>> What I'm asking is that you follow the Decision Policy guidelines for
>> what should go in a bug:
>> <http://dev.w3.org/html5/decision-policy/decision-policy.html#bugzilla-bug>.
>> I don't think bug 7034 satisfies any of those four bullet points in
>> its current state.
>>
>> And I'm letting you know that if the bug report doesn't meet those
>> guidelines, the likely result is NEEDSINFO, and that I at least would
>> agree with that resolution. If you're not interested in doing anything
>> further to avoid that outcome, then I am satisfied to leave the bug
>> alone.
>
> You have now seen what a mere evening's worth of work can produce:
>
> http://intertwingly.net/blog/2010/03/20/Authoring-Conformance-Requirements
>
> I don't even know how to begin to reasonably categorize all this  
> data. And given that that evening's worth of work included the  
> production of a script to help analyze the data, imagine what I  
> could do in one more evening, or a week.

That data looks like a really good starting point for filing some
focused bugs. I'm willing to help classify the errors and file bugs,
though my judgment on which errors are unhelpful may not agree with
yours. It seems like your script has already done a rough first pass
on classifying errors.

The bottom line is that if we (the Working Group) aren't willing to  
invest the time to analyze this data, how can we expect the editor to  
do so? Conversely, if we identify every error or class of errors that  
seems wrong or unhelpful, then I do think we can rightfully expect a  
fix or specific rationale.

> One simple example to show how this relates to issue-41.  Suppose a
> person authors a page for iPhone users.  This page is to be served
> via PHP.  This person uses Emacs.  During the course of development,
> at one point some portion of the page is commented out.  That portion
> happens to contain consecutive dashes.  Per the current draft, this
> is a conformance error.  Per Validator.nu, the reason given is that
> this data can't be serialized as XML 1.0.  I don't know if that
> matches the editor's reasoning; I can't read his mind.  But it is all
> I have to go on at the moment.
>
> As a user, my reaction would be along the lines of "thanks for
> sharing".  At no point in any scenario that this user cares about is
> an XML 1.0 serializer involved.  At best, this requirement is a
> SHOULD, but given the number of pages that exist on the Internet and
> the relative frequency that any of them are ever serialized as XML
> 1.0, I think that even a SHOULD requirement is a bit much.

I think filing a bug on this specific conformance requirement (and  
thereby asking the editor to either remove or justify it) would be a  
positive step.
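
For concreteness, here is a made-up fragment of my own (not Sam's
actual page) that is non-conforming under the current draft, because
the comment text contains "--":

  <!-- <a href="old.html">old link</a> -- removed for now -->

The XML 1.0 connection, as I understand it, is that the Comment
production in XML 1.0 forbids "--" anywhere inside a comment, so a
comment like this one could not be written out by an XML 1.0
serializer as-is.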

>
> Now consider www.sina.com.cn, a site that I wouldn't tend to  
> frequent for obvious reasons, but one that Alexa reports as #12 on  
> the whole Internet.  Given the size of the Internet, I'm sure you  
> would agree that being #12 is no small feat.  On that page there are  
> a number of conformance errors, two of which involve consecutive  
> dashes inside a comment.  I personally doubt that page was produced  
> using Emacs, but the principle involved is the same.  However it is  
> produced, the page is served as text/html, and I would assert that  
> that is evidence enough that the expectation is that this page is to  
> be processed as HTML, and that furthermore the expectation is rather  
> low that the content will ever be serialized as XML 1.0.
>
> Now consider site #5 on the internet: live.com.  I'm also pretty
> sure that this site was not authored using Emacs.  It, too, is
> served as text/html.  It contains an attribute that validator.nu
> asserts can't be serialized as XML 1.0.  The statement that
> validator.nu makes is somewhat incomplete and arguably misleading.
> The page is well-formed XML, to the point of containing XML-style
> <![CDATA[]]> blocks inside JavaScript comments and being parseable
> using expat.  What is true is that the DOM that is produced when
> that page is parsed as HTML could not be produced by parsing an XML
> document -- a scenario that understandably might not be all that
> important to the authors of live.com.

For everyone's reference, the one error validating live.com (other  
than the doctype) is this:

Error: Attribute xmlns:web not allowed here.
From line 1, column 122; to line 1, column 229
onal.dtd"><html lang="fr" xml:lang="fr" xmlns="http://www.w3.org/1999/xhtml" xmlns:Web="http://schemas.live.com/Web/"><head>

Note that the Web namespace prefix declared here is not actually used
anywhere on the page, so I suspect the declaration is not there
intentionally.

>
> I'll also note that the xml:lang attribute that is also present in
> this same page does not meet the criterion of producing, when parsed
> using an HTML parser, a DOM that can also be produced using an XML
> parser.
>
> Given all the evidence I have available to me, I would say that  
> producing a page that is well-formed XML is something that is  
> important to the authors of live.com, for reasons that I can only  
> guess.  This site is not alone in that regard, but among the sites  
> for which this is true, it currently is the one that gets the most  
> traffic according to Alexa.
>
> Maciej, you've personally argued against this exact syntax for
> reasons involving fallback stories.  In this specific case involving
> live.com, I would assert that argument again falls into the category
> of "thanks for sharing", i.e., involving a scenario which is
> entirely irrelevant to the expected use of this specific page.

I'm not sure what you mean by "this exact syntax" - attributes with a  
colon in the name in text/html? In this case, the inclusion of the  
xmlns:Web attribute seems pretty clearly an oversight, so I don't  
think this is a very compelling use case.  A significant reason that  
an xmlns prefix declaration doesn't cause any trouble here is that  
there is no attempt to use it.
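
To spell out the mismatch for anyone following along (simplified, and
to the best of my understanding of the parsing rules), given:

  <html xml:lang="fr" xmlns:Web="http://schemas.live.com/Web/">

an HTML parser produces two ordinary attributes in no namespace,
named literally "xml:lang" and "xmlns:web" (lowercased), while an XML
parser produces an attribute with local name "lang" in the XML
namespace (http://www.w3.org/XML/1998/namespace) plus a namespace
declaration binding the prefix "Web" (http://www.w3.org/2000/xmlns/).
Same bytes, different DOMs, which is the round-tripping problem
validator.nu is getting at.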

>
> Given this, I believe a case could be made that the live.com page,
> along with google.com, should, as is, both be considered conforming
> HTML5.  It is true that both pages contain elements that some people
> may recommend not be emulated in other contexts, but given the
> substantial and expected use of these two pages, there is nothing
> categorically /wrong/ with these two pages.
>
> I don't know if this line of reasoning is something that you would  
> consider compelling or something the group would find consensus on.  
> That's not the point of this (now somewhat lengthy) email.  If you  
> look at the line of reasoning, it actually is a house of cards.  It  
> takes an extended form of "if this then if that then if this other  
> thing then: conclusion".  And why does it take such a form?  The  
> answer is simple: the reasoning is necessarily built on guesses, and  
> the reason why those guesses are necessary is because absolutely  
> zilch has been provided in terms of rationale for why these
> restrictions are in the document in the first place.

It looks to me like all your guesswork above was about the intent of  
the page author, not the intent of the spec. It seems like you didn't  
need to inquire into the intent of the spec to conclude that certain  
conformance errors are unhelpful. I would say filing bugs on those  
specific cases is more constructive than a single bug that says  
"change the conformance requirements in some way", and more likely to  
get you the rationale you want. That's so even if a very good  
rationale could change your mind when your initial judgment is that a  
particular conformance error is bogus.

>
> This is but one simple example.  In the evening's worth of work I
> produced several dozen such examples, each of which could
> reasonably be opened as a bug.  This could be done, but doing so
> would be entirely unproductive and unnecessary.  What is in order
> here is to ask for a bit of rationale for the current set of
> conformance criteria.  I'll note that this is not like a parsing
> rule for which the answer could be "three out of the four browsers
> agree"; this is a topic which is clearly a matter of judgment, and
> so asking those that formulated this set of opinions to explain
> their rationale is in order.
>
> Failing compelling rationale, the alternative is to start over.  We  
> should rip out all of the conformance requirements and put back in  
> new ones that have a solid rationale.  As an example, even if your
> page is 100% "well-formed", if your page triggers the adoption  
> agency algorithm with any content that actually turns out to be  
> visible, then I believe I personally could be persuaded to agree  
> that such a page should be marked as non-conforming.

I think starting over is not a strategy that gets us to Last Call this  
year. And let's be clear, you're suggesting starting over on text that  
comprises around half the spec. The single-page HTML5 Working Draft is  
696 printed pages; the author-only view is 355 printed pages.

> I'll go further: "rip out" does not necessarily mean "throw away".
> Escaping one's ampersands can be argued to be a best practice.  Some  
> could make a similar case for explicitly closing all open non-void  
> elements as a best practice.  And even avoiding double dashes in  
> comments.  People can, and will, disagree on what is or is not a  
> best practice.  That's OK too.  I don't object to such being  
> captured and collected into a document, one perhaps to be published  
> as a Note or even a Rec.  I just don't believe that the case has  
> been made that such opinion has any place in the one document  
> entitled "HTML5: A vocabulary and associated APIs for HTML and XHTML".

I would say you should object to the specific things that you think  
are not right as conformance requirements.

For example, if you don't object to "well-formed" type errors being  
conformance failures, or triggering the adoption agency algorithm  
being a conformance failure, then it seems like a waste of everyone's  
time to rip those out and put them back in. It seems to me like that  
would just be disruption to make a point. If you cannot be bothered to  
enumerate the conformance requirements (or categories thereof) that  
you actually object to (even if only tentatively barring compelling  
rationale, and even if the list is incomplete), then I don't see why  
anyone else is obliged to do the research for you.
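
(For anyone unfamiliar with it, the adoption agency algorithm is the
HTML parser's repair mechanism for misnested formatting elements. A
minimal trigger, and roughly the tree the repair produces:

  <p><b>one <i>two</b> three</i></p>

becomes

  <p><b>one <i>two</i></b><i> three</i></p>

so Sam's suggested conformance criterion is at least one with a
rationale that can be stated and debated.)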

Or to put it another way, I don't see a compelling case here for not  
following the Decision Policy guidelines on what should go in a bug. I  
still think unambiguous bug reports are the best way to move forward.

Regards,
Maciej

Received on Sunday, 21 March 2010 01:05:14 UTC