- From: Sam Ruby <rubys@intertwingly.net>
- Date: Sat, 20 Mar 2010 08:14:58 -0400
- To: Maciej Stachowiak <mjs@apple.com>
- CC: Shelley Powers <shelley.just@gmail.com>, "Tab Atkins Jr." <jackalmage@gmail.com>, "Ennals, Robert" <robert.ennals@intel.com>, HTMLwg <public-html@w3.org>
On 03/19/2010 05:58 PM, Maciej Stachowiak wrote:
>
> On Mar 19, 2010, at 2:33 PM, Sam Ruby wrote:
>
>>
>> [co-chair hat off]
>>
>> My request is for rationale. I assume there is a coherent strategy
>> behind this, but I don't see it. Each time I take a closer look, I
>> find what appears to me to be glaring inconsistencies.
>>
>>> Incidentally, I think I would personally agree with both of the two
>>> specific points above.
>>
>> My request is for rationale. If there is a good rationale for these
>> points that fits with a larger strategy, then I would disagree with
>> both of those specific points above.
>>
>> What you are asking me to do to take guesses as to what the intent is
>> for the authoring requirements, take pot shots at the spec without
>> this necessary understanding, see what falls over when I do, and then
>> repeat the process until either nothing is left standing or what is
>> left standing does have consensus.
>>
>> That does not seem like a sane alternative to me.
>
> What I'm asking is that you follow the Decision Policy guidelines for
> what should go in a bug:
> <http://dev.w3.org/html5/decision-policy/decision-policy.html#bugzilla-bug>.
> I don't think bug 7034 satisfies any of those four bullet points in its
> current state.
>
> And I'm letting you know that if the bug report doesn't meet those
> guidelines, the likely result is NEEDSINFO, and that I at least would
> agree with that resolution. If you're not interested in doing anything
> further to avoid that outcome, then I am satisfied to leave the bug alone.

You have now seen what a mere evening's worth of work can produce:

http://intertwingly.net/blog/2010/03/20/Authoring-Conformance-Requirements

I don't even know how to begin to reasonably categorize all this data. And given that that evening's worth of work included the production of a script to help analyze the data, imagine what I could do in one more evening, or a week.

One simple example to show how this relates to issue-41.

Suppose a person authors a page for iPhone users. This page is to be served using PHP. This person uses Emacs. During the course of development, at one point some portion of the page is commented out. That portion happens to contain consecutive dashes.

Per the current draft, this is a conformance error. Per Validator.nu, the reason given is that this data can't be serialized as XML 1.0. I don't know if that matches the editor's reasoning; I can't read his mind. But it is all I have to go on at the moment.

As a user, my reaction would be along the lines of "thanks for sharing". At no point in any scenario that this user cares about is an XML 1.0 serializer involved. At best, this requirement is a SHOULD, but given the number of pages that exist on the Internet and the relative frequency with which any of them are ever serialized as XML 1.0, I think that even a SHOULD requirement is a bit much.

Now consider www.sina.com.cn, a site that I wouldn't tend to frequent for obvious reasons, but one that Alexa reports as #12 on the whole Internet. Given the size of the Internet, I'm sure you would agree that being #12 is no small feat. On that page there are a number of conformance errors, two of which involve consecutive dashes inside a comment. I personally doubt that page was produced using Emacs, but the principle involved is the same. However it is produced, the page is served as text/html, and I would assert that that is evidence enough that the expectation is that this page is to be processed as HTML, and that furthermore the expectation is rather low that the content will ever be serialized as XML 1.0.

Now consider site #5 on the Internet: live.com. I'm also pretty sure that this site was not authored using Emacs. It, too, is served as text/html. It contains an attribute that validator.nu asserts can't be serialized as XML 1.0. The statement that validator.nu makes is somewhat incomplete and arguably misleading.
The page is well-formed XML, to the point of containing XML-style <![CDATA[]]> blocks inside JavaScript comments and being parseable using expat. What is true is that the DOM that is produced if that page is parsed as HTML could not be produced by parsing an XML document -- a scenario that understandably might not be all that important to the authors of live.com. I'll also note that the xml:lang attribute that is also present in this same page does not meet the criterion of producing a DOM when parsed using an HTML parser that can also be produced using an XML parser.

Given all the evidence I have available to me, I would say that producing a page that is well-formed XML is something that is important to the authors of live.com, for reasons that I can only guess. This site is not alone in that regard, but among the sites for which this is true, it currently is the one that gets the most traffic according to Alexa.

Maciej, you've personally argued against this exact syntax for reasons involving fallback stories. In this specific case involving live.com, I would assert that argument again falls into the category of "thanks for sharing", i.e., involving a scenario which is entirely irrelevant to the expected use of this specific page.

Given this, I believe a case could be made that the live.com page, along with google.com, as is, should both be considered conforming HTML5. It is true that both pages contain elements that some people may recommend not be emulated in other contexts, but given the substantial and expected use of these two pages, there is nothing categorically /wrong/ with these two pages.

I don't know if this line of reasoning is something that you would consider compelling or something the group would find consensus on. That's not the point of this (now somewhat lengthy) email. If you look at the line of reasoning, it actually is a house of cards. It takes an extended form of "if this then if that then if this other thing then: conclusion".
And why does it take such a form? The answer is simple: the reasoning is necessarily built on guesses, and the reason why those guesses are necessary is that absolutely zilch has been provided in terms of rationale for why these restrictions are in the document in the first place.

This is but one simple example. In that evening's worth of work I produced several dozen such examples, each of which could reasonably be opened as a bug. This could be done, but doing so would be entirely unproductive and unnecessary.

What is in order here is to ask for a bit of rationale for the current set of conformance criteria. I'll note that this is not like a parsing rule for which the answer could be "three out of the four browsers agree"; this is a topic which is clearly a matter of judgment, and so asking those that formulated this set of opinions to explain their rationale is in order.

Failing compelling rationale, the alternative is to start over. We should rip out all of the conformance requirements and put back in new ones that have a solid rationale. As an example, even if your page is 100% "well-formed", if your page triggers the adoption agency algorithm with any content that actually turns out to be visible, then I believe I personally could be persuaded to agree that such a page should be marked as non-conforming.

I'll go further: rip out does not necessarily mean throw away. Escaping one's ampersands can be argued to be a best practice. Some could make a similar case for explicitly closing all open non-void elements as a best practice. And even avoiding double dashes in comments. People can, and will, disagree on what is or is not a best practice. That's OK too. I don't object to such being captured and collected into a document, one perhaps to be published as a Note or even a Rec. I just don't believe that the case has been made that such opinion has any place in the one document entitled "HTML5. A vocabulary and associated APIs for HTML and XHTML".
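
For concreteness, the double-dash case described above can be reproduced mechanically. What follows is only a sketch in Python and makes no claim to match what Validator.nu does. The names CommentCollector, comments_with_double_dash and is_well_formed_xml, along with the sample markup, are made up for illustration. Python's html.parser is a lenient parser rather than an HTML5-conformant one, so it stands in here only as "something that reads the page as HTML".

```python
from html.parser import HTMLParser
from xml.parsers import expat


class CommentCollector(HTMLParser):
    """Collects the body of every comment seen while reading text as HTML."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


def comments_with_double_dash(html_text):
    """Return the comment bodies that contain consecutive dashes."""
    collector = CommentCollector()
    collector.feed(html_text)
    collector.close()
    return [c for c in collector.comments if "--" in c]


def is_well_formed_xml(text):
    """Return True if expat accepts the text as a complete XML document."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except expat.ExpatError:
        return False
    return True


# Hypothetical page fragment with a commented-out portion.
page = "<p>kept</p><!-- old -- markup --><p>also kept</p><!-- fine -->"
print(comments_with_double_dash(page))

# The same two comments, each wrapped in a minimal XML document.
print(is_well_formed_xml("<r><!-- fine --></r>"))
print(is_well_formed_xml("<r><!-- old -- markup --></r>"))
```

Read as HTML, both comments come through without complaint and only the first is flagged by the explicit check for consecutive dashes. Handed to expat, the document containing that same comment is rejected. That is the whole of the XML 1.0 concern, and it only arises if somebody actually runs an XML tool over the page.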

> Regards,
> Maciej

- Sam Ruby
Received on Saturday, 20 March 2010 12:15:30 UTC