- From: Robert J Burns <rob@robburns.com>
- Date: Thu, 30 Oct 2008 09:34:54 -0500
- To: "Sam Kuper" <sam.kuper@uclmail.net>
- Cc: "Philip TAYLOR (Ret'd)" <P.Taylor@rhul.ac.uk>, "HTML WG" <public-html@w3.org>
- Message-Id: <59BD20EE-EBAB-4900-A549-5D19D2ACC139@robburns.com>
Hi Sam,

On Oct 30, 2008, at 8:01 AM, Sam Kuper wrote:

> 2008/10/30 Philip TAYLOR (Ret'd) <P.Taylor@rhul.ac.uk>
> Sam Kuper wrote:
> Actually, the HTML 4.01 spec is slightly mealy-mouthed on this
> point. See s.19.1 [1]:
> [...]
> My reading of this, especially the last sentence I've quoted above,
> is that while automated "validators" detect "a large set of errors
> that make documents invalid", they cannot catch all such errors.
> Since avoiding all such errors seems to be synonymous with
> conforming to the HTML 4 specification, this appears to imply that
> the sample document you presented is, indeed, invalid.
>
> OK, here I respectfully disagree. It clashes with a "should not",
> not with a "must not", and therefore if that is the only deviation
> from the specification the document remains valid.
>
> Valid but poorly-conforming, right? Well, I certainly think the HTML
> 4 spec is vague enough that that's a fair reading. I hope the HTML 5
> spec in its final form avoids this kind of vagueness altogether, and
> defines "validity" and "conformance" explicitly enough that (within
> the scope of HTML 5 at least) the matter will no longer be up for
> discussion.
>
> My original point was that implementing heuristic suppression of
> quotation marks generated from <q> isn't justified from a
> backwards-compatibility standpoint in cases where quotation marks
> are written immediately within <q> because, as I put it, such
> mark-up is "invalid". Even if (and you may be right about this - I'm
> not sure) such mark-up is not "invalid" but merely not in conformance
> with the recommendations of the spec, I think my point still stands:
> for HTML 5 to *support* HTML 4.x document authors' contraventions of
> the HTML 4.x specifications' recommendations, would *not* represent
> backwards compatibility.

I think such a stand overlooks the practical issues authors face regarding the q element.
For the life of the www, the Q element has been insufficiently supported by leading browsers. Following the SHOULD NOT recommendation in HTML4.01 leads to an inadequately presented Q element in the leading web browser.

Consider the following HTML source, which is machine-valid and conforming, though it does not follow a recommendation of HTML4.01 (a recommendation that renders the Q element incompatible with conventional quotation presentation in leading browsers):

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <Html lang='en' >
    <Head><Title></Title></Head>
    <Body><q>"(A > quote.)"</q></Body>
    </Html>

Now imagine that IE adds HTML4.01 conformance regarding the Q element. Suddenly an author targeting only IE and not testing in other browsers will see the rendering of this document change from:

    "(A > quote.)"

to

    “"(A > quote.)"”

Previously, the document rendered correctly only in IE, and with duplicate quotation marks in nearly all other browsers. However, for an author testing only on IE (and many exist worldwide), this problem would not be caught.

If IE wants to add HTML4.01 Q support, it will probably need to deal with this problem by either:

1) breaking web sites with the release of IE8 (though a minor break) and letting authors correct their sites to work with IE8 and conform to HTML4.01. The problem is that, for pages that still need to work in IE<8, this forces authors to add a stylesheet rather than fix the HTML. Such a change then forces us further away from a Q element that adheres to the separation of concerns.

2) adding the heuristics, which would make the sites continue working with IE8 (though they would remain broken in the other browsers). Granted, this algorithm would provide little incentive to fix the HTML, but authors could eventually fix their HTML and write new conforming HTML once they no longer targeted the older browsers.
Especially if IE8 decides to go with option (2), I'd rather see HTML5 specify the algorithm for those heuristics and get all browsers to share the same algorithm (even if that algorithm had some errors, since authors can still correct their pages and CSS to bring them into full conformance with HTML4.01).

Since the algorithm is focussed on removing duplicate quotation marks, the CSS :before and :after pseudo-elements point the UA to the precise spot to look for duplicate quotation marks (potentially separated by whitespace). The UA then need only scan the string within the element to ensure that the duplicate quotation marks to be removed constitute a matched pair. For nested quotations, this algorithm would begin from the deepest nesting level and work its way out.

However, I think the best approach would be to 1) break pages in IE8, 2) let authors fix their HTML and then 3) create workarounds for the lack of CSS-generated quotation marks in IE<8 (though I don't know what that workaround might be; perhaps something with DOM manipulation that could actually make use of the stylesheet's :before and :after rules).

Take care,
Rob
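
P.S. A rough sketch of the duplicate-quotation heuristic described above, in Python purely for illustration. Everything named here is hypothetical: the `Q` class is a toy stand-in for a <q> element (not a real DOM), the `PAIRS` table stands in for whatever marks the stylesheet would generate, and treating a symmetric mark that reappears inside the string as "not a matched pair" is my own conservative assumption. It is not a proposal for the actual spec text.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical opening -> closing marks; a real UA would take these
# from the marks its stylesheet generates via :before and :after.
PAIRS = {'"': '"', "'": "'", "\u201c": "\u201d", "\u2018": "\u2019"}


@dataclass
class Q:
    """Toy stand-in for a <q> element: text runs and nested Q children."""
    children: List[Union[str, "Q"]] = field(default_factory=list)


def text_of(node) -> str:
    """Concatenate all text inside a node, including nested Q elements."""
    if isinstance(node, str):
        return node
    return "".join(text_of(child) for child in node.children)


def wraps_whole(text: str) -> bool:
    """True if the stripped text is enclosed by a single matched pair."""
    if len(text) < 2 or text[0] not in PAIRS or text[-1] != PAIRS[text[0]]:
        return False
    opener, closer, inner = text[0], text[-1], text[1:-1]
    if opener == closer:
        # Symmetric marks: another occurrence makes the pairing ambiguous.
        return opener not in inner
    depth = 1
    for ch in inner:
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
        if depth == 0:
            return False  # the opening mark was closed before the end
    return depth == 1


def strip_duplicate_quotes(q: Q) -> Q:
    """Remove literal marks that would duplicate the generated ones."""
    # Begin at the deepest nesting level and work outward.
    for child in q.children:
        if isinstance(child, Q):
            strip_duplicate_quotes(child)
    if not q.children:
        return q
    first, last = q.children[0], q.children[-1]
    # The marks must be literal text at the very edges of the element.
    if not (isinstance(first, str) and isinstance(last, str)):
        return q
    head, tail = first.lstrip(), last.rstrip()
    if not head or head[0] not in PAIRS:
        return q
    if not tail or tail[-1] != PAIRS[head[0]]:
        return q
    if not wraps_whole(text_of(q).strip()):
        return q
    # Drop the opening mark, keeping any leading whitespace.
    lead = len(first) - len(head)
    q.children[0] = first[:lead] + first[lead + 1:]
    # Re-read the last child: it is the same string when there is only one.
    last = q.children[-1]
    trail = len(last.rstrip())
    q.children[-1] = last[:trail - 1] + last[trail:]
    return q


sample = Q(['"(A > quote.)"'])
print(text_of(strip_duplicate_quotes(sample)))  # (A > quote.)
```

With this sketch, content such as `"A" and "B"` is left alone, because the first mark is closed before the end of the string and so the outer marks are not a matched pair.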
Received on Thursday, 30 October 2008 14:35:55 UTC