<q> and transitioning forward from Robert J Burns on 2008-10-30 (public-html@w3.org from October 2008)

From: Robert J Burns <rob@robburns.com>
Date: Thu, 30 Oct 2008 09:34:54 -0500
To: "Sam Kuper" <sam.kuper@uclmail.net>
Cc: "Philip TAYLOR (Ret'd)" <P.Taylor@rhul.ac.uk>, "HTML WG" <public-html@w3.org>
Message-Id: <59BD20EE-EBAB-4900-A549-5D19D2ACC139@robburns.com>

Hi Sam,

On Oct 30, 2008, at 8:01 AM, Sam Kuper wrote:

> 2008/10/30 Philip TAYLOR (Ret'd) <P.Taylor@rhul.ac.uk>
> Sam Kuper wrote:
> Actually, the HTML 4.01 spec is slightly mealy-mouthed on this  
> point. See s.19.1 [1]:
> [...]
> My reading of this, especially the last sentence I've quoted above,  
> is that while automated "validators" detect "a large set of errors  
> that make documents invalid", they cannot catch all such errors.  
> Since avoiding all such errors seems to be synonymous with  
> conforming to the HTML 4 specification, this appears to imply that  
> the sample document you presented is, indeed, invalid.
>
> OK, here I respectfully disagree.  It clashes with a "should not",
> not with a "must not", and therefore if that is the only deviation
> from the specification the document remains valid.
>
> Valid but poorly-conforming, right? Well, I certainly think the HTML  
> 4 spec is vague enough that that's a fair reading. I hope the HTML 5  
> spec in its final form avoids this kind of vagueness altogether, and  
> defines "validity" and "conformance" explicitly enough that (within  
> the scope of HTML 5 at least) the matter will no longer be up for  
> discussion.
>
> My original point was that implementing heuristic suppression of  
> quotation marks generated from <q> isn't justified from a backwards- 
> compatibility standpoint in cases where quotation marks are written  
> immediately within <q> because, as I put it, such mark-up is  
> "invalid". Even if (and you may be right about this - I'm not sure)  
> such mark-up is not "invalid" but merely not in conformance with the  
> recommendations of the spec, I think my point still stands: for HTML  
> 5 to *support* HTML 4.x document authors' contraventions of the HTML  
> 4.x specifications' recommendations, would *not* represent backwards  
> compatibility.

I think such a stand overlooks the practical issues authors face
regarding the Q element. For the life of the Web, the Q element has
been insufficiently supported by leading browsers. Following the
SHOULD NOT recommendation in HTML4.01 leads to an inadequately
presented Q element in the leading web browser. Consider the following
HTML source, which is machine valid and conforming, though it does not
follow a recommendation of HTML4.01 (a recommendation that renders
the Q element incompatible with conventional quotation presentation in
leading browsers):


	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
	<Html lang='en' >
	<Head><Title></Title></Head>
	<Body><q>"(A > quote.)"</q></Body>
	</Html>

Now imagine that IE adds HTML4.01 conformance regarding the Q element.
Suddenly an author targeting only IE and not testing in other browsers
will see the rendering of this document change from:

"(A > quote.)"

to

“"(A > quote.)"”

Previously, the document would render correctly only in IE, and would
render with duplicate quotation marks in nearly all other browsers.
However, for an author testing only on IE (and many exist worldwide),
this problem would not be caught. Now if IE wants to add HTML4.01 Q
support, it will probably need to deal with this problem by either:

1) breaking web sites with the release of IE8 (though a minor break)
and letting authors correct their sites to work with IE8 and conform
to HTML4.01. The problem is that, for pages that must also work in
IE<8, this forces authors to add a stylesheet rather than fix the HTML.
Such a change then forces us further away from a Q element that
adheres to the separation of concerns.

2) adding the heuristics, which would let the sites continue working
with IE8 (though they would remain broken in the other browsers).
Granted, this approach would provide little incentive to fix the HTML,
but authors could eventually fix their HTML and author new conforming
HTML once they no longer targeted the older browsers.

Especially if IE8 decides to go with option (2), I'd rather see
HTML5 specify the algorithm for those heuristics and get all browsers
to share the same algorithm (even if that algorithm had some errors,
since authors can still correct their pages and CSS to bring them into
full conformance with HTML4.01).

Since the algorithm is focussed on removing duplicate quotation marks,
the CSS :before and :after pseudo-elements point the UA to the precise
spots to look for duplicate quotation marks (potentially separated from
the generated marks by whitespace). The UA then need only scan the
string within the element to ensure the duplicate quotation marks to be
removed constitute a matched pair. For nested quotations, this
algorithm would begin from the deepest nesting level and work its way
out.
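
To make the heuristic above a bit more concrete, here is a rough sketch
in Python. It is only an illustration under my own assumptions: the Q
class, the PAIRS table of matching marks, and the function names are all
hypothetical, and a real UA would work on its DOM and on the marks given
by the stylesheet rather than on a toy tree like this one.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical table: opening mark -> the closing mark that matches it.
PAIRS = {'"': '"', "'": "'", "\u201c": "\u201d", "\u2018": "\u2019"}


@dataclass
class Q:
    """Toy stand-in for a <q> element: text runs and nested Q elements."""
    children: List[Union[str, "Q"]] = field(default_factory=list)


def strip_duplicate_quotes(q: Q) -> None:
    """Remove author-typed marks just inside q, but only as a matched pair."""
    # Start from the deepest nesting level and work outward.
    for child in q.children:
        if isinstance(child, Q):
            strip_duplicate_quotes(child)
    if not q.children:
        return
    first, last = q.children[0], q.children[-1]
    if not (isinstance(first, str) and isinstance(last, str)):
        return
    if len(q.children) == 1:
        # One text run: look at both ends of it, ignoring whitespace.
        core = first.strip()
        if len(core) >= 2 and PAIRS.get(core[0]) == core[-1]:
            q.children[0] = core[1:-1]
        return
    head, tail = first.lstrip(), last.rstrip()
    if head and tail and PAIRS.get(head[0]) == tail[-1]:
        q.children[0] = head[1:]
        q.children[-1] = tail[:-1]


def render(q: Q, depth: int = 0) -> str:
    """Add generated marks the way :before/:after would (assumed style)."""
    marks = ("\u201c", "\u201d") if depth % 2 == 0 else ("\u2018", "\u2019")
    inner = "".join(
        c if isinstance(c, str) else render(c, depth + 1) for c in q.children
    )
    return marks[0] + inner + marks[1]


doc = Q(['"(A > quote.)"'])
print(render(doc))            # generated marks plus the typed ones
strip_duplicate_quotes(doc)
print(render(doc))            # typed pair removed, generated marks kept
```

Note that an unmatched mark (an opening mark with no matching closing
mark at the other end) is left alone, which is the "matched pair" check
described above. This sketch also discards any whitespace between the
element boundary and the removed mark, which a real implementation might
prefer to keep.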

However, I think the best approach would be to 1) break pages in IE8,
2) let authors fix their HTML and then 3) create workarounds for the
lack of CSS-generated quotation marks in IE<8 (I don't know what that
workaround might be, but perhaps something with DOM manipulation that
could actually make use of the stylesheet's :before and :after rules).

Take care,
Rob
Received on Thursday, 30 October 2008 14:35:55 UTC