(quotations <q> and <quote>)(and <blockquote> and HTMLQuoteElement) Re: part of my review of 3.12 Phrase elements from Robert Burns on 2007-07-20 (public-html@w3.org from July 2007)

From: Robert Burns <rob@robburns.com>
Date: Thu, 19 Jul 2007 19:16:06 -0500
To: HTML WG <public-html@w3.org>
Message-Id: <E63178A6-CDA6-4115-9A9D-489B7F0C2E5A@robburns.com>
This review contains some merely editorial changes, but I also  
propose some fairly big feature changes. However, I feel I didn't  
provide sufficient use-case motivation for these proposals. So let me  
reintroduce the review with a brief explanation of the problems I'm  
looking to solve.

Practical problems:
1) Use of the <q> element requires UA support for adding quotation
marks (through CSS or otherwise), and IE does not provide that support
(relates to @needsmarks).
2) Authors have expressed two different views of quotation marks:
A) as mere punctuation belonging in the semantic HTML document; and B) as a
presentational idiom for differentiating quoted material from the
surrounding text (relates to @needsmarks).
3) When treating quotation marks as just a presentational idiom for  
differentiating quoted material, it is often the case that the  
presentation changes when the quoted material has either: A) a block  
content model; or B) the quotation exceeds a certain number of words.  
(This last problem motivates the addition of DOM calls @contentModel,  
@threshold and @words and the new <quote> element whose @needsmarks  
defaults to false).
4) UAs have a need to differentiate elements based on their content
model: hence the inclusion of both <q> and <blockquote> in earlier
versions of HTML. This is important for authoring tools that want
to append or insert nodes into an element and also for providing
hints of how an element should be presented. However, it's not really
a semantic distinction that authors should need to concern themselves
with (the content model state is apparent from what the author puts
inside the <blockquote> element). Elements such as <div>, <ins>,
<del>, <object>, <canvas>, <td>, <th>, <li>, <dd> all
have a similar distinction, but do not have the <q>/<blockquote>
element separation. In either case it may be useful for UAs to be
able to easily determine the current content model state of these
elements.

Take care,
Rob


ORIGINAL REVIEW
On Jul 19, 2007, at 8:22 AM, Robert Burns wrote:
------------------------------------------

DOM API for content models for HTMLElement:
(this suggestion is related to my review of the <q> quotation element  
subsection)

Considering the complexity of content models in HTML (and perhaps  
even more in HTML5), perhaps we should consider adding a DOM  
attribute for the content model state of an element, and then promote  
the reuse of elements for different purposes. For example, instead of  
needing both <q> and <blockquote> we could have one element <quote>  
that simply had either block elements on the one hand or structural- 
inline or strictly-inline elements on the other hand, but not both.  
The DOM attribute would return the state of the element to allow  
introspection before inserting new child nodes. See more details  
below. The DOM attribute could also (possibly) be used to determine
whether an appendChild or insertBefore call should be permitted. In
any event, it would assist with the increasing differentiation of  
content models and help maintain conforming documents even throughout  
DOM mutations.
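
As a rough illustration of the introspection idea (not part of the proposal itself), the check could look something like the Python sketch below. The element list, the state strings ("block", "inline", "mixed", "empty") and both function names are assumptions made up for the example; a real DOM attribute would live on the element and follow whatever states the spec defines.

```python
# Hypothetical sketch: the element names and the state strings are
# assumptions for illustration, not taken from any spec.
BLOCK_ELEMENTS = {"p", "div", "ul", "ol", "blockquote", "table", "pre"}


def content_model(child_names):
    """Return the content model state implied by an element's children."""
    kinds = {"block" if name in BLOCK_ELEMENTS else "inline" for name in child_names}
    if not kinds:
        return "empty"
    if len(kinds) > 1:
        return "mixed"  # block and inline children together
    return kinds.pop()


def may_append(child_names, new_child):
    """Allow an append only if it would not mix block and inline children."""
    return content_model(list(child_names) + [new_child]) != "mixed"
```

An authoring tool could call may_append before inserting a node, which is the "introspection before inserting new child nodes" described above.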

DOM API for HTMLQuoteElement:
(this suggestion is related to my review of the <q> quotation element  
subsection)

Consider augmenting the HTMLQuoteElement API

interface HTMLQuoteElement : HTMLElement {
         attribute DOMString cite;
         boolean threshold (unsigned long words);
         attribute unsigned long words;
         attribute boolean needsmarks;
         attribute DOMString contentModel;
};

More details follow.

Quotations <q> and <quote>:

For <q>:

to the paragraph:
"Content inside a q element must be quoted from another source, whose  
IRI, if it has one, should be cited in the cite attribute."

consider adding:

"Authors should include initial and concluding punctuation and other  
characters from the quotation within the q element, only when those  
characters are from the original source. Any other marks necessary to  
punctuate the quotation that are not from the original source should  
be placed outside the element immediately adjacent to the tags. If  
authors wish to present quotations within quotation marks, authors  
should include those quotation marks through a styling mechanism such  
as CSS. Authors may also use the threshold and contentModel DOM  
attributes to alternately display quotations as indented blocks of  
text through alternate styling mechanisms."

Consider adding:

"If the cite attribute is present, it must be an IRI. If the IRI is a  
valid URL with a recognized scheme, user agents should provide a
mechanism for users to follow such citation links. For other non-URL
URIs (or IRIs), user agents should provide users access to this
information through another mechanism (such as a quotation inspector
or more broadly an element inspector)."
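
A minimal sketch of that distinction, in Python for illustration only. The set of "recognized" schemes and the two return labels are assumptions; which schemes a user agent can actually follow is up to the user agent.

```python
from urllib.parse import urlsplit

# Assumption: which schemes count as "recognized" is up to the user agent.
FOLLOWABLE_SCHEMES = {"http", "https", "ftp"}


def cite_action(cite):
    """Classify a cite value as a followable link or inspector-only data."""
    scheme = urlsplit(cite).scheme.lower()
    return "follow-link" if scheme in FOLLOWABLE_SCHEMES else "show-in-inspector"
```

Under these assumptions a cite value such as http://example.org/hamlet would be offered as a link, while urn:isbn:0451526929 would only be shown through an inspector.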

Consider removing:
The following mechanism is too restrictive for authors. Consider  
eliminating this paragraph and any associated UA conformance criteria  
or processing algorithms.

"if a q element is contained (directly or indirectly) in a paragraph  
that contains a single cite element and has no other q element  
descendants, then, the citation given by that cite element gives the  
source of the quotation contained in the q element"

This is simply too restrictive or even presumptuous of authors (we  
shouldn't be trying to guess what an author is writing in this  
manner). The <cite> and <q> need to be explicitly linked by some  
mechanism (both to avoid inferring author intent and for  
accessibility purposes). There could be situations where an author  
simply wanted to cite one thing and quote another and I don't think  
we should be prohibiting that use in our normative recommendations.

Consider adding:
Consider adding an <editinsert> element and an @editInsert boolean  
(or other data) attribute for use within quotations to differentiate  
contents or markup inserted later (e.g., by the editor) and not from  
the original quotation. The default presentation for such elements  
and attributes would not need to be any different so this would  
degrade gracefully in current UAs. However, through CSS or DOM  
scripts, authors could add square brackets or otherwise style or  
annotate this content.

For example:

<p>What Hamlet said was: <q>To be, or <em editInsert='true'>not</em>
to be?</q> Don't forget the word <em>not</em> there.</p>

<p>What Hamlet said was: <q>To be, or <editinsert>as he holds up the
skull</editinsert> not to be?</q></p>
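
To show the kind of annotation a script could add, here is a small Python sketch that flattens a quotation to text and puts square brackets around editorial insertions. It is only an illustration: the render function is hypothetical, it uses an XML parser as a stand-in for the DOM, and a real page would more likely do this with CSS or DOM script.

```python
import xml.etree.ElementTree as ET


def render(element):
    """Flatten an element to text, bracketing editorial insertions."""
    inserted = element.tag == "editinsert" or element.get("editInsert") == "true"
    text = (element.text or "") + "".join(
        render(child) + (child.tail or "") for child in element
    )
    return "[" + text + "]" if inserted else text


quote = ET.fromstring(
    "<q>To be, or <editinsert>as he holds up the skull</editinsert> not to be?</q>"
)
print(render(quote))  # prints: To be, or [as he holds up the skull] not to be?
```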

Consider adding a new <quote> element for quotations. Consider adding
a boolean @needsmarks attribute (content and DOM) to this element,
<q> and <blockquote>. The @needsmarks DOM attribute would be a part
of the HTMLQuoteElement API and this API would be included on the
newly introduced <quote> element. Other DOM attributes would include:

interface HTMLQuoteElement : HTMLElement {
         attribute DOMString cite;
+        boolean threshold (unsigned long words);
+        attribute unsigned long words;
+        attribute boolean needsmarks;
+        attribute DOMString contentModel;
};

The default value for @needsmarks would be true on <q> and false on
<quote>. In other words, when marking up a quotation authors would
take one of the following three approaches:

<p>To quote Shakespeare: &ldquo;<quote>To be, or not to be? That is  
the question</quote>&rdquo;</p>

<p>To quote Shakespeare: <q>To be, or not to be? That is the  
question</q></p>

<p>To quote Shakespeare: <quote needsmarks='true' >To be, or not to  
be? That is the question</quote></p>

The @needsmarks attribute indicates that the document has been marked
up without including quotation styling (whatever that may be) and
therefore the quotation needs those marks added by a styling
mechanism. This would provide greater flexibility for authors in that
they could include quotation marks in the HTML semantic markup
without fear of foreign stylesheets adding them as well.
Alternatively, authors could leave the quotation marks off of a
quotation and CSS would know to do the right thing in that situation
too.
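
The decision a styling mechanism would make can be sketched as follows, in Python for illustration only. The function names and the dictionary of defaults are hypothetical; the defaults for <q> and <quote> are the ones proposed above, and <blockquote> is left out because its default is not settled here.

```python
# Defaults as proposed above: true on <q>, false on <quote>.
# Everything else (names, the plain-dict attribute model) is an assumption.
NEEDSMARKS_DEFAULT = {"q": True, "quote": False}


def needs_marks(tag, attributes):
    """Resolve @needsmarks from an explicit value or the element default."""
    value = attributes.get("needsmarks")
    if value is None:
        return NEEDSMARKS_DEFAULT[tag]
    return value == "true"


def present(tag, attributes, text):
    """Add quotation marks only when the markup says they are missing."""
    if needs_marks(tag, attributes):
        return "\u201c" + text + "\u201d"
    return text
```

With these assumptions, a bare <quote> is left alone (the author has supplied the marks), while <q> and <quote needsmarks='true'> both get marks added.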

Note, I used a true/false boolean, but it could be done with a
traditional HTML boolean attribute. That complicates things, and
personally I just don't like the HTML boolean approach (the attribute
might need to be changed to a negation of this boolean to work with
HTML-style booleans and still work on the existing quotation elements).

The <quote> element could be used in either situation, without
breaking compatibility with existing browsers. <q> and <blockquote>
could also acquire this capability with the appropriate default
values (i.e., needsmarks='true' or a traditional HTML boolean
negation of this attribute with false on <q> and <blockquote>), given
either HTML5 UA conformance or HTML5-savvy author stylesheets.

The proposed DOM APIs:

	boolean threshold (unsigned long words);
	attribute unsigned long words;
	attribute boolean needsmarks;
	attribute DOMString contentModel;


The DOM threshold method would allow DOM agents to pass a number of
words to the HTMLQuoteElement, and the element would simply return true
if its word count was greater than that number or false if not. An
alternate way to get at this information would be to read the words
attribute through the DOM, which would return the number of words
contained in the element (i.e., the number of words in what is
returned through innerText).
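
A minimal sketch of the two members, in Python for illustration only. It assumes that a "word" is a whitespace-separated run of characters and that threshold returns true when the word count exceeds the number passed in; neither point is pinned down above, so treat both as one possible reading.

```python
def words(inner_text):
    """Count whitespace-separated words, standing in for the words attribute."""
    return len(inner_text.split())


def threshold(inner_text, limit):
    """True when the word count exceeds limit (one reading of the proposal)."""
    return words(inner_text) > limit
```

For the ten-word quotation "To be, or not to be? That is the question", words gives 10, so a threshold of 25 would return false and a threshold of 5 would return true.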

The purpose for these DOM APIs (words, contentModel, and threshold,  
in particular) is to allow the application of different quotation  
styling depending on the number of words (a common convention for  
publishers). Using these DOM attributes, a document could easily
change the presentation of a quotation to a block indented quotation
whenever the quotation contained block content or structural inline
content, or when the word count exceeded some style threshold
(e.g., 25 words).
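
Put together, the styling decision might be sketched like this (Python, for illustration only). The contentModel strings and the presentation function are assumptions; only the 25-word figure comes from the example above.

```python
def presentation(content_model, word_count, limit=25):
    """Pick block or inline styling for a quotation.

    content_model is a hypothetical contentModel value; word_count stands in
    for the proposed words attribute.
    """
    if content_model in ("block", "structural-inline") or word_count > limit:
        return "block"
    return "inline"
```

A short strictly-inline quotation stays inline, while anything with block content, or anything over the word limit, is presented as an indented block.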
Received on Friday, 20 July 2007 00:16:54 UTC