Comments on the draft dated 26 July 2006 from Rob Burns on 2006-10-10 (www-html-editor@w3.org from October to December 2006)

From: Rob Burns <robburns1@mac.com>
Date: Mon, 9 Oct 2006 22:12:52 -0500
To: www-html-editor@w3.org
Message-Id: <7BC50364-32FC-423B-96BE-3EE4D43DAE8F@mac.com>
Dear Editors:

The current draft of XHTML 2 represents a great leap forward for  
semantic authoring. Element after element in the draft improves the  
expressibility of HTML markup. I have a list of comments on the  
draft. Many of these concepts may have already been discussed by the  
draft’s participants. In any event, please take them into
consideration as this draft progresses toward Candidate Recommendation.

Sincerely,
Robert Burns

------------------------------------------
Attributes xml:id and id:
Including both the ‘id’ and ‘xml:id’ attributes introduces
complexity without any clear benefit. Using ‘xml:id’ alone would
improve the extensibility and interoperability of XHTML with other
XML vocabularies. It would also avoid needless confusion among authors
over these largely redundant attributes.

Finer-grained ‘cite’ element (adding a citation ‘type’ attribute):
By adding a ‘type’ attribute to the ‘cite’ element, authors could
more clearly delineate the type of citation. Currently, user agents
typically render the ‘cite’ element in italics, implying that it is
meant only for book citations. A ‘type’ attribute, however, would
allow style sheets to select ‘cite’ elements by type and apply a greater
variety of presentation to this finer-grained ‘cite’ element. For
example: ‘type = "(book-title | article-title | webpage-title | author
| speaker | collection-title | <QName>)"’.

Subordinate Text Element:
To leverage the work on CSS3, an element for subordinate text, such as
‘subtext’, would be useful for authors. This is text that is
parenthetical to the main text but may receive various
presentational idioms. A ‘rank’ attribute to differentiate between
levels of subordination could also be employed. For example,
<subtext rank="0"> might be displayed as an inline parenthetical, with
‘::before { content: "(" }’ and ‘::after { content: ")" }’, whereas
<subtext rank="1"> would be displayed as an endnote or footnote in printed
media or as a popup or tooltip note in screen media. The ‘rank’
attribute could simply be a number, where zero and one would
cover most needs while leaving authors the flexibility to add deeper
levels of subordination if they needed them. Subordinate text
elements might also be used to attach comments to an authored work.
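A rough sketch of how a processor might apply the ‘rank’ idea, written in Python with the standard library's ElementTree. The ‘subtext’ element and its ‘rank’ attribute are the hypothetical proposal above, not part of any draft; rendering rank 0 inline in parentheses and higher ranks as numbered notes is an assumption standing in for what style sheets would do.

```python
import xml.etree.ElementTree as ET


def render_subtext(paragraph_xml: str) -> tuple[str, list[str]]:
    """Flatten a paragraph, rendering hypothetical <subtext> children by rank.

    rank 0 is shown inline as a parenthetical; rank 1 or higher is replaced by
    a numbered note marker and its text is collected as an endnote.
    """
    root = ET.fromstring(paragraph_xml)
    parts: list[str] = [root.text or ""]
    notes: list[str] = []
    for child in root:  # only direct children are inspected in this sketch
        inner = "".join(child.itertext())
        if child.tag == "subtext":
            rank = int(child.get("rank", "0"))  # missing rank treated as 0 (assumption)
            if rank == 0:
                parts.append(f"({inner})")
            else:
                notes.append(inner)
                parts.append(f"[{len(notes)}]")
        else:
            parts.append(inner)  # other inline elements are flattened to text
        parts.append(child.tail or "")
    return "".join(parts), notes


text, notes = render_subtext(
    '<p>XHTML 2 <subtext rank="0">still a draft</subtext> adds new elements.'
    '<subtext rank="1">See the draft dated 26 July 2006.</subtext></p>'
)
print(text)   # XHTML 2 (still a draft) adds new elements.[1]
print(notes)  # ['See the draft dated 26 July 2006.']
```

A real user agent would leave this to CSS; the sketch only shows that a single numeric attribute is enough to drive both idioms.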

Table Cell Elements:
Rather than continuing separate elements for header (‘th’) and data
(‘td’) cells, XHTML2 should introduce a new ‘tc’ element for table cells.
The distinction between a header cell and a data cell could then be
expressed through a new attribute. For example, ‘cell="(header |
data | both)"’. The ‘tc’ element could be supported instead of the
‘td’ and ‘th’ elements, and those two should probably then be
deprecated. This would allow authors to explicitly indicate whether a
cell was both a data cell and a header cell rather than leaving it
merely implied.
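To illustrate what an explicit ‘cell’ attribute would make possible, here is a minimal Python sketch that sorts the cells of a row by role. The ‘tc’ element and ‘cell’ attribute are the hypothetical proposal above; treating a cell without the attribute as a data cell is an assumption.

```python
import xml.etree.ElementTree as ET

VALID_ROLES = {"header", "data", "both"}


def classify_cells(row_xml: str) -> dict[str, list[str]]:
    """Sort hypothetical <tc> cells in a row into header and data roles.

    A cell marked cell="both" appears in both lists, something the th/td
    pair can only imply.
    """
    row = ET.fromstring(row_xml)
    roles: dict[str, list[str]] = {"header": [], "data": []}
    for tc in row.iter("tc"):
        role = tc.get("cell", "data")  # default to data when absent (assumption)
        if role not in VALID_ROLES:
            raise ValueError(f"invalid cell role: {role!r}")
        text = "".join(tc.itertext()).strip()
        if role in ("header", "both"):
            roles["header"].append(text)
        if role in ("data", "both"):
            roles["data"].append(text)
    return roles


print(classify_cells(
    '<tr><tc cell="header">Year</tc><tc cell="both">2006</tc><tc>42</tc></tr>'
))
# {'header': ['Year', '2006'], 'data': ['2006', '42']}
```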

Table ‘col’ and ‘colgroup’ horizontal-alignment:
Somewhere in the transition to stricter content models and XHTML, many
user agents dropped support for HTML horizontal alignment within
tables: in particular, defining horizontal alignment with the ‘col’
and ‘colgroup’ elements and aligning on a single character such as a
decimal point (“.”). Presumably this was to be picked up by CSS as a
presentational property. However, over time the CSS recommendations
also dropped support both for applying presentation to tables through
the ‘col’ and ‘colgroup’ elements and for expressing the various
meanings of a table in terms of columns and column groupings. I'm not
sure whether XHTML2 should provide a backup mechanism for this, but it
should be addressed somewhere.

A Data of Type Element:
A data element could work similarly to an XForms ‘output’ element,
except that rather than being dynamically generated, its static value
would be its content. A ‘datatype’ attribute would identify the
contents as a particular XSD primitive or derived data type. By
identifying the data type of the contents, style sheets could be used
to change the display of the data. For example,
<data datatype='float'>1000</data> might be styled as "one thousand",
"1,000", "1000" or "1.0 × 10^3", depending on the style declaration
selecting this data type. Other attributes might also be included to
indicate ad hoc facets of the data. A ‘units’ attribute could also be
included, with QNames drawn from SI, US or Imperial units and various
calendars. Again, the display of the units could be determined through
style sheets: e.g., “millimeters”, “mm” or “m.m.”. Such an element
would further improve continuity across documents produced with W3C
standards. Validators could also be extended to notify authors of
invalid content (according to the ‘datatype’ attribute), and DOM
functions could ensure that operations were performed on data of
comparable types and even that units were respected.
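A minimal Python sketch of the validation and display ideas, assuming the hypothetical ‘data’ element and ‘datatype’ attribute proposed above. Only a simplified subset of the XSD lexical rules is checked, the ‘style’ parameter is an invented stand-in for a style sheet choosing among the displays listed above, and units are not handled.

```python
import math
import re
import xml.etree.ElementTree as ET
from datetime import date

# Simplified lexical checks for a few XSD datatypes. A real validator would
# implement the full rules of XML Schema Part 2; these patterns are assumptions.
_PATTERNS = {
    "integer": re.compile(r"[+-]?[0-9]+"),
    "decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
    "float": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?|-?INF|NaN"),
    "boolean": re.compile(r"true|false|1|0"),
}


def is_valid(datatype: str, text: str) -> bool:
    """Return True if text is in the (simplified) lexical space of datatype."""
    if datatype == "date":
        # YYYY-MM-DD only; XSD also allows negative years and a timezone.
        if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
            return False
        try:
            date.fromisoformat(text)  # rejects impossible dates such as 2006-02-30
        except ValueError:
            return False
        return True
    pattern = _PATTERNS.get(datatype)
    if pattern is None:
        raise ValueError(f"unsupported datatype in this sketch: {datatype!r}")
    return pattern.fullmatch(text) is not None


def render_data(xml: str, style: str = "plain") -> str:
    """Validate a hypothetical <data datatype="..."> element and format numbers.

    style stands in for a style sheet's choice: "plain" (1000),
    "grouped" (1,000) or "scientific" (1.0 × 10^3).
    """
    elem = ET.fromstring(xml)
    datatype = elem.get("datatype", "string")
    text = (elem.text or "").strip()
    if datatype == "string":
        return text
    if not is_valid(datatype, text):
        raise ValueError(f"{text!r} is not a valid {datatype}")
    if datatype not in ("integer", "decimal", "float"):
        return text  # non-numeric types are shown as written
    value = float(text)
    if not math.isfinite(value):
        return text  # leave INF, -INF and NaN as written
    if style == "grouped":
        return f"{value:,.0f}" if value.is_integer() else f"{value:,}"
    if style == "scientific":
        mantissa, exponent = f"{value:.1e}".split("e")
        return f"{mantissa} × 10^{int(exponent)}"
    return str(int(value)) if value.is_integer() else str(value)


for style in ("plain", "grouped", "scientific"):
    print(render_data("<data datatype='float'>1000</data>", style))
# 1000
# 1,000
# 1.0 × 10^3
```

The same ‘is_valid’ check is the kind of test a validator could run to flag, say, <data datatype='integer'>twelve</data> as invalid content.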

A Proper Name Element:
Along the same lines as a data element, XHTML2 should include a proper
name element with types such as person, place, organization and
institution. Proper names are an important semantic distinction in
authoring documents that should not be left to generic elements and
ad hoc solutions.

Lists:
The current semantics of lists could use some simplification. The
semantic difference between unordered and ordered lists does not seem
great enough to warrant separate elements. Rather, both seem to be
maintained largely for legacy and presentational reasons. Perhaps a
boolean attribute would be better for this distinction in the future:
when order matters, ‘order="order"’ could be set on a list element.

At the same time, the distinction between definition lists and the
other lists seems to lie mostly in the use of the definition item. A
more flexible (and, I think, no more cumbersome) approach would be
to maintain a single unified list element and one list item element,
and to allow an optional definition term element at the beginning of
any list item. CSS sibling selectors would allow the presentation to
meet the needs of either legacy presentation while also enabling
further, more flexible presentational idioms.

I am not including the new ‘nl’ element in this discussion, as it does
seem semantically distinct enough to warrant its own element. However,
the ‘ul’, ‘ol’ and ‘dl’ elements could all be merged into one element
without losing any expressiveness.
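A sketch in Python of how one list element could cover all three cases. The ‘list’ element name, the ‘order’ attribute and the use of ‘dt’ as an optional leading term are hypothetical, following the proposal above; the plain-text output only approximates what CSS would produce.

```python
import xml.etree.ElementTree as ET


def render_list(xml: str) -> list[str]:
    """Render a hypothetical unified <list> element as plain-text lines.

    order="order" numbers the items; otherwise they are bulleted. A <dt> at
    the very start of an <li> is shown as "term: description", covering
    what dl/dt/dd express today.
    """
    root = ET.fromstring(xml)
    ordered = root.get("order") == "order"
    lines = []
    for n, li in enumerate(root.findall("li"), start=1):
        marker = f"{n}." if ordered else "-"
        starts_with_term = (
            len(li) > 0 and li[0].tag == "dt" and not (li.text or "").strip()
        )
        if starts_with_term:
            term = "".join(li[0].itertext()).strip()
            # The description is everything that follows the term.
            rest = (li[0].tail or "") + "".join(
                "".join(child.itertext()) + (child.tail or "") for child in li[1:]
            )
            lines.append(f"{marker} {term}: {rest.strip()}")
        else:
            lines.append(f"{marker} {''.join(li.itertext()).strip()}")
    return lines


print(render_list(
    '<list order="order"><li>Working Draft</li><li>Candidate Recommendation</li></list>'
))
# ['1. Working Draft', '2. Candidate Recommendation']
print(render_list('<list><li><dt>nl</dt> navigation list</li><li>plain item</li></list>'))
# ['- nl: navigation list', '- plain item']
```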

Definition Lists:
	In any event, if the ‘dl’ element is maintained separately, the  
‘value’ attribute should be added to the definition list item to be  
used similarly to the ‘value’ attribute on list item elements.

A Blockparagraph Element:
Rather than simply altering the content model of the ‘p’ element, I
think it would be better to follow the pattern established by the
‘q’ versus ‘blockquote’ and ‘code’ versus ‘blockcode’ distinctions by
adding a ‘blockparagraph’ element. This way, paragraphs would share
the same inline/block semantic distinction as these other elements.

Caption Element Content Model:
Once ‘p’ and ‘blockparagraph’ elements are distinguished, it would
also make sense to add the ‘p’ element to the content model of the
caption element. A caption could then simply handle multiple
paragraphs of text content. Without that, I fear authors may misuse
the ‘l’ or ‘separator’ elements or feel the need to reintroduce legacy
elements to handle captions requiring multiple paragraphs. At the
same time, excluding blockparagraphs from the caption element will
keep captions relatively simple, as their semantics require.

Paraphrase Elements:
To associate newly authored content with one or a few sources, a
paraphrase element would be useful, perhaps in both block and
non-block forms. Like ‘q’ and ‘blockquote’, these elements could allow
authors to associate the paraphrasing with specific sources through
the ‘cite’ attribute.

A Marker Element:
Add a marker element (e.g., <marker id='someNumber'/>) as a way to
insert an empty element into a document where, unlike the ‘a’ and
‘span’ elements, one wants to refer to a single point in the document
rather than a range. This might be used presentationally, like the
‘separator’ element, for a page break or column break. Or it may have
no presentation at all but serve as a “bookmark” within a document. I
think there is a need for such a generic empty element: one that is
not displayed by default, but instead serves as a marker within the
document.

PCData and Mixed-Content:
The current draft is not entirely clear regarding content models:  
particularly for structural elements and the “Flow” content model.  
The prose indicates PCData within the content models of several  
elements that previously contained only child elements. It is not  
clear whether this is merely due to changes in the XML definitions of  
PCData; whether it relates only to whitespace; or whether this is a  
change in the content models of these elements.

If this PCData does not mean only whitespace characters, then its
introduction into several content models is unwarranted. For example,
the ‘section’ element shows a content model of (PCDATA | Flow)* while
the prose says “This element defines content to be block-level…”. I
think the ‘section’ element’s content model would be better as
(Heading | Structural)*, which is closer to what the prose suggests.

In addition, the ‘blockquote’ and ‘blockcode’ elements list content
models of (PCDATA | Text | Heading | Structural | List)*, which
appears to be equivalent to (PCDATA | Flow)*. In the case of
‘blockquote’, this tends to blur the distinction between the
‘blockquote’ and ‘q’ elements, which was stricter in prior
recommendations.

The ‘img’ Element:
While retaining the ‘img’ element for legacy reasons, it may be
better to include it within the embedding collection as a sort of
subclass of the ‘object’ element. In this way the ‘img’ element could
include fallback content, a standby element and, in particular, a
caption element. It could, however, also add the ‘alt’ attribute as an
alternative fallback mechanism: one used only if the element
contained no content. Authors who have been reluctant to switch to
the ‘object’ element could then still use the familiar ‘img’ element,
but could eventually come to use the contents of the element for
fallback instead of the ‘alt’ attribute.
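The fallback rule described above can be sketched in a few lines of Python. This assumes the hypothetical object-like ‘img’ proposed here; whether a caption or standby child should count as fallback content is not settled by the proposal, so excluding them is an assumption.

```python
import xml.etree.ElementTree as ET


def img_fallback(xml: str) -> str:
    """Text to present when a hypothetical object-like <img> cannot be rendered.

    Element content takes precedence; alt is used only when the element has
    no fallback content. Excluding caption and standby children from the
    fallback text is an assumption.
    """
    img = ET.fromstring(xml)
    parts = [img.text or ""]
    for child in img:
        if child.tag not in ("caption", "standby"):
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    content = " ".join("".join(parts).split())  # collapse whitespace
    return content if content else img.get("alt", "")


print(img_fallback('<img src="chart.png" alt="Sales chart"/>'))
# Sales chart
print(img_fallback(
    '<img src="chart.png" alt="Sales chart"><p>Sales rose in 2006.</p></img>'
))
# Sales rose in 2006.
```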
Received on Wednesday, 11 October 2006 11:28:23 UTC