(Time, Meter, Progress, DATA <time>, <meter>, <progress>, <data>, etc.) part of my review of 3.12 Phrase elements from Robert Burns on 2007-07-22 (public-html@w3.org from July 2007)

From: Robert Burns <rob@robburns.com>
Date: Sun, 22 Jul 2007 04:11:02 -0500
To: public-html WG <public-html@w3.org>
Message-Id: <9AB1E870-EE04-4CA7-8542-E89F5C5202B2@robburns.com>
Summary:

• Editorial suggestions
• Consider eliminating the quasi-heuristic extraction of ratios from  
the contents of the PROGRESS and METER elements.
• Propose minor changes to the ratio parsing algorithm and the METER
and PROGRESS contents algorithms if we do not eliminate textContent
ratio parsing
• Proposed a new DATA element with @ofType attribute
• Proposed new elements: <maxvalue>, <minvalue>, <highvalue>,  
<lowvalue>, <actualvalue>, <optimum>
• Proposed new content attributes
 • TIME: @calendar, @clock
 • METER and PROGRESS: @units, @labels, @ticks
 •  METER: @scale (with proposed QNames: "linear", "log10", "logn",  
"expn", etc.; default: "linear")
• Propose new DOM attributes (probably better names could be
invented)
 • TIME: dOMDateTime
 • METER and PROGRESS (if we continue to allow value specification  
in the element's contents):
  • actualValue,
  •  dMaxValue
• Recommend altering number-related algorithms; consider:
 •  including all Unicode general category Nd (not just ASCII 0-9)
 • adding Arabic decimal separator (U+066B)
 • removing Unicode compatibility characters:
  •  Small Percent Sign (U+FE6A)
  •  Fullwidth Percent Sign (U+FF05)
• Consider adding snapshot of XSD namespace to HTML for data type  
QNames.
• Consider adding new QNames to HTML for scale ("linear", "logn",
etc.), units (all SIs, many US, perhaps imperials, etc.), calendars
(Islamic, Chinese, Hebrew, etc.) and clocks (12h, 24h, etc.)

Introduction:

Overall, on these elements I think we should carefully consider the  
attempt to heuristically parse the contents of the element instead of  
always requiring explicit values. I think it is a commendable goal to  
accomplish something like this, but it is just too ambitious right  
now. Granted, the algorithm applied is rigid and deterministic enough.
However, it places all of the burden on authors to ensure their prose
conforms to a rather complex algorithm. This to me is the opposite of
how we would want heuristics to work. Proper computer heuristics
would allow authors to write in natural ways, and the heuristics
would nearly always parse out the needed meaning precisely. Instead,
these algorithms do the opposite, requiring authors to apply
heuristics to the algorithm in order to write prose in just such a
way that the algorithm sets the state of the element correctly.

In addition, the use of the attributes is simple and straightforward  
and the possibility of author error or misunderstanding is quite  
small. The only problem here is that the state of the element could
become out of sync with the prose contents of the element (is this
meant as fallback? I couldn't tell if it had other presentational
uses). If we really want to achieve consistency between the content
of the element and the state of its properties, we should define
additional elements that take precedence over the attribute values
when properly used. That is, instead of

<meter class='applausometer' value='121' min='0' max='3225' low='125'
high='3002' optimum='1228' />

OR (which I believe would return an error from the ratio algorithm)

<meter class='applausometer' min='0' low='125' high='3002'
optimum='1228'>On a scale of zero to 3225 the performer only got
an applause rating of 121: just shy of the One hundred and
twenty-five threshold needed to avoid a tomato bombardment, nowhere
near the One thousand two hundred and twenty-eight needed to advance
to the next round, and entirely out of reach of the 3002 needed to
win outright.</meter>

Simply have this (which could work as fallback, encodes unambiguous
properties for the element, and does not cause any errors):

<meter class='applausometer'>
On a scale of <min>0</min> to <max>3225</max> the performer only
got an applause rating of <value>121</value>: just shy of the
<low>125</low> threshold needed to avoid a tomato bombardment,
nowhere near the <optimum>1228</optimum> needed to advance to the next
round, and entirely out of reach of the <high>3002</high> needed to
win outright.
</meter>
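
As a rough illustration of how little processing the explicit-element
form would need, here is a minimal Python sketch. It assumes the XML
serialization (so a stock XML parser can read the fragment) and the
child element names from the example above; the function name
meter_properties is hypothetical and none of this is defined by the
draft.

```python
import xml.etree.ElementTree as ET

# Child element names as used in the example above (a proposal, not the draft).
PROPERTY_ELEMENTS = ("min", "max", "value", "low", "high", "optimum")

def meter_properties(markup: str) -> dict:
    """Collect meter properties from explicitly marked-up child elements."""
    meter = ET.fromstring(markup)
    properties = {}
    for name in PROPERTY_ELEMENTS:
        child = meter.find(name)  # direct children only
        if child is not None and child.text:
            properties[name] = float(child.text.strip())
    return properties

example = (
    "<meter class='applausometer'>On a scale of <min>0</min> to "
    "<max>3225</max> the performer only got an applause rating of "
    "<value>121</value>: just shy of the <low>125</low> threshold.</meter>"
)
props = meter_properties(example)
```

No number is ever guessed from the surrounding prose, so the fallback
text can say anything it likes.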

The current draft actually prevents the METER and PROGRESS elements  
from containing semantically rich fallback or even plain text  
fallback that contains complete information about the element.  
Perhaps these elements do not require fallback; however, it's
difficult to see what advantage there is for authors in writing
pseudo-natural prose inside the element.

The suggestion that we consider avoiding any parsing of the plain  
text contents of these elements to set their state applies to all  
three of the elements I discuss in this review: <time>, <meter> and  
<progress>. To me this is a fundamental part of HTML and the semantic  
web, in that we want to provide as many facilities for authors as we  
reasonably can so that authors can encode unambiguous semantics in  
the document at the time of conception rather than relying on later  
heuristic determinations when the original author may no longer be  
available for redirection.

TIME

Would it be useful to have a DOM attribute for dateTime that was not  
simply the value of the content attribute? After all that attribute  
would be accessible through getAttribute anyway. Instead, returning a
DOM string with the complete standardized timestamp (or exposing it
through an additional DOM attribute) would be useful. This way DOM calls
could provide the actual dateTime associated with the element rather  
than simply the attribute value (which may be just a time or just a  
date). A name different than the content attribute might help keep it  
clear that the value does not reflect the content attribute (e.g.,  
dDateTime).

Right now it would be very difficult for authors to glean from this
subsection of the draft how to specify a datetime attribute. This
subsection should contain a short, author-oriented informative
explanation of how to specify a date, a time, and a datetime
(including with an offset) without authors needing to turn to the
linked subsection on parsing datetimes. This should normatively refer
to an RFC and provide a quick syntax explanation and a quick example
(at the start of the subsection).

Also consider adding something like:
"Authors should include a UTC offset whenever including time  
information on a  <time> element. The user's (and the server's) time- 
zone will often be irrelevant for calculating an accurate local-time.  
Including an offset even when it is 0 makes it unambiguous that the  
time is expressed in UTC."
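
To illustrate why the offset matters, here is a small Python sketch.
It assumes ISO 8601 style timestamps as accepted by Python's
datetime.fromisoformat, which may not match the draft's datetime
syntax exactly; the helper name to_utc is hypothetical.

```python
from datetime import datetime, timezone

def to_utc(timestamp: str) -> datetime:
    """Parse a timestamp and require that it carries an explicit UTC offset."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Without an offset we cannot tell which instant is meant.
        raise ValueError("no UTC offset given; the instant is ambiguous")
    return parsed.astimezone(timezone.utc)

# The same instant written with a -05:00 offset and with a zero offset.
local = to_utc("2007-07-22T04:11:02-05:00")
zero = to_utc("2007-07-22T09:11:02+00:00")
```

Both values name the same instant, while a bare "2007-07-22T04:11:02"
is rejected because neither the user's nor the server's time zone can
be assumed.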

Consider adding a @calendar attribute to provide a presentational  
hint regarding the calendar. We may alternatively want to explore  
providing other calendar dates within this element. However, at the  
very least we could allow this element to store dates in a standard  
Gregorian format, but include the calendar as a QName along with the  
datetime data. While the issue of which calendar a date is associated
with may ultimately be determined by a combination of semantic  
document data, style document data and even localization preferences  
of the user, it strikes me that this sort of information should be  
optionally a part of the semantic document. Similarly, we could add a  
clock attribute to indicate an author's preference for a clock (12- 
hour / 24-hour).

As I described above, I do not think it is worthwhile for us to  
provide an alternate unstructured method for authors to specify the
date, time and offset  properties of this element. The datetime  
attribute is simple and straightforward and less susceptible to error  
or misunderstanding. The value of datetime could be expanded to  
support more date and time related concepts similar to the  
comprehensive datatypes provided through XML schema definition[1]  
data types (such as dates without years, years alone, etc.). In this  
sense, we should consider making time a canonically empty element  
whose presentation is well defined for various devices and media  
(alternatively, the element's contents might contain a date in a pre- 
specified format). Even better would be for HTML5 to add either a  
single <data oftype='' > or a comprehensive suite of elements to  
structurally present many different data types in markup (times and  
dates being just a few among many). A proposal for a <data> element  
follows.

Proposed DATA:

As I described in my review of TIME above, we should consider adding  
a DATA — or a similarly and appropriately named element — to  
contain one of many rigidly defined data types within a single  
canonically empty or otherwise rigidly and simply defined content  
model element. For example, the content model for the element could  
be a text-only representation of the data in a specified form  
(according to a particular RFC or the XSD recommendation).

proposed language/

Element-specific attributes:
 ofType (QNAME; required;)
 ??value (STRING determined by XSD)?? [this could instead be in the  
contents of the element]

Authors use the data element to include strictly defined data of  
specific type within their document. Such rigorous data types can  
then be presented in alternate ways depending on the author's or  
user's preference or localized cultural conventions. Authors must  
include a QName value in the @ofType attribute. Within the text/html  
serialization, names without a prefix will be interpreted as names  
from the XML Schema Definition (XSD) namespace, now incorporated into
the HTML namespace as of XSD (version *). Authors using the XML  
serialization may also specify data types from the HTML namespace,  
or —  by using XML namespaces — from the current XSD  
recommendation or  from another namespace entirely.

UAs should provide a suitable presentation of data types, perhaps  
drawing on a user's system preferences to localize data when  
possible. When the UA encounters a data type it is unfamiliar with  
and cannot associate  any stylesheet data to present the data, the UA  
should present the data value exactly as it appears as the string  
contents of the DATA element.

In the future, CSS or other style languages may provide a mechanism  
to alter the presentation of such data. For example a style language  
might make it possible to express long dates or short dates. A UA  
would then be able to combine stylesheet data with user preferences  
to present a  long date as either March 12, 2010 or 12 March 2010  
depending on the user's localization settings.

/end proposed language
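
As a sketch of the presentation step described in the proposed
language, the following Python shows one strictly typed date value
rendered two ways, with a fallback to the raw string. The convention
keys "month-first" and "day-first" are hypothetical stand-ins for
whatever a stylesheet or the user's localization settings would
supply.

```python
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)

def long_date(value: str, convention: str) -> str:
    """Render an ISO date string as a long date in the given convention."""
    d = date.fromisoformat(value)
    month = MONTHS[d.month - 1]
    if convention == "month-first":
        return f"{month} {d.day}, {d.year}"
    if convention == "day-first":
        return f"{d.day} {month} {d.year}"
    # Unknown convention: present the value exactly as it was written.
    return value

us_style = long_date("2010-03-12", "month-first")
intl_style = long_date("2010-03-12", "day-first")
```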


METER

For the paragraph on authoring requirements, consider making  
reference to the subsection on ratios in the microformats section and  
make clear that attribute values take precedence over element  
contents values to help readers understand how the numbers and ratios  
in the attribute's values and element's contents will be processed  
(if we decide to keep the model of processing the element's contents  
at all). Also change the wording to be more conformance-criteria-like  
language. If we indeed keep the proposal of processing the contents  
of the element when the attribute values are missing, the prose  
should make that clear.

proposed language/

Whether providing numbers through the meter element's attributes or  
through the meter element's contents, authors must ensure values for  
these properties are numbers composed of ASCII numeric characters 0
to 9 (U+0030 to U+0039) or an ASCII decimal point (U+002E) only.
Numbers must not contain thousands separators of any kind nor contain  
any other Unicode numbers (numerals) or any other Unicode decimal  
characters.

If the author omits either or both of the attributes @value or @max,  
the author may instead provide those values through the contents of  
the element. The author may do this in one of the following three ways:

1) ensure the only two numbers amidst the text contents of the meter  
element are the actual value and the maximum value of the meter  
element where the maximum value must be greater than or equal to the  
actual value;
2) provide only the actual value of the meter element by ensuring the  
only number in the contents of the meter element is the actual value  
of the meter; or
3) provide the actual value as a proportion, by including only one
number and one of the following denominator punctuation characters:
 A) one hundred (100), using one of:
   i) "%", "Percent Sign" (U+0025),
  ii) "٪", "Arabic Percent Sign" (U+066A),
  iii) "﹪", "Small Percent Sign" (U+FE6A),
  iv) "％", "Fullwidth Percent Sign" (U+FF05);
 B) one thousand (1,000), using "‰", "Per Mille Sign" (U+2030);
 C) ten thousand (10,000), using "‱", "Per Ten Thousand Sign"
(U+2031).

In this third case the actual value will be set to the only number
within the element, and the maximum for the meter will be set to the
number corresponding to the denominator punctuation character used.

To include the actual value in the contents of the meter element, an
author must include no other numbers (apart from an optional maximum)
and at most one denominator punctuation character within the contents
of the METER element. Where two numbers are given, the actual value
must be the lower of the two and the maximum value must be the higher.

However, attribute values will always take precedence over the  
numbers parsed from the element's contents.
/ end proposed language
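
The three authoring options above can be sketched in Python as
follows. This is a simplified stand-in for the draft's number and
ratio processing, not a transcription of it: the regular expression,
the DENOMINATORS table name and the function value_and_max are all my
own assumptions.

```python
import re

# Denominator punctuation characters listed above and the maximum each implies.
DENOMINATORS = {
    "\u0025": 100.0,    # Percent Sign
    "\u066A": 100.0,    # Arabic Percent Sign
    "\uFE6A": 100.0,    # Small Percent Sign
    "\uFF05": 100.0,    # Fullwidth Percent Sign
    "\u2030": 1000.0,   # Per Mille Sign
    "\u2031": 10000.0,  # Per Ten Thousand Sign
}

# ASCII digits with an optional ASCII decimal point; no thousands separators.
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")

def value_and_max(text: str):
    """Return (actual, maximum), (actual, None), or None on an error."""
    numbers = [float(n) for n in NUMBER.findall(text)]
    marks = [ch for ch in text if ch in DENOMINATORS]
    if len(numbers) == 2 and not marks:
        return min(numbers), max(numbers)      # option 1
    if len(numbers) == 1 and not marks:
        return numbers[0], None                # option 2
    if len(numbers) == 1 and len(marks) == 1:
        return numbers[0], DENOMINATORS[marks[0]]  # option 3
    return None
```

Note that value_and_max("8 beats the limit of 3") yields (3.0, 8.0):
the larger number becomes the maximum whatever the prose says, and any
third number in the text makes the whole extraction fail.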

Consider removing the Small Percent Sign (U+FE6A) and the Fullwidth
Percent Sign (U+FF05). These characters are both compatibility
characters and their use is discouraged by Unicode in most  
circumstances. They are included in Unicode for legacy support for  
CJK vertical text layout. Vertical text layout in HTML will hopefully  
be handled through upcoming CSS enhancements and vertical text layout  
is properly treated as a presentational issue outside the scope of HTML.

Also consider changing the draft to support all Unicode numeric
digits (Unicode general category Nd) and perhaps other decimal
characters too (currently only the Arabic decimal separator, U+066B,
though this might be useful in any script as an unambiguous decimal
character if glyph substitution is supported properly and it works
given the intricacies of the Unicode bidirectional algorithm).
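
A minimal Python sketch of what accepting general category Nd could
look like, using the standard unicodedata module. Mapping every Nd
digit to its ASCII equivalent before parsing is my own assumption
about how this might be done; normalise_number is a hypothetical name.

```python
import unicodedata

ARABIC_DECIMAL_SEPARATOR = "\u066B"

def normalise_number(text: str) -> str:
    """Map any Nd digit to ASCII and U+066B to an ASCII decimal point."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            out.append(str(unicodedata.decimal(ch)))
        elif ch == ARABIC_DECIMAL_SEPARATOR:
            out.append(".")
        else:
            out.append(ch)
    return "".join(out)

# Arabic-Indic digits 1 and 7, the Arabic decimal separator, then 5.
arabic = normalise_number("\u0661\u0667\u066B\u0665")
# Devanagari digits 2, 2, 5.
devanagari = normalise_number("\u0968\u0968\u096B")
```

After this step the existing ASCII-only number parsing could be
applied unchanged.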

For the maximum value, perhaps instead of 1 as the fallback value,  
consider changing the algorithm to use the next number with another  
decimal-place after the actual value. For example: 17 => 100; 225 =>  
1000, 0.85 => 1
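
A small Python sketch of that fallback rule. It reproduces the three
examples just given; treating every value below 1 as having a fallback
maximum of 1, and rejecting negative values, are my assumptions beyond
those examples.

```python
def fallback_maximum(value: float) -> int:
    """Next power of ten with one more integer digit than the value."""
    if value < 0:
        raise ValueError("negative values are not covered by this sketch")
    if value < 1:
        return 1
    # One more digit than the integer part: 17 -> 100, 225 -> 1000.
    return 10 ** len(str(int(value)))
```

Under this rule a value that is itself a power of ten moves up a
place, so 100 would get a maximum of 1000.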

Consider adding an attribute such as "scale" with values such as  
"linear", "log10", "logn", "expn", etc. The default would be "linear"  
and behave as the draft currently specifies. Perhaps advise UAs that  
they must support the  "linear" value and should support all values.

The meter element does not support ticks or scale labels. Consider
adding that. For example, @ticks and @labels attributes could each
take, as a value, a list of comma- (or space-) separated numbers.
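
For instance, such an attribute value could be read with something as
simple as the following Python sketch. The attribute syntax is only a
suggestion at this point, and parse_number_list is a hypothetical
name.

```python
import re

def parse_number_list(value: str) -> list:
    """Split on commas and/or whitespace and convert each token to a float."""
    tokens = [t for t in re.split(r"[,\s]+", value) if t]
    return [float(t) for t in tokens]

ticks = parse_number_list("0, 25, 50 75,100")
```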

The meter element does not fully support labels as it is currently  
specified, but instead reduces everything into a proportion of one.  
Consider adding more flexibility. For instance, this algorithm does  
not appear to take advantage of the extra information from the ratio  
parsing algorithm. For example, if @max is set to 1000 and the  
contents of the element is 17%, the algorithm does not set the @value  
attribute to 170  accordingly (if I'm reading that correctly).

Consider making mention of potential styling mechanisms that are  
beyond the scope of HTML5. Styling mechanisms might include ticks and  
marks instead of including that information on the element. Perhaps  
that is something that should be available in either place (styling  
document and semantic document).

For a maximum value, what about the case where the actual value is
greater than the maximum value? Especially when parsing this out of
the content of the element, the values will be misinterpreted. If
we maintain this parsing, authors should be made aware that the larger
number will always be the maximum and the smaller value will always
be the actual value regardless of what the prose says. Maximum value
for a meter tends to imply the maximum value the meter can measure  
and not necessarily the maximum value possible for the magnitude  
being measured.

Again, as I said in the introduction, I think we should discard this  
attempt to parse out meaning from an author's contents of this  
element. HTML should be much more about encouraging authors to  
explicitly encode meaning unambiguously. We could consider doing so  
through explicit elements that would then take precedence over the  
attribute values on the meter element. However, as it is currently
specified, the algorithm would not even allow a full fallback  
explanation within the contents of the element because additional  
numbers would cause an error returned from the ratio algorithm.

The case where maximum is set and textContent parsing returns a  
single number and a denominator punctuation character should be  
handled by setting the actual value to the product of the
percentage and the maximum value, to allow a progress bar to present  
the non-percentage values as an optional presentation of the progress  
bar (similarly for PROGRESS).
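
The arithmetic I have in mind, as a Python sketch (scaled_value is a
hypothetical name; denominator is the number implied by the
denominator punctuation character, e.g. 100 for "%"):

```python
def scaled_value(number: float, denominator: float, maximum: float) -> float:
    """Actual value when the contents give a proportion and max is set."""
    # Multiply before dividing so simple cases stay exact in floating point.
    return number * maximum / denominator

seventeen_percent_of_1000 = scaled_value(17, 100, 1000)
```

So contents of "17%" with max set to 1000 would give an actual value
of 170.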

If I'm reading the algorithms correctly it appears that the case of a  
value set above 1 without a maximum value set is not handled properly  
(AFAICT).

RECOMMENDED CHANGES FOR SUPPORTING EXISTING ALGORITHM:

(If we decide to keep support for unstructured  text to set the meter  
element's maximum and actual value properties)

Add "linear" to the following paragraph and consider striking ", or a  
fractional value" like:
"The meter element represents a scalar measurement within a known  
__linear__ range--, or a fractional value--; for example disk usage,  
the relevance of a query result, or the fraction of a voting  
population to have selected a particular candidate."

Consider breaking the algorithm into hierarchically arranged  
segments. As a notational convention consider numbering sequential  
steps and lettering mutually exclusive states. The following is my  
attempt to rewrite the meter algorithm following those conventions:

Determine the minimum value:
 A) If the min attribute is specified and a value could be parsed out  
of it, then the minimum value is that value.
 B) Otherwise, the minimum value is zero.

Determine the maximum value:
1) A) If the max attribute is specified and a value could be parsed
out of it, then the maximum value is that value.
 B) Otherwise, if the max attribute:
  a) is specified but no value could be parsed out of it, or
  b) if it was not specified, but either or both of the min or value  
attributes were specified,
 then the maximum value is 1.
 C) Otherwise, none of the max, min, and value attributes were  
properly specified.
  So if the result of processing the textContent of the element was
either:
   a)  nothing or
   b) just one number with no denominator punctuation character,
  then the maximum value is 1;
 D) if the result was one number but it had an associated denominator  
punctuation character, then the maximum value is the value associated  
with that denominator punctuation character; and finally,
 E) if there were two numbers parsed out of the textContent, then the  
maximum is the higher of those two numbers.
2) If the above __steps__--machinations-- result in a maximum value  
less than the minimum value, then the maximum value is actually the  
same as the minimum.

Determine the actual value:
1) A) If the value attribute is specified and a value could be parsed  
out of it, then that value is the actual value.
 B) If the value attribute is not specified but the max attribute is  
specified and the result of processing the textContent of the element  
was one number with no associated denominator punctuation character,  
then that number is the actual value
 C) If neither of the value and max attributes are specified, then,
  a) if the result of processing the textContent of the element was  
one number (with or without an associated denominator punctuation  
character), then that is the actual value, and
  b)  if the result of processing the textContent of the element was  
two numbers, then the actual value is the lower of the two numbers  
found.
 D) Otherwise, if none of the above apply, the actual value is zero (0).
2) A) If the above procedure results in an actual value less than the  
minimum value, then the actual value is actually the same as the  
minimum value.
 B) If, on the other hand, the result is an actual value greater than  
the maximum value, then the actual value is the maximum value.

Determine the low boundary:
1) A) If the low attribute is specified and a value could be parsed  
out of it, then the low boundary is that value.
 B) Otherwise, the low boundary is the same as the minimum value.
2) If the above results in a low boundary that is less than the  
minimum value, the low boundary is the minimum value.

Determine the high boundary:
1) A) If the high attribute is specified and a value could be parsed  
out of it, then the high boundary is that value.
 B) Otherwise, the high boundary is the same as the maximum value.
2) If the above results in a high boundary that is higher than the  
maximum value, the high boundary is the maximum value

Determine the optimum point:
1) A) If the optimum attribute is specified and a value could be  
parsed out of it, then the optimum point is that value.
 B) Otherwise, the optimum point is the midpoint between the minimum  
value and the maximum value.
2) If the optimum point is then less than the minimum value, then the  
optimum point is actually the same as the minimum value.
3) Similarly, if the optimum point is greater than the maximum value,  
then it is actually the maximum value instead.

Verify that all of the following weak inequalities are true __for all
specified values__:
 1) minimum value ≤ actual value ≤ maximum value
 2) minimum value ≤ low boundary ≤ high boundary ≤ maximum value
 3) minimum value ≤ optimum point ≤ maximum value
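
To check my reading of the rewritten steps, here is a Python sketch of
them. It is an interpretation, not the draft's algorithm: float()
stands in for the draft's number parsing, "specified" is taken to mean
that the attribute is present at all, and the numbers and denominator
arguments are the assumed result of processing the element's
textContent (zero, one or two numbers, plus the number implied by any
denominator punctuation character).

```python
from typing import Optional

NAMES = ("min", "max", "value", "low", "high", "optimum")

def parse(raw: Optional[str]) -> Optional[float]:
    """Stand-in number parsing: None if the attribute is missing or invalid."""
    if raw is None:
        return None
    try:
        return float(raw)
    except ValueError:
        return None

def meter_state(attrs: dict, numbers: tuple = (),
                denominator: Optional[float] = None) -> dict:
    p = {name: parse(attrs.get(name)) for name in NAMES}

    minimum = p["min"] if p["min"] is not None else 0.0

    if p["max"] is not None:                                   # maximum 1A
        maximum = p["max"]
    elif any(n in attrs for n in ("max", "min", "value")):     # maximum 1B
        maximum = 1.0
    elif len(numbers) == 2:                                    # maximum 1E
        maximum = max(numbers)
    elif len(numbers) == 1 and denominator is not None:        # maximum 1D
        maximum = denominator
    else:                                                      # maximum 1C
        maximum = 1.0
    maximum = max(maximum, minimum)                            # maximum 2

    if p["value"] is not None:                                 # actual 1A
        value = p["value"]
    elif "value" not in attrs and "max" in attrs:              # actual 1B, else 1D
        value = numbers[0] if len(numbers) == 1 and denominator is None else 0.0
    elif "value" not in attrs and numbers:                     # actual 1C
        value = min(numbers)
    else:                                                      # actual 1D
        value = 0.0
    value = min(max(value, minimum), maximum)                  # actual 2

    low = max(p["low"] if p["low"] is not None else minimum, minimum)
    high = min(p["high"] if p["high"] is not None else maximum, maximum)
    optimum = p["optimum"] if p["optimum"] is not None else (minimum + maximum) / 2
    optimum = min(max(optimum, minimum), maximum)

    return {"min": minimum, "max": maximum, "value": value,
            "low": low, "high": high, "optimum": optimum}

def inequalities_hold(s: dict) -> bool:
    """The three weak inequalities listed above."""
    return (s["min"] <= s["value"] <= s["max"]
            and s["min"] <= s["low"] <= s["high"] <= s["max"]
            and s["min"] <= s["optimum"] <= s["max"])
```

Under this reading, meter_state({"max": "1000"}, (17.0,), 100.0) gives
an actual value of 0 rather than 170, and meter_state({"value": "5"})
clamps the value to a maximum of 1.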

 From what I read in this algorithm it does not make proper use of  
the denominator punctuation character. That is, whenever the  
denominator punctuation character is used, it simply uses the  
associated value. Would it not be better (and carry all of the same
information) to simply pass the value of the denominator character as
the second number for the ratio processing algorithm? Or even better,
consider using the added information of one number and a
denominator character to determine the actual value as a proportion
of the max value. Again, though, I think it would be better to permit
semantically rich fallback content within this element and require
authors to simply use the element's attributes to set these values. Or
alternatively provide elements to set these values within the
element's contents as I demonstrated in the introduction to this review.

PROGRESS

Consider changing the paragraph:
"The value attribute specifies how much of the task has been  
completed, and the max attribute specifies how much work the task  
requires in total. The units are arbitrary and not specified."
to:
The value attribute specifies how much of the task has been  
completed, and the max attribute specifies how much work the task  
requires in total. --The units are arbitrary and not specified.--
__The values are unitless and not expressible through this element
(i.e., they are arbitrary and unspecified).__

Or even better, consider adding a @units attribute to the element  
(and meter) to store that semantic. The units may be useful depending  
on the presentation of the progress bar. Authors can include the  
units elsewhere in the prose of the document, but it would also be  
useful to include them in the progress bar itself.

Consider adding a list of unit keywords to HTML to support use here  
on PROGRESS@units and METER@units and in other attributes.

On both meter and progress consider tying the DOM attributes to the  
state of the element rather than the content attributes. Those are  
already accessible through getAttribute, aren't they? If necessary  
create a newly named DOM attribute to get at the property state of  
the element (e.g., actualValue, dMax, dPosition).

The case where maximum is set and textContent parsing returns a  
single number and a denominator punctuation character should be  
handled by setting the actual value to the product of the
percentage and the maximum value, to allow a progress bar to present  
the non-percentage values as an optional presentation of the progress  
bar (similarly for METER).


[1]: <http://www.w3.org/TR/xmlschema-2/#built-in-datatypes>
Received on Sunday, 22 July 2007 09:11:23 UTC