- From: Robert Burns <rob@robburns.com>
- Date: Sun, 22 Jul 2007 04:11:02 -0500
- To: public-html WG <public-html@w3.org>
Summary:
• Editorial suggestions
• Consider eliminating the quasi-heuristic extraction of ratios from
the contents of the PROGRESS and METER elements.
• Propose minor changes to the ratio parsing algorithm and the METER and
PROGRESS contents algorithms if textContent ratio parsing is not
eliminated
• Proposed a new DATA element with @ofType attribute
• Proposed new elements: <maxvalue>, <minvalue>, <highvalue>,
<lowvalue>, <actualvalue>, <optimum>
• Proposed new content attributes
• TIME: @calendar, @clock
• METER and PROGRESS: @units, @labels, @ticks
• METER: @scale (with proposed QNames: "linear", "log10", "logn",
"expn", etc.; default: "linear")
• Propose new DOM attributes (probably better names could be
invented)
• TIME: dOMDateTime
• METER and PROGRESS (if we continue to allow value specification
in the element's contents):
• actualValue,
• dMaxValue
• Recommend altering number-related algorithms; consider:
• including all Unicode general category Nd (not just ASCII 0-9)
• adding Arabic decimal separator (U+066B)
• removing Unicode compatibility characters:
• Small Percent Sign (U+FE6A)
• Fullwidth Percent Sign (U+FF05)
• Consider adding snapshot of XSD namespace to HTML for data type
QNames.
• Consider adding new QNames to HTML for scale ("linear", "logn",
etc.), units (all SIs, many US, perhaps imperials, etc.), calendars
(Islamic, Chinese, Hebrew, etc.) and clocks (12h, 24h, etc.)
Introduction:
Overall, on these elements I think we should carefully consider the
attempt to heuristically parse the contents of the element instead of
always requiring explicit values. I think it is a commendable goal to
accomplish something like this, but it is just too ambitious right
now. Granted, the algorithm applied is rigid and deterministic enough.
However, it places all of the burden on authors to ensure their prose
conforms to a rather complex algorithm. This to me is the opposite of
how we would want heuristics to work. Proper computer heuristics would
allow authors to write in natural ways, and the heuristics would nearly
always comprehend or parse out the needed meaning precisely. Instead,
these algorithms do the opposite, requiring authors to apply heuristics
to the algorithm in order to write prose in just such a way that the
algorithm sets the state of the element correctly.
In addition, the use of the attributes is simple and straightforward
and the possibility of author error or misunderstanding is quite
small. The only problem here is that the state of the element could
become out of sync with the prose contents of the element (is this
meant as fallback? I couldn't tell if it had other presentational
uses). If we really want to achieve consistency between the content
of the element and the state of its properties, we should define
additional elements that take precedence over the attribute values when
properly used. That is, instead of
<meter class='applausometer' value='121' min='0' max='3225' low='125'
high='3002' optimum='1228' />
OR (which I believe would return an error from the ratio algorithm)
<meter class='applausometer' min='0' low='125' high='3002'
optimum='1228'>On a scale of zero to 3225 the performer only got
an applause rating of 121: just shy of the One hundred and twenty-five
threshold needed to avoid a tomato bombardment, nowhere near
the One thousand two hundred and twenty-eight needed to advance to the
next round, and entirely out of reach of the 3002 needed to win
outright.</meter>
Simply have this (which could work as fallback, encodes unambiguous
properties for the element and does not cause any errors):
<meter class='applausometer' >
On a scale of <min>0</min> to <max>3225</max> the performer only
got an applause rating of <value>121</value>: just shy of the
<low>125</low> threshold needed to avoid a tomato bombardment,
nowhere near the <optimum>1228</optimum> needed to advance to the next
round, and entirely out of reach of the <high>3002</high> needed to
win outright.
</meter>
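
As a rough illustration of how such child elements could set the element's state, here is a minimal Python sketch. It assumes the markup is well-formed enough for an XML parser, uses the child element names from the example above, and lets a child element override the attribute of the same name. The function name meter_properties is made up for this sketch and appears in no draft.

```python
import xml.etree.ElementTree as ET

# Property names taken from the example markup above.
PROPERTIES = ("min", "max", "value", "low", "high", "optimum")


def meter_properties(markup: str) -> dict:
    """Collect meter properties; a child element overrides the attribute."""
    meter = ET.fromstring(markup)
    props = {}
    for name in PROPERTIES:
        if name in meter.attrib:
            props[name] = float(meter.attrib[name])
        child = meter.find(name)  # direct child element, if any
        if child is not None and child.text:
            props[name] = float(child.text)  # assumed precedence rule
    return props


example = """<meter class='applausometer'>
On a scale of <min>0</min> to <max>3225</max> the performer only
got an applause rating of <value>121</value>.
</meter>"""
print(meter_properties(example))
```

Other numbers in the surrounding prose are simply ignored, so the fallback text can say whatever it needs to.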
The current draft actually prevents the METER and PROGRESS elements
from containing semantically rich fallback or even plain text
fallback that contains complete information about the element.
Perhaps these elements do not require fallback; however, it's
difficult to see what advantage there is for authors in writing
pseudo-natural prose inside the element.
The suggestion that we consider avoiding any parsing of the plain
text contents of these elements to set their state applies to all
three of the elements I discuss in this review: <time>, <meter> and
<progress>. To me this is a fundamental part of HTML and the semantic
web, in that we want to provide as many facilities for authors as we
reasonably can so that authors can encode unambiguous semantics in
the document at the time of conception rather than relying on later
heuristic determinations when the original author may no longer be
available for redirection.
TIME
Would it be useful to have a DOM attribute for dateTime that was not
simply the value of the content attribute? After all, that attribute
would be accessible through getAttribute anyway. Instead, returning a
DOM string with the complete standardized timestamp (from this DOM
attribute or an additional one) would be useful. This way DOM calls
could provide the actual dateTime associated with the element rather
than simply the attribute value (which may be just a time or just a
date). A name different than the content attribute might help keep it
clear that the value does not reflect the content attribute (e.g.,
dDateTime).
Right now it would be very difficult for authors to glean from this
subsection how to specify a datetime attribute. This subsection should
contain a short, informative explanation for authors of how to specify
a date, a time, and a datetime (including with an offset) without
authors needing to turn to the linked subsection on parsing
datetimes. It should normatively refer to an RFC and provide a quick
syntax explanation and a quick example (at the start of the
subsection).
Also consider adding something like:
"Authors should include a UTC offset whenever including time
information on a <time> element. The user's (and the server's) time-
zone will often be irrelevant for calculating an accurate local-time.
Including an offset even when it is 0 makes it unambiguous that the
time is expressed in UTC."
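
To show why the offset matters, here is a small Python sketch. It uses Python's own ISO 8601 parser (datetime.fromisoformat) as a stand-in for the draft's datetime syntax, which may differ in detail. The helper name to_utc is hypothetical.

```python
from datetime import datetime, timezone


def to_utc(stamp: str) -> datetime:
    """Convert a timestamp with an explicit UTC offset to UTC."""
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        # Without an offset the instant cannot be determined.
        raise ValueError("no UTC offset: the local time is ambiguous")
    return parsed.astimezone(timezone.utc)


print(to_utc("2007-07-22T04:11:02-05:00").isoformat())
# 2007-07-22T09:11:02+00:00
```

A value such as "2007-07-22T04:11:02" with no offset is rejected here, since neither the user's nor the server's time zone says what the author meant.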
Consider adding a @calendar attribute to provide a presentational
hint regarding the calendar. We may alternatively want to explore
providing other calendar dates within this element. However, at the
very least we could allow this element to store dates in a standard
Gregorian format, but include the calendar as a QName along with the
datetime data. While the issue of which calendar a date is associated
with may ultimately be determined by a combination of semantic
document data, style document data and even localization preferences
of the user, it strikes me that this sort of information should be
optionally a part of the semantic document. Similarly, we could add a
clock attribute to indicate an author's preference for a clock (12-
hour / 24-hour).
As I described above, I do not think it is worthwhile for us to
provide an alternate unstructured method for authors to specify the
date, time and offset properties of this element. The datetime
attribute is simple and straightforward and less susceptible to error
or misunderstanding. The value of datetime could be expanded to
support more date and time related concepts similar to the
comprehensive datatypes provided through XML schema definition[1]
data types (such as dates without years, years alone, etc.). In this
sense, we should consider making time a canonically empty element
whose presentation is well defined for various devices and media
(alternatively, the element's contents might contain a date in a pre-
specified format). Even better would be for HTML5 to add either a
single <data oftype='' > or a comprehensive suite of elements to
structurally present many different data types in markup (times and
dates being just a few among many). A proposal for a <data> element
follows.
Proposed DATA:
As I described in my review of TIME above, we should consider adding
a DATA — or a similarly and appropriately named element — to
contain one of many rigidly defined data types within a single
canonically empty or otherwise rigidly and simply defined content
model element. For example, the content model for the element could
be a text-only representation of the data in a specified form
(according to a particular RFC or the XSD recommendation).
proposed language/
Element-specific attributes:
ofType (QNAME; required;)
??value (STRING determined by XSD)?? [this could instead be in the
contents of the element]
Authors use the data element to include strictly defined data of
specific type within their document. Such rigorous data types can
then be presented in alternate ways depending on the author's or
user's preference or localized cultural conventions. Authors must
include a QName value in the @ofType attribute. Within the text/html
serialization, names without a prefix will be interpreted as names
from the XML Schema Definition (XSD) namespace, now incorporated into
the HTML namespace as of XSD (version *). Authors using the XML
serialization may also specify data types from the HTML namespace,
or — by using XML namespaces — from the current XSD
recommendation or from another namespace entirely.
UAs should provide a suitable presentation of data types, perhaps
drawing on a user's system preferences to localize data when
possible. When the UA encounters a data type it is unfamiliar with
and cannot associate any stylesheet data to present the data, the UA
should present the data value exactly as it appears as the string
contents of the DATA element.
In the future, CSS or other style languages may provide a mechanism
to alter the presentation of such data. For example a style language
might make it possible to express long dates or short dates. A UA
would then be able to combine stylesheet data with user preferences
to present a long date as either March 12, 2010 or 12 March 2010
depending on the user's localization settings.
/end proposed language
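
The presentation rule in the proposed language can be sketched in Python as follows. Only the XSD "date" type is handled; the function name present and the day_first flag (standing in for a user's localization setting) are assumptions made for this sketch. Any type the code does not know is shown exactly as written, as the proposed text asks.

```python
from datetime import date

# Spelled out here so the output does not depend on the process locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def present(of_type: str, value: str, day_first: bool = False) -> str:
    """Render the contents of a hypothetical <data ofType=...> element."""
    if of_type == "date":
        d = date.fromisoformat(value)
        month = MONTHS[d.month - 1]
        if day_first:
            return f"{d.day} {month} {d.year}"
        return f"{month} {d.day}, {d.year}"
    # Unfamiliar data type: present the string contents unchanged.
    return value


print(present("date", "2010-03-12"))                  # March 12, 2010
print(present("date", "2010-03-12", day_first=True))  # 12 March 2010
print(present("gYearMonth", "2010-03"))               # 2010-03
```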
METER
For the paragraph on authoring requirements, consider making
reference to the subsection on ratios in the microformats section and
make clear that attribute values take precedence over element
contents values to help readers understand how the numbers and ratios
in the attribute's values and element's contents will be processed
(if we decide to keep the model of processing the element's contents
at all). Also change the wording to be more conformance-criteria-like
language. If we indeed keep the proposal of processing the contents
of the element when the attribute values are missing, the prose
should make that clear.
proposed language/
Whether providing numbers through the meter element's attributes or
through the meter element's contents, authors must ensure values for
these properties are numbers composed only of the ASCII digits 0 to 9
(U+0030 to U+0039) and an ASCII decimal point (U+002E).
Numbers must not contain thousands separators of any kind nor contain
any other Unicode numbers (numerals) or any other Unicode decimal
characters.
If the author omits either or both of the attributes @value or @max,
the author may instead provide those values through the contents of
the element. The author may do this in one of the following three ways:
1) ensure the only two numbers amidst the text contents of the meter
element are the actual value and the maximum value of the meter
element where the maximum value must be greater than or equal to the
actual value;
2) provide only the actual value of the meter element by ensuring the
only number in the contents of the meter element is the actual value
of the meter; or
3) provide the actual value as a proportion of:
A) one hundred (100) by using one of these denominator punctuation
characters:
i) "%", "Percent Sign" (U+0025),
ii) "٪", "Arabic Percent Sign" (U+066A),
iii) "﹪", "Small Percent Sign" (U+FE6A),
iv) "％", "Fullwidth Percent Sign" (U+FF05);
B) one thousand (1,000) by using the denominator punctuation
character "‰", "Per Mille Sign" (U+2030); or
C) ten thousand (10,000) by using the denominator punctuation
character "‱", "Per Ten Thousand Sign" (U+2031),
by including only one number and one of the above denominator
punctuation characters. The actual value will be set as the only
number within the element and the maximum for the meter will be set
to the appropriate number corresponding to the above listed
denominator punctuation characters.
To include the actual value in the contents of the meter element,
an author must include no other numbers and at most one denominator
punctuation character within the contents of the METER element. The
actual value must be the lower of the two numbers and the maximum
value must be the higher of the two numbers.
However, attribute values will always take precedence over the
numbers parsed from the element's contents.
/ end proposed language
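
The three permitted forms in the proposed language can be sketched in Python as follows. This is only an illustration of the proposed wording, not the draft's ratio algorithm. The name parse_contents and the choice to raise an error for any other form are assumptions.

```python
import re

# Denominator punctuation characters and the maximum each one implies.
DENOMINATORS = {
    "\u0025": 100,    # Percent Sign
    "\u066A": 100,    # Arabic Percent Sign
    "\uFE6A": 100,    # Small Percent Sign
    "\uFF05": 100,    # Fullwidth Percent Sign
    "\u2030": 1000,   # Per Mille Sign
    "\u2031": 10000,  # Per Ten Thousand Sign
}
# ASCII digits with an optional ASCII decimal point; no thousands separators.
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?|\.[0-9]+")


def parse_contents(text: str):
    """Return (actual, maximum); maximum is None if the contents give none."""
    numbers = [float(n) for n in NUMBER.findall(text)]
    marks = [ch for ch in text if ch in DENOMINATORS]
    if len(numbers) == 2 and not marks:
        return min(numbers), max(numbers)            # form 1
    if len(numbers) == 1 and not marks:
        return numbers[0], None                      # form 2
    if len(numbers) == 1 and len(marks) == 1:
        return numbers[0], float(DENOMINATORS[marks[0]])  # form 3
    raise ValueError("contents match none of the three permitted forms")


print(parse_contents("12 of 60 votes"))   # (12.0, 60.0)
print(parse_contents("Disk usage: 75%"))  # (75.0, 100.0)
```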
Consider removing the Small Percent Sign (U+FE6A) and the Fullwidth
Percent Sign (U+FF05). These characters are both compatibility
characters and their use is discouraged by Unicode in most
circumstances. They are included in Unicode for legacy support for
CJK vertical text layout. Vertical text layout in HTML will hopefully
be handled through upcoming CSS enhancements and vertical text layout
is properly treated as a presentational issue outside the scope of HTML.
Also consider changing the draft to support all Unicode numeric
digits (Unicode general category Nd) and perhaps other decimal
characters too (currently only Arabic decimal separator U+066B;
though this might be useful in any script as an unambiguous decimal
character if glyph substitution is supported properly and it works
given the intricacies of the Unicode bidirectional algorithm).
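
A minimal Python sketch of what accepting any general category Nd digit plus the Arabic decimal separator could look like, using the standard unicodedata module. The function name to_ascii_number is hypothetical, and bidirectional ordering issues are not addressed.

```python
import unicodedata

ARABIC_DECIMAL_SEPARATOR = "\u066B"


def to_ascii_number(text: str) -> float:
    """Map Nd digits and the Arabic decimal separator to an ASCII number."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            out.append(str(unicodedata.decimal(ch)))  # digit value 0..9
        elif ch in (".", ARABIC_DECIMAL_SEPARATOR):
            out.append(".")
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return float("".join(out))


# Arabic-Indic digits one and seven, the separator, then digit five.
print(to_ascii_number("\u0661\u0667\u066B\u0665"))  # 17.5
```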
For the maximum value, perhaps instead of 1 as the fallback value,
consider changing the algorithm to use the next number with another
decimal-place after the actual value. For example: 17 => 100; 225 =>
1000, 0.85 => 1
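
One way to read this fallback rule, sketched in Python. The name fallback_maximum is made up, and the choice to return 1 for any value below 1 is an assumption based on the 0.85 example.

```python
def fallback_maximum(actual: float) -> int:
    """Smallest power of ten (starting at 1) strictly greater than actual."""
    power = 1
    while power <= actual:
        power *= 10
    return power


for sample in (17, 225, 0.85):
    print(sample, "=>", fallback_maximum(sample))
# 17 => 100
# 225 => 1000
# 0.85 => 1
```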
Consider adding an attribute such as "scale" with values such as
"linear", "log10", "logn", "expn", etc. The default would be "linear"
and behave as the draft currently specifies. Perhaps advise UAs that
they must support the "linear" value and should support all values.
The meter element does not support ticks or scale labels. Consider
adding that. The @ticks and @labels attributes could each take, as
a value, a comma- (or space-) separated list of numbers, for example.
The meter element does not fully support labels as it is currently
specified, but instead reduces everything into a proportion of one.
Consider adding more flexibility. For instance, this algorithm does
not appear to take advantage of the extra information from the ratio
parsing algorithm. For example, if @max is set to 1000 and the
contents of the element is 17%, the algorithm does not set the @value
attribute to 170 accordingly (if I'm reading that correctly).
Consider making mention of potential styling mechanisms that are
beyond the scope of HTML5. Styling mechanisms might include ticks and
marks instead of including that information on the element. Perhaps
that is something that should be available in either place (styling
document and semantic document).
For a maximum value, what about the case where the actual value is
greater than the maximum value? Especially when parsing this out of
the content of the element, the values will be misinterpreted. If
we maintain this parsing, authors should be made aware that the larger
number will always be the maximum and the smaller value will always
be the actual value regardless of what the prose says. Maximum value
for a meter tends to imply the maximum value the meter can measure
and not necessarily the maximum value possible for the magnitude
being measured.
Again, as I said in the introduction, I think we should discard this
attempt to parse out meaning from an author's contents of this
element. HTML should be much more about encouraging authors to
explicitly encode meaning unambiguously. We could consider doing so
through explicit elements that would then take precedence over the
attribute values on the meter element. However, as it is currently
specified, the algorithm would not even allow a full fallback
explanation within the contents of the element because additional
numbers would cause the ratio algorithm to return an error.
The case where maximum is set and textContent parsing returns a
single number and a denominator punctuation character should be
handled by setting the actual value to the product of the
percentage and the maximum value, to allow a progress bar to present
the non-percentage values as an optional presentation of the progress
bar (similarly for PROGRESS).
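
In Python, the suggested handling comes down to one line of arithmetic. The function name scaled_value is hypothetical. The multiplication is done before the division so that simple cases such as 17% of 1000 come out exact in floating point.

```python
def scaled_value(number: float, denominator: float, maximum: float) -> float:
    """Actual value when the contents give a proportion and max is set."""
    return number * maximum / denominator


print(scaled_value(17, 100, 1000))  # 170.0, i.e. 17% of a maximum of 1000
```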
If I'm reading the algorithms correctly, the case of a value set
above 1 without a maximum value set is not handled properly.
RECOMMENDED CHANGES FOR SUPPORTING EXISTING ALGORITHM:
(If we decide to keep support for unstructured text to set the meter
element's maximum and actual value properties)
Add "linear" to the following paragraph and consider striking ", or a
fractional value" like:
"The meter element represents a scalar measurement within a known
__linear__ range--, or a fractional value--; for example disk usage,
the relevance of a query result, or the fraction of a voting
population to have selected a particular candidate."
Consider breaking the algorithm into hierarchically arranged
segments. As a notational convention consider numbering sequential
steps and lettering mutually exclusive states. The following is my
attempt to rewrite the meter algorithm following those conventions:
Determine the minimum value:
A) If the min attribute is specified and a value could be parsed out
of it, then the minimum value is that value.
B) Otherwise, the minimum value is zero.
Determine the maximum value:
1) A) If the max attribute is specified and a value could be parsed
out of it, the maximum value is that value.
B) Otherwise, if the max attribute:
a) is specified but no value could be parsed out of it, or
b) if it was not specified, but either or both of the min or value
attributes were specified,
then the maximum value is 1.
C) Otherwise, none of the max, min, and value attributes were
properly specified.
So if the result of processing the textContent of the element was
either:
a) nothing or
b) just one number with no denominator punctuation character,
then the maximum value is 1;
D) if the result was one number but it had an associated denominator
punctuation character, then the maximum value is the value associated
with that denominator punctuation character; and finally,
E) if there were two numbers parsed out of the textContent, then the
maximum is the higher of those two numbers.
2) If the above __steps__--machinations-- result in a maximum value
less than the minimum value, then the maximum value is actually the
same as the minimum.
Determine the actual value:
1) A) If the value attribute is specified and a value could be parsed
out of it, then that value is the actual value.
B) If the value attribute is not specified but the max attribute is
specified and the result of processing the textContent of the element
was one number with no associated denominator punctuation character,
then that number is the actual value
C) If neither of the value and max attributes are specified, then,
a) if the result of processing the textContent of the element was
one number (with or without an associated denominator punctuation
character), then that is the actual value, and
b) if the result of processing the textContent of the element was
two numbers, then the actual value is the lower of the two numbers
found.
D) Otherwise, if none of the above apply, the actual value is zero (0).
2) A) If the above procedure results in an actual value less than the
minimum value, then the actual value is actually the same as the
minimum value.
B) If, on the other hand, the result is an actual value greater than
the maximum value, then the actual value is the maximum value.
Determine the low boundary:
1) A) If the low attribute is specified and a value could be parsed
out of it, then the low boundary is that value.
B) Otherwise, the low boundary is the same as the minimum value.
2) If the above results in a low boundary that is less than the
minimum value, the low boundary is the minimum value.
Determine the high boundary:
1) A) If the high attribute is specified and a value could be parsed
out of it, then the high boundary is that value.
B) Otherwise, the high boundary is the same as the maximum value.
2) If the above results in a high boundary that is higher than the
maximum value, the high boundary is the maximum value
Determine the optimum point:
1) A) If the optimum attribute is specified and a value could be
parsed out of it, then the optimum point is that value.
B) Otherwise, the optimum point is the midpoint between the minimum
value and the maximum value.
2) If the optimum point is then less than the minimum value, then the
optimum point is actually the same as the minimum value.
3) Similarly, if the optimum point is greater than the maximum value,
then it is actually the maximum value instead.
Verify that all of the following weak inequalities hold __for all
specified values__:
1) minimum value ≤ actual value ≤ maximum value
2) minimum value ≤ low boundary ≤ high boundary ≤ maximum value
3) minimum value ≤ optimum point ≤ maximum value
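
The rewritten steps above can be sketched in Python. The input shape is an assumption made for illustration: attrs holds the content attributes as strings, numbers holds the numbers already found in the textContent, and denominator is 100, 1000 or 10000 when a denominator punctuation character was found. The names meter_state and inequalities_hold are hypothetical.

```python
def parse_attr(attrs, name):
    """Return attrs[name] as a float, or None if missing or unparseable."""
    try:
        return float(attrs[name])
    except (KeyError, ValueError):
        return None


def meter_state(attrs, numbers=(), denominator=None):
    # Minimum value: parsed attribute, otherwise zero.
    minimum = parse_attr(attrs, "min")
    if minimum is None:
        minimum = 0.0

    # Maximum value: steps 1 A) to E), then step 2.
    maximum = parse_attr(attrs, "max")
    if maximum is None:
        if "max" in attrs or "min" in attrs or "value" in attrs:
            maximum = 1.0                      # B
        elif len(numbers) == 2:
            maximum = max(numbers)             # E
        elif len(numbers) == 1 and denominator is not None:
            maximum = float(denominator)       # D
        else:
            maximum = 1.0                      # C
    maximum = max(maximum, minimum)

    # Actual value: steps 1 A) to D), then clamp (step 2).
    actual = parse_attr(attrs, "value")
    if actual is None:
        actual = 0.0                           # D
        if "value" not in attrs:
            if "max" in attrs:
                if len(numbers) == 1 and denominator is None:
                    actual = numbers[0]        # B
            elif len(numbers) == 1:
                actual = numbers[0]            # C a
            elif len(numbers) == 2:
                actual = min(numbers)          # C b
    actual = min(max(actual, minimum), maximum)

    low = parse_attr(attrs, "low")
    low = minimum if low is None else max(low, minimum)
    high = parse_attr(attrs, "high")
    high = maximum if high is None else min(high, maximum)
    optimum = parse_attr(attrs, "optimum")
    if optimum is None:
        optimum = (minimum + maximum) / 2
    optimum = min(max(optimum, minimum), maximum)

    return {"min": minimum, "max": maximum, "value": actual,
            "low": low, "high": high, "optimum": optimum}


def inequalities_hold(s):
    """Check the three weak inequalities listed above."""
    return (s["min"] <= s["value"] <= s["max"]
            and s["min"] <= s["low"] <= s["high"] <= s["max"]
            and s["min"] <= s["optimum"] <= s["max"])


print(meter_state({}, numbers=[75.0], denominator=100))
print(meter_state({"max": "1000"}, numbers=[17.0], denominator=100)["value"])
```

The second call prints 0.0: with max set and the contents reading 17%, none of the actual-value branches applies, so the percentage is lost. The steps as written also never reorder low and high, so the final check can still fail (for example with low='0.9' and high='0.2').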
From what I read in this algorithm, it does not make proper use of
the denominator punctuation character. That is, whenever the
denominator punctuation character is used, it simply uses the
associated value. Would it not be better (and carry all of the same
information) to simply pass the value of the denominator character as
the second number for the ratio processing algorithm? Or even better,
consider using the added information of one number and a
denominator character to determine the actual value as a proportion
of the max value. Again, though, I think it would be better to permit
semantically rich fallback content within this element and require
authors to simply use the element's attributes to set these values. Or
alternatively provide elements to set these values within the
element's contents as I demonstrated in the introduction to this review.
PROGRESS
Consider changing the paragraph:
"The value attribute specifies how much of the task has been
completed, and the max attribute specifies how much work the task
requires in total. The units are arbitrary and not specified."
to:
The value attribute specifies how much of the task has been
completed, and the max attribute specifies how much work the task
requires in total. --The units are arbitrary and not specified--__The
values are unitless; units are not expressible through this element
(i.e., they are arbitrary and unspecified).__
Or even better, consider adding a @units attribute to the element
(and meter) to store that semantic. The units may be useful depending
on the presentation of the progress bar. Authors can include the
units elsewhere in the prose of the document, but it would also be
useful to include them in the progress bar itself.
Consider adding a list of unit keywords to HTML to support use here
on PROGRESS@units and METER@units and in other attributes.
On both meter and progress consider tying the DOM attributes to the
state of the element rather than the content attributes. Those are
already accessible through getAttribute, aren't they? If necessary
create a newly named DOM attribute to get at the property state of
the element (e.g., actualValue, dMax, dPosition).
The case where maximum is set and textContent parsing returns a
single number and a denominator punctuation character should be
handled by setting the actual value to the product of the
percentage and the maximum value, to allow a progress bar to present
the non-percentage values as an optional presentation of the progress
bar (similarly for METER).
[1]: <http://www.w3.org/TR/xmlschema-2/#built-in-datatypes>
Received on Sunday, 22 July 2007 09:11:23 UTC