From: Robert Burns <rob@robburns.com>
Date: Sun, 22 Jul 2007 04:11:02 -0500
To: public-html WG <public-html@w3.org>
Summary:
• Editorial suggestions
• Consider eliminating the quasi-heuristic extraction of ratios from the contents of the PROGRESS and METER elements.
• If we do not eliminate textContent ratio parsing, propose minor changes to the ratio parsing algorithm and to the METER and PROGRESS contents algorithms
• Proposed a new DATA element with an @ofType attribute
• Proposed new elements: <maxvalue>, <minvalue>, <highvalue>, <lowvalue>, <actualvalue>, <optimum>
• Proposed new content attributes:
  • TIME: @calendar, @clock
  • METER and PROGRESS: @units, @labels, @ticks
  • METER: @scale (with proposed QNames: "linear", "log10", "logn", "expn", etc.; default: "linear")
• Propose new DOM attributes (better names could probably be invented):
  • TIME: dOMDateTime
  • METER and PROGRESS (if we continue to allow value specification in the element's contents):
    • actualValue
    • dMaxValue
• Recommend altering the number-related algorithms; consider:
  • including all of Unicode general category Nd (not just ASCII 0-9)
  • adding the Arabic decimal separator (U+066B)
  • removing the Unicode compatibility characters:
    • Small Percent Sign (U+FE6A)
    • Fullwidth Percent Sign (U+FF05)
• Consider adding a snapshot of the XSD namespace to HTML for data type QNames.
• Consider adding new QNames to HTML for scales ("linear", "logn", etc.), units (all SI units, many US units, perhaps imperial units, etc.), calendars (Islamic, Chinese, Hebrew, etc.) and clocks (12h, 24h, etc.)

Introduction:
Overall, for these elements I think we should carefully reconsider the attempt to heuristically parse the contents of the element instead of always requiring explicit values. It is a commendable goal, but it is just too ambitious right now. Granted, the algorithm applied is rigid and deterministic enough. However, it places the entire burden on authors to ensure their prose conforms to a rather complex algorithm. To me this is the opposite of the way we would want heuristics to work.
Proper computer heuristics would allow authors to write in natural ways, and the heuristics would nearly always comprehend or parse out the needed meaning precisely. Instead, these algorithms do the opposite: they require authors to apply heuristics to the algorithm in order to write prose in just such a way that the algorithm sets the state of the element correctly. In addition, the attributes are simple and straightforward to use, and the possibility of author error or misunderstanding is quite small. The only problem is that the state of the element could fall out of sync with the prose contents of the element. (Is the content meant as fallback? I couldn't tell whether it had other presentational uses.) If we really want to achieve consistency between the content of the element and the state of its properties, we should define additional elements that take precedence over the attribute values when properly used. That is, instead of

    <meter class='applausometer' value='121' min='0' max='3225' low='125' high='3002' optimum='1228'></meter>

OR (which I believe would return an error from the ratio algorithm)

    <meter class='applausometer' min='0' low='125' high='3002' optimum='1228'>On a scale of zero to 3225 the performer only got an applause rating of 121: just shy of the one hundred and twenty-five threshold needed to avoid a tomato bombardment, nowhere near the one thousand two hundred and twenty-eight needed to advance to the next round, and entirely out of reach of the 3002 needed to win outright.</meter>

simply have this (which could work as fallback, encodes unambiguous properties for the element, and does not cause any errors):

    <meter class='applausometer'>On a scale of <min>0</min> to <max>3225</max> the performer only got an applause rating of <value>121</value>: just shy of the <low>125</low> threshold needed to avoid a tomato bombardment, nowhere near the <optimum>1228</optimum> needed to advance to the next round, and entirely out of reach of the <high>3002</high> needed to win outright.</meter>

The current draft actually prevents the METER and PROGRESS elements from containing semantically rich fallback, or even plain-text fallback, that contains complete information about the element. Perhaps these elements do not require fallback; however, it is difficult to see what advantage there is for authors in writing pseudo-natural prose inside the element.

The suggestion that we consider avoiding any parsing of the plain-text contents of these elements to set their state applies to all three of the elements I discuss in this review: <time>, <meter> and <progress>. To me this is fundamental to HTML and the semantic web: we want to provide as many facilities as we reasonably can so that authors can encode unambiguous semantics in the document at the time of conception, rather than relying on later heuristic determinations when the original author may no longer be available for clarification.

TIME
Would it be useful to have a DOM attribute for dateTime that is not simply the value of the content attribute? After all, that attribute would be accessible through getAttribute anyway. Instead, returning a DOMString with the complete standardized timestamp (either from this DOM attribute or from an additional one) would be useful. This way DOM calls could provide the actual date and time associated with the element rather than simply the attribute value (which may be just a time or just a date). A name different from the content attribute's might help keep it clear that the value does not reflect the content attribute (e.g., dDateTime).

Right now it would be very difficult for authors to glean from this subsection of the draft how to specify a datetime attribute.
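As a rough illustration of what a separately named DOM attribute could return, here is a minimal Python sketch. The function name d_date_time is hypothetical (it only mirrors the dDateTime name suggested above), and it assumes the datetime attribute holds an ISO 8601 style date and time with a UTC offset. It normalizes such a value to a UTC timestamp and refuses values without an offset, since those do not identify a single instant. The example value is this message's own Date header, Sun, 22 Jul 2007 04:11:02 -0500.

```python
from datetime import datetime, timezone


def d_date_time(value: str) -> str:
    """Hypothetical dDateTime: normalize a datetime attribute value to a UTC timestamp.

    Assumes an ISO 8601 style date and time with a UTC offset,
    e.g. "2007-07-22T04:11:02-05:00".
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Without an offset the value does not identify a single instant.
        raise ValueError("datetime value has no UTC offset")
    # Convert to UTC and emit a complete, standardized timestamp string.
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Under these assumptions, d_date_time("2007-07-22T04:11:02-05:00") returns "2007-07-22T09:11:02Z", the same string as for "2007-07-22T09:11:02+00:00", while "2007-07-22T04:11:02" is rejected.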
This subsection should contain a short, author-oriented informative explanation of how to specify a date, a time, and a datetime (including one with an offset), so that authors need not turn to the linked subsection on parsing datetimes. It should normatively refer to an RFC and provide a quick syntax explanation and a quick example (at the start of the subsection). Also consider adding something like: "Authors should include a UTC offset whenever including time information on a <time> element. The user's (and the server's) time zone will often be irrelevant for calculating an accurate local time. Including an offset even when it is 0 makes it unambiguous that the time is expressed in UTC."

Consider adding a @calendar attribute to provide a presentational hint regarding the calendar. We may alternatively want to explore providing dates in other calendars within this element. At the very least, however, we could allow this element to store dates in a standard Gregorian format but include the calendar as a QName along with the datetime data. While the calendar a date is associated with may ultimately be determined by a combination of semantic document data, style document data and even the user's localization preferences, it strikes me that this sort of information should optionally be part of the semantic document. Similarly, we could add a @clock attribute to indicate an author's preference for a clock (12-hour / 24-hour).

As I described above, I do not think it is worthwhile for us to provide an alternate unstructured method for authors to specify the date, time and offset properties of this element. The datetime attribute is simple, straightforward and less susceptible to error or misunderstanding. The value of datetime could be expanded to support more date- and time-related concepts, similar to the comprehensive XML Schema Definition[1] data types (such as dates without years, years alone, etc.).
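The @clock attribute suggested above would only be a presentational hint; the underlying datetime value would not change. The Python sketch below shows one way a UA might honour hypothetical clock="12h" and clock="24h" values. The attribute values and the English AM/PM suffixes are assumptions; a real UA would localize them.

```python
from datetime import time


def format_clock(t: time, clock: str = "24h") -> str:
    """Render a time according to a hypothetical @clock hint ("12h" or "24h")."""
    if clock == "24h":
        return f"{t.hour:02d}:{t.minute:02d}"
    if clock == "12h":
        hour = t.hour % 12 or 12                # 0 -> 12, 13 -> 1
        suffix = "AM" if t.hour < 12 else "PM"  # assumption: English suffixes
        return f"{hour}:{t.minute:02d} {suffix}"
    raise ValueError(f"unknown clock value: {clock!r}")
```

For example, time(16, 5) renders as "16:05" with the 24-hour hint and as "4:05 PM" with the 12-hour hint.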
In this sense, we should consider making time a canonically empty element whose presentation is well defined for various devices and media (alternatively, the element's contents might contain a date in a pre-specified format). Even better would be for HTML5 to add either a single <data oftype=''> element or a comprehensive suite of elements to structurally present many different data types in markup (times and dates being just a few among many). A proposal for a <data> element follows.

Proposed DATA:
As I described in my review of TIME above, we should consider adding a DATA element (or a similar, appropriately named element) to contain one of many rigidly defined data types within a single element that is canonically empty or has an otherwise rigid and simply defined content model. For example, the content model for the element could be a text-only representation of the data in a specified form (according to a particular RFC or the XSD recommendation).

proposed language/
Element-specific attributes:
ofType (QName; required)
??value (STRING determined by XSD)?? [this could instead be in the contents of the element]

Authors use the data element to include strictly defined data of a specific type within their document. Such rigorous data types can then be presented in alternate ways depending on the author's or user's preferences or on localized cultural conventions. Authors must include a QName value in the @ofType attribute. Within the text/html serialization, names without a prefix will be interpreted as names from the XML Schema Definition (XSD) namespace, now incorporated into the HTML namespace as of XSD (version *). Authors using the XML serialization may also specify data types from the HTML namespace or, by using XML namespaces, from the current XSD recommendation or from another namespace entirely. UAs should provide a suitable presentation of data types, perhaps drawing on a user's system preferences to localize data when possible.
When the UA encounters a data type it is unfamiliar with and cannot associate any stylesheet data to present the data, the UA should present the data value exactly as it appears as the string contents of the DATA element. In the future, CSS or other style languages may provide a mechanism to alter the presentation of such data. For example, a style language might make it possible to express long dates or short dates. A UA would then be able to combine stylesheet data with user preferences to present a long date as either March 12, 2010 or 12 March 2010, depending on the user's localization settings.
/end proposed language

METER
For the paragraph on authoring requirements, consider referring to the subsection on ratios in the microformats section, and make clear that attribute values take precedence over values in the element's contents. This will help readers understand how the numbers and ratios in the attribute values and in the element's contents will be processed (if we decide to keep the model of processing the element's contents at all). Also change the wording to use more conformance-criteria-like language. If we do keep processing the contents of the element when the attribute values are missing, the prose should make that clear.

proposed language/
Whether providing numbers through the meter element's attributes or through its contents, authors must ensure that the values for these properties are numbers composed only of the ASCII digits 0 to 9 (U+0030 to U+0039) and the ASCII decimal point (U+002E). Numbers must not contain thousands separators of any kind, nor any other Unicode numerals or decimal characters. If the author omits either or both of the @value and @max attributes, the author may instead provide those values through the contents of the element.
The author may do this in one of the following three ways:
1) ensure the only two numbers amid the text contents of the meter element are the actual value and the maximum value of the meter element, where the maximum value must be greater than or equal to the actual value;
2) provide only the actual value of the meter element by ensuring the only number in the contents of the meter element is the actual value of the meter; or
3) provide the actual value as a proportion of:
   A) one hundred (100), by using one of these denominator punctuation characters:
      i) "%", "Percent Sign" (U+0025),
      ii) "٪", "Arabic Percent Sign" (U+066A),
      iii) "﹪", "Small Percent Sign" (U+FE6A),
      iv) "％", "Fullwidth Percent Sign" (U+FF05);
   B) one thousand (1,000), by using the denominator punctuation character "‰", "Per Mille Sign" (U+2030);
   C) ten thousand (10,000), by using the denominator punctuation character "‱", "Per Ten Thousand Sign" (U+2031);
   by including only one number and one of the above denominator punctuation characters. The actual value will be set to the only number within the element, and the maximum for the meter will be set to the number corresponding to the denominator punctuation character used.
To include only the actual value in the contents of the meter element (ways 2 and 3), an author must include no other numbers and at most one denominator punctuation character within the contents of the METER element. When two numbers are given (way 1), the actual value must be the lower of the two numbers and the maximum value must be the higher. However, attribute values always take precedence over numbers parsed from the element's contents.
/ end proposed language

Consider removing the Small Percent Sign (U+FE6A) and the Fullwidth Percent Sign (U+FF05). These characters are both compatibility characters, and Unicode discourages their use in most circumstances. They are included in Unicode for legacy support for CJK vertical text layout.
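The three content forms in the proposed language above can be checked mechanically. The following Python sketch is not the draft's ratio-parsing algorithm; it only tests contents against the proposed conformance rules (ASCII digits and ".", no thousands separators, at most two numbers, at most one denominator character, which must accompany exactly one number) and returns the implied (actual, maximum) pair. The function name parse_meter_content is hypothetical.

```python
import re
from typing import Optional, Tuple

# Denominator punctuation characters listed in the proposed language.
DENOMINATORS = {
    "\u0025": 100,    # PERCENT SIGN
    "\u066A": 100,    # ARABIC PERCENT SIGN
    "\uFE6A": 100,    # SMALL PERCENT SIGN (compatibility character)
    "\uFF05": 100,    # FULLWIDTH PERCENT SIGN (compatibility character)
    "\u2030": 1000,   # PER MILLE SIGN
    "\u2031": 10000,  # PER TEN THOUSAND SIGN
}

# ASCII digits 0-9 with at most one ASCII decimal point; no thousands separators.
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?|\.[0-9]+")


def parse_meter_content(text: str) -> Tuple[float, Optional[float]]:
    """Return (actual, maximum) implied by conforming contents.

    maximum is None when the contents supply only an actual value.
    Raises ValueError for non-conforming contents.
    """
    numbers = [float(n) for n in NUMBER.findall(text)]
    signs = [c for c in text if c in DENOMINATORS]
    if len(signs) > 1:
        raise ValueError("at most one denominator punctuation character is allowed")
    if signs:                      # way 3: one number plus a denominator character
        if len(numbers) != 1:
            raise ValueError("a denominator character must accompany exactly one number")
        return numbers[0], float(DENOMINATORS[signs[0]])
    if len(numbers) == 1:          # way 2: actual value only
        return numbers[0], None
    if len(numbers) == 2:          # way 1: lower is actual, higher is maximum
        return min(numbers), max(numbers)
    raise ValueError("expected one or two numbers")
```

Because only ASCII digits are matched, "1,024" is read as the two numbers 1 and 24 (which is why thousands separators must be forbidden), and Arabic-Indic digits such as "٤٥٪" are rejected outright. Supporting all of category Nd would mean matching \d, which in Python's str patterns matches any Unicode decimal digit, instead of [0-9].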
Vertical text layout in HTML will hopefully be handled through upcoming CSS enhancements; vertical text layout is properly treated as a presentational issue outside the scope of HTML.

Also consider changing the draft to support all Unicode decimal digits (Unicode general category Nd) and perhaps other decimal characters too (currently only the Arabic decimal separator U+066B; though this might be useful in any script as an unambiguous decimal character if glyph substitution is supported properly and it works given the intricacies of the Unicode bidirectional algorithm).

For the maximum value, instead of 1 as the fallback value, perhaps consider changing the algorithm to use the next power of ten above the actual value (that is, the next number with one more digit). For example: 17 => 100; 225 => 1000; 0.85 => 1.

Consider adding an attribute such as "scale" with values such as "linear", "log10", "logn", "expn", etc. The default would be "linear" and would behave as the draft currently specifies. Perhaps advise that UAs must support the "linear" value and should support all values.

The meter element does not support ticks or scale labels; consider adding them. For example, @ticks and @labels attributes could take as their value a comma- (or space-) separated list of numbers.

The meter element does not fully support labels as it is currently specified, but instead reduces everything to a proportion of one. Consider adding more flexibility. For instance, this algorithm does not appear to take advantage of the extra information from the ratio parsing algorithm: if @max is set to 1000 and the contents of the element is 17%, the algorithm does not set the actual value to 170 accordingly (if I'm reading it correctly).

Consider mentioning potential styling mechanisms that are beyond the scope of HTML5. Styling mechanisms might supply ticks and marks instead of that information being included on the element.
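The two numeric adjustments suggested above (a different fallback maximum, and scaling 17% onto an explicit @max of 1000) are small enough to show directly. This Python sketch assumes "next power of ten above the actual value" is the intended reading of the fallback suggestion, with fractions below 1 keeping the draft's maximum of 1. Both function names are hypothetical.

```python
def fallback_maximum(actual: float) -> float:
    """Hypothetical fallback maximum: the next power of ten above the actual value."""
    if actual < 1:
        return 1.0          # assumption: fractional values keep the draft's maximum of 1
    power = 10
    while power <= actual:  # 17 -> 100, 225 -> 1000
        power *= 10
    return float(power)


def scale_to_maximum(number: float, denominator: float, maximum: float) -> float:
    """Express 'number' parts per 'denominator' (e.g. 17 with %) on a 0..maximum meter."""
    # Multiplying before dividing keeps results such as 17 * 1000 / 100 exact.
    return number * maximum / denominator
```

With these, fallback_maximum(17) is 100.0 and scale_to_maximum(17, 100, 1000) is 170.0, the value the 17% example would produce.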
Perhaps ticks and marks are something that should be available in either place (the styling document and the semantic document).

Regarding the maximum value, what about the case where the actual value is greater than the maximum value? Especially when these values are parsed out of the content of the element, they will be misinterpreted. If we maintain this parsing, authors should be made aware that the larger number will always be the maximum and the smaller number will always be the actual value, regardless of what the prose says. The maximum value for a meter tends to imply the maximum value the meter can measure, and not necessarily the maximum value possible for the magnitude being measured.

Again, as I said in the introduction, I think we should discard this attempt to parse meaning out of an author's contents of this element. HTML should be much more about encouraging authors to explicitly encode meaning unambiguously. We could consider doing so through explicit elements that would then take precedence over the attribute values on the meter element. However, as currently specified, the algorithm would not even allow a full fallback explanation within the contents of the element, because additional numbers would cause the ratio algorithm to return an error.

The case where the maximum is set and textContent parsing returns a single number and a denominator punctuation character should be handled by setting the actual value to the product of the percentage and the maximum value, so that the non-percentage value is available as an optional presentation of the meter (similarly for PROGRESS). If I'm reading the algorithms correctly, it appears that the case of a value above 1 without a maximum value set is not handled properly.
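To make the restated meter algorithm that follows concrete, here is a Python sketch of it. It assumes the ratio parsing of textContent has already produced zero, one or two numbers and an optional denominator value (ContentResult, parse_attr and meter_state are hypothetical names), and it uses Python's float() as a stand-in for the draft's rules for parsing attribute values, which it does not reproduce (float() also accepts forms such as "1e3"). The final "verify" step is not included.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ContentResult:
    """Hypothetical result of running ratio parsing over the element's textContent."""
    numbers: List[float]          # zero, one, or two numbers
    denominator: Optional[float]  # 100, 1000 or 10000 when a %, ‰ or ‱ sign was found


def parse_attr(attrs: Dict[str, str], name: str) -> Tuple[bool, Optional[float]]:
    """Return (specified, value); value is None if the attribute could not be parsed."""
    if name not in attrs:
        return False, None
    try:
        return True, float(attrs[name])  # stand-in for the draft's number parsing
    except ValueError:
        return True, None


def meter_state(attrs: Dict[str, str], content: ContentResult) -> Dict[str, float]:
    min_spec, min_val = parse_attr(attrs, "min")
    max_spec, max_val = parse_attr(attrs, "max")
    value_spec, value_val = parse_attr(attrs, "value")
    nums, denom = content.numbers, content.denominator

    # Minimum value: A) parsed min attribute, B) otherwise zero.
    minimum = min_val if min_val is not None else 0.0

    # Maximum value, step 1 (branches A to E), then step 2.
    if max_val is not None:                               # A
        maximum = max_val
    elif max_spec or min_spec or value_spec:              # B
        maximum = 1.0
    elif not nums or (len(nums) == 1 and denom is None):  # C
        maximum = 1.0
    elif len(nums) == 1:                                  # D
        maximum = denom
    else:                                                 # E
        maximum = max(nums)
    maximum = max(maximum, minimum)                       # step 2

    # Actual value, step 1 (branches A to D), then clamp (step 2).
    if value_val is not None:                                               # A
        actual = value_val
    elif not value_spec and max_spec and len(nums) == 1 and denom is None:  # B
        actual = nums[0]
    elif not value_spec and not max_spec and len(nums) == 1:                # C a
        actual = nums[0]
    elif not value_spec and not max_spec and len(nums) == 2:                # C b
        actual = min(nums)
    else:                                                                   # D
        actual = 0.0
    actual = min(max(actual, minimum), maximum)

    # Low boundary, high boundary and optimum point.
    _, low_val = parse_attr(attrs, "low")
    _, high_val = parse_attr(attrs, "high")
    _, opt_val = parse_attr(attrs, "optimum")
    low = max(low_val if low_val is not None else minimum, minimum)
    high = min(high_val if high_val is not None else maximum, maximum)
    optimum = opt_val if opt_val is not None else (minimum + maximum) / 2
    optimum = min(max(optimum, minimum), maximum)

    return {"minimum": minimum, "maximum": maximum, "actual": actual,
            "low": low, "high": high, "optimum": optimum}
```

Under these assumptions, value='5' with no max attribute yields a maximum of 1 and an actual value clamped to 1, which is the case flagged above as not handled properly; contents of "17%" with no attributes yield a maximum of 100 and an actual value of 17.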
RECOMMENDED CHANGES FOR SUPPORTING THE EXISTING ALGORITHM
(if we decide to keep support for unstructured text to set the meter element's maximum and actual value properties)

Add "linear" to the following paragraph and consider striking ", or a fractional value", like so:

"The meter element represents a scalar measurement within a known __linear__ range--, or a fractional value--; for example disk usage, the relevance of a query result, or the fraction of a voting population to have selected a particular candidate."

Consider breaking the algorithm into hierarchically arranged segments. As a notational convention, consider numbering sequential steps and lettering mutually exclusive states. The following is my attempt to rewrite the meter algorithm following those conventions:

Determine the minimum value:
  A) If the min attribute is specified and a value could be parsed out of it, then the minimum value is that value.
  B) Otherwise, the minimum value is zero.

Determine the maximum value:
  1) A) If the max attribute is specified and a value could be parsed out of it, the maximum value is that value.
     B) Otherwise, if the max attribute:
        a) is specified but no value could be parsed out of it, or
        b) was not specified, but either or both of the min or value attributes were specified,
        then the maximum value is 1.
     C) Otherwise, none of the max, min, and value attributes were properly specified. So, if the result of processing the textContent of the element was either:
        a) nothing, or
        b) just one number with no denominator punctuation character,
        then the maximum value is 1;
     D) if the result was one number with an associated denominator punctuation character, then the maximum value is the value associated with that denominator punctuation character; and finally,
     E) if two numbers were parsed out of the textContent, then the maximum value is the higher of those two numbers.
  2) If the above __steps__--machinations-- result in a maximum value less than the minimum value, then the maximum value is actually the same as the minimum value.

Determine the actual value:
  1) A) If the value attribute is specified and a value could be parsed out of it, then that value is the actual value.
     B) If the value attribute is not specified but the max attribute is specified, and the result of processing the textContent of the element was one number with no associated denominator punctuation character, then that number is the actual value.
     C) If neither the value nor the max attribute is specified, then:
        a) if the result of processing the textContent of the element was one number (with or without an associated denominator punctuation character), then that number is the actual value, and
        b) if the result of processing the textContent of the element was two numbers, then the actual value is the lower of the two numbers found.
     D) Otherwise, if none of the above apply, the actual value is zero (0).
  2) A) If the above procedure results in an actual value less than the minimum value, then the actual value is actually the same as the minimum value.
     B) If, on the other hand, the result is an actual value greater than the maximum value, then the actual value is the maximum value.

Determine the low boundary:
  1) A) If the low attribute is specified and a value could be parsed out of it, then the low boundary is that value.
     B) Otherwise, the low boundary is the same as the minimum value.
  2) If the above results in a low boundary that is less than the minimum value, the low boundary is the minimum value.

Determine the high boundary:
  1) A) If the high attribute is specified and a value could be parsed out of it, then the high boundary is that value.
     B) Otherwise, the high boundary is the same as the maximum value.
  2) If the above results in a high boundary that is higher than the maximum value, the high boundary is the maximum value.

Determine the optimum point:
  1) A) If the optimum attribute is specified and a value could be parsed out of it, then the optimum point is that value.
     B) Otherwise, the optimum point is the midpoint between the minimum value and the maximum value.
  2) If the optimum point is then less than the minimum value, then the optimum point is actually the same as the minimum value.
  3) Similarly, if the optimum point is greater than the maximum value, then it is actually the maximum value instead.

Verify that all of the following weak inequalities are true __for all specified values__:
  1) minimum value ≤ actual value ≤ maximum value
  2) minimum value ≤ low boundary ≤ high boundary ≤ maximum value
  3) minimum value ≤ optimum point ≤ maximum value

From what I read in this algorithm, it does not make proper use of the denominator punctuation character. That is, whenever the denominator punctuation character is used, the algorithm simply uses the associated value. Would it not be better (and carry all of the same information) to simply pass the value of the denominator character as the second number to the ratio processing algorithm? Or, even better, consider using the added information of one number plus a denominator character to determine the actual value as a proportion of the max value.

Again, though, I think it would be better to permit semantically rich fallback content within this element and to require authors to simply use the element's attributes to set these values, or alternatively to provide elements that set these values within the element's contents, as I demonstrated in the introduction to this review.

PROGRESS
Consider changing the paragraph:

"The value attribute specifies how much of the task has been completed, and the max attribute specifies how much work the task requires in total. The units are arbitrary and not specified."
to:

"The value attribute specifies how much of the task has been completed, and the max attribute specifies how much work the task requires in total. --The units are arbitrary and not specified.-- __The values are unitless, and units are not expressible through this element (i.e., they are arbitrary and unspecified).__"

Or, even better, consider adding a @units attribute to the element (and to meter) to store that semantic. The units may be useful depending on the presentation of the progress bar. Authors can include the units elsewhere in the prose of the document, but it would also be useful to include them in the progress bar itself. Consider adding a list of unit keywords to HTML to support use here on PROGRESS@units and METER@units, and in other attributes.

On both meter and progress, consider tying the DOM attributes to the state of the element rather than to the content attributes. Those are already accessible through getAttribute, aren't they? If necessary, create a newly named DOM attribute to get at the property state of the element (e.g., actualValue, dMax, dPosition).

The case where the maximum is set and textContent parsing returns a single number and a denominator punctuation character should be handled by setting the actual value to the product of the percentage and the maximum value, so that the progress bar can present the non-percentage values as an optional presentation (similarly for METER).

[1]: <http://www.w3.org/TR/xmlschema-2/#built-in-datatypes>
Received on Sunday, 22 July 2007 09:11:23 UTC