[whatwg] Expanding datetime

-----Original Message-----
>From: WeBMartians <webmartians at verizon.net>
>
>Can the standard state that datetime strings support ISO 8601:2004 and leave it at that? (just a thought)

Considering the considerable number of formats that ISO 8601:2004 encompasses, I don't think that would be a good idea.  The following format possibilities for specifying instants in time, while acceptable to ISO 8601:2004 would not be worth adding to HTML 5 under any use case I can imagine.

I really can't see why they bothered with the century format except as supplemental field for pre Y2J applications that used a 2 digit year field.

Decimal minutes and hours are allowed in ISO 8601:2004, but lead to problems with rounding once one gets to increments smaller than 0.0001 min and 0.00001 h respectively, so should probably not be adopted.

The alternate ways of representing 1985-04-12: the ordinal date (1985-102) and the week date (1985-W15-5) don't offer anything that would justify the extra complexity that would be required of the parser.

In case the specification ever develops a use case for a comma separated list of datetime values, then the use of comma as an alternate to the period for the decimal point should not be allowed.

HTML 4 currently allows only:
####-##-##T##:##:##Z
####-##-##T##:##:##+##:##
####-##-##T##:##:##-##:##

These three formats are a subset of those suggested by the W3C datetime note [1] which itself is a subset of the choices available under ISO 8601.

The HTML 5 draft currently extends this to allow:
 * leaving out the seconds (valid under the W3C datetime note)
 * adding a decimal point and one or more digits for fractional seconds (valid under the W3C datetime note)
 * specifying just the date (valid under the W3C datetime note)
 * specifying just the time (valid under ISO 8601, but not the W3C datetime note)
 * replacing the "T" separator between the time and date with whitespace (NOT VALID under ISO 8601)
 * including some optional whitespace in some specific places (NOT VALID under ISO 8601)

Unless allowing the whitespace is needed to back standardize an existing implementation's handling of <ins>/<del>, it should be removed from the draft as it is not ISO 8601 compliant and complicates the job of the parser, though I grant it does improve the human readability slightly.

As a side issue, unless attribute values are restricted to ISO/IEC 656 characters, ISO 8601 appears to call for also accepting U+2010 HYPHEN as a separator between the fields of a date value, and U+2212 MINUS for the sign in a time zone designator or extended year representation.

The following is a list of steps for a parser that accepts a wide range of valid ISO 8601 datetime formats, assuming that extended years are of the format ?YYYYY and that at most 3 digits are allowed after the FULL STOP in the seconds.  Variables <date>, <time>, and <zone> are booleans for whether a date, time, or timezone was present in the designation.  Variables <year>, <month>, <day>, <hour>, <minute>, <second>, <millisecond>, <zonehour>, and <zoneminute> contain the correct results, but range checking for nonsense values as 2007-02-29 or even 2007-13-42 is not done, nor is converting them to value stored in the DOM.

0. Set <month> and <day> to 1; set <hour>, <minute>, <second>, and <millisecond> to 0; set <date>, <time>, and <timezone> to invalid; advance to the first non-whitespace character.
1. If the next character is not PLUS, MINUS, or HYPHEN-MINUS, skip to step 5.
2. If the next character is not PLUS, set <year> to -1, else set <year> to 1.
3. Advance 1 character; if the next five characters are not in the set DIGIT ZERO .. DIGIT NINE, abort as invalid.
4. Set <date> to valid; multiply <year> by the value given by the next five characters; advance 5 characters; skip to step 7.
5. If the next four characters are not in the set DIGIT ZERO .. DIGIT NINE, skip to step 17.
6. Set <date> to valid; set <year> to the value given by the next four characters; advance 4 characters.
7. If the next character is whitespace, COMMA, or end of string, end as valid.
8. If the next character is not HYPHEN or HYPHEN-MINUS, abort as invalid.
9. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, abort as invalid.
10. Set <month> to the value given by the next two characters; advance 2 characters.
11. If the next character is whitespace, COMMA, or end of string, end as valid.
12. If the next character is not HYPHEN or HYPHEN-MINUS, abort as invalid.
13. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, abort as invalid.
14. Set <day> to the value given by the next two characters; advance 2 characters.
15. If the next character is whitespace, COMMA, or end of string, end as valid.
16. If the next character is not LATIN CAPITAL LETTER T abort as invalid.
17. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, abort as invalid.
18. Set <time> to valid; set <hour> to the value given by the next two characters; advance 2 characters.
19. If the next character is whitespace, COMMA, or end of string, end as valid.
20. If the next character is not COLON abort as invalid.
21. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, abort as invalid.
22. Set <minute> to the value given by the next two characters; advance 2 characters.
23. If the next character is whitespace, COMMA, or end of string, end as valid.
24. If the next character is not COLON abort as invalid.
25. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, abort as invalid.
26. Set <second> to the value given by the next two characters; advance 2 characters.
27. If the next character is whitespace, COMMA, or end of string, end as valid.
28. If the next character is not FULL STOP abort as invalid.
29. If the next character is not in the set DIGIT ZERO .. DIGIT NINE, abort as invalid.
30. Set <millisecond> to 100 times the value given by the next character; advance 1 character.
31. If the next character is whitespace, COMMA, or end of string, end as valid. 
32. If the next character is PLUS. MINUS, HYPHEN-MINUS, or LATIN CAPITAL LETTER Z, skip to step 41.
33. If the next character is not in the set DIGIT ZERO .. DIGIT NINE, abort as invalid.
34. Add to <millisecond> 10 times the value given by the next character; advance 1 character.
35. If the next character is whitespace, COMMA, or end of string, end as valid. 
36. If the next character is PLUS. MINUS, HYPHEN-MINUS, or LATIN CAPITAL LETTER Z, skip to step 41.
37. If the next character is not in the set DIGIT ZERO .. DIGIT NINE, abort as invalid.
38. Add to <millisecond> the value given by the next character; advance 1 character.
39. If the next character is whitespace, COMMA, or end of string, end as valid. 
40. If the next character is not PLUS. MINUS, HYPHEN-MINUS, or LATIN CAPITAL LETTER Z, abort as invalid.
41. Set <timezone> to valid, set <zoneminute> to 0.
42. If the next character is not LATIN CAPITAL LETTER Z, skip to step 45.
43. Set <zonehour> to 0; advance 1 character.
44. If the next character is whitespace, COMMA, or end of string, end as valid. else abort as invalid.
45. If the next character is not PLUS, set <zonehour> to -1, else set <zonehour> to 1.
46. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, abort as invalid.
47. Multiply <zonehour> by the value given by the next two characters; advance 2 characters.
48. If the next character is whitespace, COMMA, or end of string, skip to step #. 
49. If the next character is not COLON abort as invalid.
50. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, abort as invalid.
51. Set <zoneminute> to the value given by the next two characters; advance 2 characters.

Note that steps 1 to 4 in the above are only needed if years in expanded format (?YYYYY) are used.  With the exception of centuries, ordinal dates, week dates, decimal fractions of the hour, and decimal fractions of the minutes, the above parses the full set of extended format time instants specified by ISO 8601:2004.  Rejecting additional forms, such as YYYY-MM, is for the most part simply changing an "end as valid" to an "abort as invalid" at the appropriate step.

The above does not recognize datetimes with the extra whitespace allowed in the HTML 5 draft as that is not valid in ISO 8601:2004.

[1] http://www.w3.org/TR/NOTE-datetime

Received on Friday, 25 April 2008 17:08:44 UTC