HTML5: comments on section 2.5 [common microsyntaxes]

1. Section 2.5.1. Where it says:

--
When a user agent is to strip line breaks from a string, the user agent must remove any U+000A LINE FEED (LF) and U+000D CARRIAGE RETURN (CR) characters from that string.
--

Should other line-terminators also be included here?
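
As a rough illustration of the question, here is a minimal Python sketch. The function name and the table of "other" terminators are my own choices for illustration, not taken from the spec. It shows that stripping only LF and CR leaves other Unicode line-breaking characters in the string:

```python
# Remove only the two characters named in the quoted rule.
def strip_line_breaks(text: str) -> str:
    return text.replace("\n", "").replace("\r", "")

# Other characters that Unicode treats as mandatory line breaks
# (an illustrative list, not something the HTML5 draft defines).
OTHER_TERMINATORS = {
    "\u000b": "LINE TABULATION",
    "\u000c": "FORM FEED",
    "\u0085": "NEXT LINE",
    "\u2028": "LINE SEPARATOR",
    "\u2029": "PARAGRAPH SEPARATOR",
}

sample = "a\nb\r\nc\u2028d\u0085e"
stripped = strip_line_breaks(sample)

# Any remaining character from the table above survived the stripping.
survivors = [OTHER_TERMINATORS[c] for c in stripped if c in OTHER_TERMINATORS]
print(survivors)
```

Here U+2028 and U+0085 both remain in the "stripped" string.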

2. Section 2.5.3. I realize that the set of "enumerated values" in HTML5 consists strictly of ASCII strings. However, there is no syntactic requirement that this be true. Non-ASCII values could be added to an enumerated value set, in which case the "ASCII case-insensitive" match may be incomplete. Instead of using "ASCII case-insensitive", could Unicode case-folded matching be defined? It is stable, well-defined, and a strict superset of the ASCII case.
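
To make the difference concrete, here is a minimal Python sketch. The helper names are hypothetical, and Python's str.casefold() stands in for Unicode case folding (it does not apply normalization). The two comparisons agree on pure ASCII input and diverge once non-ASCII characters appear:

```python
def ascii_lower(s: str) -> str:
    # Map only A-Z to a-z; every other character is left untouched.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)

def ascii_case_insensitive_equal(a: str, b: str) -> bool:
    return ascii_lower(a) == ascii_lower(b)

def unicode_caseless_equal(a: str, b: str) -> bool:
    # str.casefold() applies Unicode full case folding.
    return a.casefold() == b.casefold()

pairs = [("CHECKED", "checked"), ("\u00c9T\u00c9", "\u00e9t\u00e9"), ("STRASSE", "stra\u00dfe")]
for a, b in pairs:
    print(a, b, ascii_case_insensitive_equal(a, b), unicode_caseless_equal(a, b))
```

Only the first pair matches under the ASCII rule; all three match under case folding.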

3. Section 2.5.5. The section describes date values using strictly the Gregorian calendar. As long as the values are strictly internal (as a means of representing incremental or floating time values, cf. our note Working With Time Zones), this doesn't represent a barrier to the use of other calendric systems. Please include a note indicating this so that international users understand why Gregorian is used here.

4. Section 2.5.5.1. The 'months' definition only handles dates from the (proleptic) Gregorian year 0 in the common era going forwards. There is no way to represent BCE dates. Should there be?

5. Section 2.5.5.2. The 'date' definition contains no time zone information. It is thus a floating date value and a health warning should be included about converting it to/from incremental time values.

6. Section 2.5.5.3. In parsing a 'time' value, the rule for the hours section is: "Two digits, representing hour, in the range 0 ≤ hour ≤ 23". Should this optionally allow only one digit?

7. Section 2.5.5.3. There is a note disallowing the representation of leap seconds. This poses a potential problem for applications sensitive to leap seconds. Should the values be allowed, even if later processing does not deal with it?
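
Comments 6 and 7 can be sketched together. The following Python fragment is a deliberately simplified HH:MM[:SS] parser, not the spec's algorithm. The two keyword flags are hypothetical switches that show what the relaxed behavior would accept:

```python
import re

def parse_time(text, allow_one_digit_hour=False, allow_leap_second=False):
    # [0-9] rather than \d, so that only ASCII digits are accepted.
    hour_pattern = "[0-9]{1,2}" if allow_one_digit_hour else "[0-9]{2}"
    match = re.fullmatch(rf"({hour_pattern}):([0-9]{{2}})(?::([0-9]{{2}}))?", text)
    if match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    second = int(match.group(3) or 0)
    # A leap second is written as second 60.
    max_second = 60 if allow_leap_second else 59
    if hour > 23 or minute > 59 or second > max_second:
        return None
    return (hour, minute, second)

print(parse_time("9:30"))                                # rejected as drafted
print(parse_time("9:30", allow_one_digit_hour=True))
print(parse_time("23:59:60"))                            # rejected as drafted
print(parse_time("23:59:60", allow_leap_second=True))
```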

8. Section 2.5.5.4. This section defines 'local dates and times', which is a date-and-time value *without* a time zone. This type seems problematic because it does not deal with the time zone problem. Its relationship to either floating or incremental times is completely arbitrary. See comment #10 following.

9. Section 2.5.5.4. (Editorial). The sentence reading:

--
A local date and time consists of a specific proleptic Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, but expressed without a time zone.
--
... has an obviously incorrect repetition of the phrase "consisting of".

10. Section 2.5.5.5. This defines a 'global date and time', which is the same as a 'local date and time', only with a time zone representation. I propose changing the 'local' value to use the term 'floating' (since it is not truly local). The term 'global' could remain. I would also suggest adding a reference to w3.org/TR/timezone (our note) to help users understand when to use which type.

11. Section 2.5.5.5. The note on time zone offsets says in part: "...and the minutes component of offsets of actual time zones is always either 00, 30, or 45." Really this is arbitrary and prone to change. The total range of time zones also changes from time to time. I would suggest inserting an "at the time this document was published" caveat so that innocent readers are not caught by surprise by a new 15 minute time zone offset or by more monkey business surrounding the International Date Line.

12. Section 2.5.5.5. This section gives a number of examples that equate time zone offset with an actual time zone. For example:

--
"1979-10-14T12:00:00.001-04:00"
    One millisecond after noon on October 14th 1979, in the time zone in use on the east coast of the USA during daylight saving time.
--

It should be made clear that a zone offset is not the same thing as a time zone. Mention should be made of the need for separate time zone information when working with real date and time values in use cases that depend on it (see our note on ......)

13. Section 2.5.5.5. Rule 9 in the parsing of a valid global date and time is: "Let time be the moment in time at year year, month month, day day, hours hour, minute minute, second second, subtracting timezonehours hours and timezoneminutes minutes. That moment in time is a moment in the UTC time zone.". It is not clear what type 'time' is. It appears to be a field-based time value, but is this intended as an implicit conversion to incremental time?
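
To show the two readings, here is a Python sketch of one possible interpretation of rule 9. The function name is mine, and I assume that timezonehours and timezoneminutes both carry the sign of the offset. The result can be kept as UTC fields or converted to an incremental count of seconds, and the rule does not say which is meant:

```python
from datetime import datetime, timedelta, timezone

def global_to_utc(year, month, day, hour, minute, second,
                  timezonehours, timezoneminutes):
    # Field-based value exactly as written in the string.
    written = datetime(year, month, day, hour, minute, second)
    # "subtracting timezonehours hours and timezoneminutes minutes"
    offset = timedelta(hours=timezonehours, minutes=timezoneminutes)
    return (written - offset).replace(tzinfo=timezone.utc)

# 1979-10-14T12:00:00-04:00
utc_fields = global_to_utc(1979, 10, 14, 12, 0, 0, -4, 0)
print(utc_fields.isoformat())  # reading 1: fields in UTC

# Reading 2: incremental time, here seconds since 1970-01-01T00:00:00Z.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
incremental = int((utc_fields - epoch).total_seconds())
print(incremental)
```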

14. Section 2.5.5.6. This section defines 'weeks'. The rules for weeks and week counting are culturally linked, but these rules define the week start as always Monday. The rule for determining the "first week" is that it includes the first Thursday (again, a culturally variant value). Shouldn't there be provision for allowing culturally specific week rules to be applied?
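
As an illustration, the Python sketch below compares the Monday/first-Thursday rule with one alternative convention. Python's date.isocalendar() implements the ISO 8601 form of that rule. The second function is a hypothetical example of a Sunday-start rule in which week 1 is the week containing January 1. It is not taken from any particular locale database:

```python
from datetime import date

def iso_week(d: date):
    # Monday start; week 1 is the week containing the first Thursday.
    year, week, _weekday = tuple(d.isocalendar())
    return (year, week)

def sunday_start_week(d: date):
    # Hypothetical alternative: weeks start on Sunday and week 1 is
    # the week that contains January 1.
    jan1 = date(d.year, 1, 1)
    # Days between January 1 and the Sunday on or before it.
    lead = (jan1.weekday() + 1) % 7
    return (d.year, ((d - jan1).days + lead) // 7 + 1)

for d in (date(2011, 1, 1), date(2011, 1, 2), date(2011, 7, 14)):
    print(d.isoformat(), iso_week(d), sunday_start_week(d))
```

Under these two rules, Saturday 1 January 2011 falls in week 52 of 2010 in one system and in week 1 of 2011 in the other.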



Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

Received on Thursday, 14 July 2011 19:05:48 UTC