[whatwg] Issues relating to the syntax of dates and times

On Sat, 18 Nov 2006, Henri Sivonen wrote:
>
> Why does WA 1.0 require the year to be exactly 4 digits long when in WF 
> 2.0 it is four or more digits?

Fixed.


> Why doesn't WA 1.0 make 1 AD the first year thus dodging the year zero 
> issue like WF 2.0?

Fixed.
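
For illustration only, the two year rules discussed above (four or more 
digits, and no year zero) could be checked roughly like this. The helper 
name and the regular expression are assumptions made for this sketch, not 
the spec's parsing algorithm.

```python
import re

# Hypothetical helper: the pattern is inferred from the thread, not copied
# from the spec. [0-9] is used instead of \d so only ASCII digits match.
_YEAR = re.compile(r"[0-9]{4,}")


def is_valid_year(text: str) -> bool:
    """Accept four or more ASCII digits with a value of at least 1."""
    return _YEAR.fullmatch(text) is not None and int(text) >= 1
```

With this sketch, "2006" and "12006" are accepted, while "206" (too 
short) and "0000" (year zero) are rejected.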


> Have I understood correctly, that
>
> * WF 2.0 date formats never allow surrounding white space for document 
> conformance and must be rejected by UAs if they do

Correct.


> * WA 1.0 Specific moments in time never allow surrounding white space 
> for document conformance but UAs must gracefully ignore surrounding 
> white space and trailing garbage

As of the recent edits, the WF2 form of not skipping whitespace has been 
adopted.


> * WA 1.0 Vaguer moments in time always allow surrounding white space?

I revamped the syntax for these. It is now consistent with the others -- 
no spaces in attributes, White_Space allowed around in content.


> Why do WA 1.0 datetime formats for attributes allow space around "T" or 
> multiple spaces in place of "T" when WF 2.0 only allows "T"? Also, why 
> are spaces allowed before the time zone designator in the attribute 
> variants in WA 1.0 when WF 2.0 does not allow spaces before "Z"?

Fixed.
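
A rough sketch of what the stricter attribute form amounts to: a literal 
"T" between date and time, no whitespace anywhere, and the time zone 
designator directly after the time. The exact shape of the pattern below 
is an assumption for illustration; range checks on month, day, hour and 
so on are left out.

```python
import re

# Assumed shape: YYYY-MM-DD "T" HH:MM[:SS[.fff]] then "Z" or +HH:MM / -HH:MM.
# This only checks the overall syntax, not the numeric ranges.
_ATTR_DATETIME = re.compile(
    r"[0-9]{4,}-[0-9]{2}-[0-9]{2}"
    r"T[0-9]{2}:[0-9]{2}(?::[0-9]{2}(?:\.[0-9]+)?)?"
    r"(?:Z|[+-][0-9]{2}:[0-9]{2})"
)


def looks_like_attribute_datetime(value: str) -> bool:
    """True if the whole string matches; no surrounding or inner spaces."""
    return _ATTR_DATETIME.fullmatch(value) is not None
```

Because fullmatch() is used, a space in place of "T", a space before "Z", 
or leading and trailing whitespace all cause the string to be rejected.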


> Also, the "in content" variant of the Vaguer moments in time algorithm 
> is not stable over time, because Unicode can add more Zs characters.

Correct.
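
To make the stability concern concrete, here is a small sketch that 
contrasts a category-based test with a fixed list. The contents of 
FIXED_SPACE are an illustrative assumption, not the list any spec uses.

```python
import unicodedata

# A fixed, explicitly listed set does not change between Unicode versions.
# These five characters are an assumption chosen for the example.
FIXED_SPACE = frozenset(" \t\n\f\r")


def is_space_by_category(ch: str) -> bool:
    # The answer depends on the Unicode data bundled with this Python
    # build (see unicodedata.unidata_version), so it can change over time.
    return unicodedata.category(ch) == "Zs"


def is_space_fixed(ch: str) -> bool:
    # The answer is the same on every build.
    return ch in FIXED_SPACE
```

U+00A0 NO-BREAK SPACE, for example, is in Zs but not in the fixed set 
above, while a tab is in the fixed set but not in Zs.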


On Mon, 20 Nov 2006, Henri Sivonen wrote:
> 
> Also, for consistency with WF 2.0 and HTML 4.01, I suggest that vaguer 
> moments in time *in attributes* not allow spaces in conforming strings 
> and require T as the date/time separator.

Done.


On Mon, 20 Nov 2006, Henri Sivonen wrote:
>
> Currently, the definition of Vaguer moments in time allows seconds that 
> have only one digit before the decimal point and no digits after the 
> decimal point. This doesn't make sense considering that hours and 
> minutes must have two digits.

They use the same as all other times now.


> The motivation for these formats is consistency with HTML 4.01 and Web 
> Forms 2.0 where departure from the formats required by those specs is 
> not necessary. These formats allow leading zeros in the year. However, I 
> think it would be reasonable to ban leading zeros in years that have 5 
> or more digits if WF 2.0 also bans those.

Not banned in HTML5.


On Sun, 27 Apr 2008, Ernest Cline wrote:
>
> In section 3.10.10, the second example is:
>  <time datetime="2006-09-24 05:00 -7">
> 
> However, the algorithm given in 3.2.4.2 for parsing date or time strings 
> requires that the timezone hour offset be exactly 2 digits.  (This is 
> the same requirement ISO 8601 has.)  Hence, the example as given is 
> invalid according to the provided parser algorithm, since it has only 1 
> digit.

Fixed.
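
As a sketch of why "-7" fails, consider a parser that insists on exactly 
two digits for the hour offset. The colon-separated HH:MM shape, the 
handling of "Z", and the function name are assumptions for this example 
rather than a transcription of the spec's algorithm.

```python
import re

# Assumed offset shape: a sign, exactly two hour digits, ":", two minute digits.
_OFFSET = re.compile(r"([+-])([0-9]{2}):([0-9]{2})")


def parse_offset_minutes(text: str):
    """Return the offset from UTC in minutes, or None if rejected."""
    if text == "Z":
        return 0
    m = _OFFSET.fullmatch(text)
    if m is None:
        return None
    sign, hours, minutes = m.group(1), int(m.group(2)), int(m.group(3))
    if minutes > 59:
        return None
    total = hours * 60 + minutes
    return -total if sign == "-" else total
```

Here parse_offset_minutes("-7") returns None, whereas "-07:00" is 
accepted and yields -420.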


On Sun, 27 Apr 2008, Ernest Cline wrote:
> 
> At present the HTML 5 draft version extends the allowed syntax of the 
> datetime value from that used in HTML 4 / XHTML 1 to include strings 
> that are not valid ISO 8601 specifiers.  Specifically, it breaks syntax 
> allowed by ISO 8601 by allowing optional whitespace.  I can to some 
> degree see the value of allowing whitespace where the textContent of the 
> <time> element is being parsed instead of the datetime attribute.  
> However, for the attribute itself, unless an existing implementation is 
> accepting extra whitespace when parsing the datetime attribute of <ins> 
> and <del> in HTML 4, I can't see the value in accepting whitespace for 
> the attributes.

I've changed back to not allowing whitespace.


On Wed, 30 Jul 2008, Benjamin Hawkes-Lewis wrote:
>
> Regardless of what elements are added to HTML5, I believe HTML5 needs a 
> simple extension point where microformats can insert machine-parsable 
> equivalents and expansions of human friendly data. Data types are by no 
> means limited to those already covered by the HTML5 proposals:
> 
> http://microformats.org/wiki/machine-data
> 
> Such an extension point could meet the use-case of making datetimes BC 
> extractable and also any use-case for far-future datetimes without 
> requiring HTML5 to explicitly specify calendar APIs for them.

The suggestion on that page:

   We're meeting up on Northumberland Avenue (<span 
   class="geo">51.507033,-0.126343</span>).

   <span class="duration">3 minutes and 23 seconds <span 
   class="value">PT3M23S</span></span>

...seems fine to me.
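
For what it's worth, a consumer of the "value" span above could turn 
PT3M23S into something usable with very little code. This is a minimal 
sketch that handles only the hours/minutes/seconds subset of ISO 8601 
durations; the function name is hypothetical.

```python
import re
from datetime import timedelta

# Only the "PT" + optional nH, nM, nS subset is handled in this sketch.
_DURATION = re.compile(r"PT(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+)S)?")


def parse_time_duration(value: str) -> timedelta:
    m = _DURATION.fullmatch(value)
    # Reject strings that do not match, and a bare "PT" with no components.
    if m is None or not any(m.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)
```

parse_time_duration("PT3M23S") gives a timedelta of 3 minutes and 23 
seconds, i.e. 203 seconds.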


On Wed, 30 Jul 2008, WeBMartians wrote:
> 
> At the very least, ensure that the range of times (dates, durations, 
> intervals and times-of-day) and the granularity are well and rigorously 
> specified. Ensure, also, that there is a Javascript mechanism to alarm, 
> mechanically, when such element values exceed the specified envelope (I 
> do not see such in the current text).

Is the current text satisfactory?


> If the browser cannot handle a datetime string of "-0548-11-22 
> 18:23:46,03276548901+3" (other-epochal, proleptic, locale-dependent and, 
> I'm certain, annoying in several other respects), just make sure 
> Javascript does something predictable and explicit.

The spec is pretty explicit now about what should happen.


> I would claim that an epoch of 1970 (the traditional, UNIX epoch) is 
> ludicrous just because so many luminaries started their existence before 
> that moment (for example, "me" - ahem). On the other hand, I could 
> understand a requirement that an epoch of no later than 1900, while 
> limited, is at least "proper" (even in light of some locales' not 
> adopting the Gregorian calendar until the 1930s).

The spec allows any year from 0001 up (Gregorian, though).


On Thu, 31 Jul 2008, WeBMartians wrote:
> 
> Consider the couple to be congratulated on their gazillionth 
> anniversary. Is that diamond, gold, platinum? Whatever it is, if your 
> date time system is limited to epoch 1970, you're out of luck. That's 
> why I claim that restrictions (rigorously documented) are OK as long as 
> they are not ludicrous - "ludicrous" being a gray area, rather than a 
> sharp line - 1970 definitely is, 1900 is probably OK, 1582 is 
> interesting and far less ludicrous, while -9999 is very safe but maybe 
> ludicrous in other ways (prolepsis, locales...).

The spec allows dates from year 0001, so that seems addressed now.


On Wed, 30 Jul 2008, Benjamin Hawkes-Lewis wrote:
> Ian Hickson wrote:
> > On Thu, 24 Apr 2008, Henri Sivonen wrote:
> > > How do proleptic Gregorian dates before the Common Era fit into any 
> > > of the use cases that dates are used for in HTML?
> > > 
> > > Insertion and deletion dates are contemporary. Date form widgets are 
> > > meant for airline and hotel reservations and, hence, need to pick 
> > > dates from the near future. The time element is meant for 
> > > microformats, which means that it will be used for encoding current 
> > > or near-future event dates.
>
> Microformats may also be used to mark up events that happened in the 
> past and people who are dead.

What's the point?


> For example:
> 
> http://en.wikipedia.org/wiki/Walt_Disney
> 
> If HTML5 does not provide a way to specify datetimes BC, then the 
> microformats community would be left in the boat they're already in of 
> trying to fudge markup to encode datetimes BC. Little gained, really.

Why would you use Microformats to mark up a date in BC? What problem does 
it solve?

In practice, if you actually need to mark up dates from before the 
Gregorian calendar (let alone BC) you have to do far more complicated 
things, like saying which country you're talking about, which calendar, 
define your sources and your margin of error, etc. Typically you'll be 
talking about seasons and years, not dates.

This kind of data isn't really machine-usable anyway.


On Thu, 31 Jul 2008, Benjamin Hawkes-Lewis wrote:
> 
> Again, you're thinking entirely in terms of social networking and not in 
> terms of education and intellectual curiosity.
> 
> I'd imagine more probable applications would be building (or searching) 
> collections of biographical or event data from multiple sources.
> 
> Let's say you have an application for constructing chronologies, and 
> you're constructing a chronology of (say) the history of animation. You 
> could drag and drop Walt's birthday onto the chronology.
> 
> Look at this lesson plan for example:
> 
> "Have a collection of images of famous people use as a resource show to 
> the children and discuss who they are and why they are famous. Have a 
> selection of people from the past and present. Use www.Google.co.uk to 
> find images. You could see if they could try and put them in a 
> timeline."
> 
> http://www.supporting-ict.co.uk/weblinks/historyks1.htm
> 
> (If you look around, you'll find plenty of timeline-oriented approaches 
> to the past.)

What has this to do with HTML5's <time> element, though?


> Or, maybe you're building a database of animators with film samples. You 
> could pull microformatted biographical information out from across the 
> web and add it to your page.
> 
> Or, maybe you're a journalist who needs to construct an "on this day" 
> article. You search for stuff that happened on Disney's birthday, and 
> come across Disney's biography that way.
> 
> Anyhow, whether such applications of microformats fit how you imagine 
> or would like to dictate how people use microformatted data, TIME as 
> defined cannot cover how microformats are already applied, so let us not 
> pretend that it does. You're free to argue that trying to encode such 
> information is pointless, but that's an argument you'd want to take up 
> with the microformats community and one I cannot support.

While I could see that maybe one day there'd be a use case for <time> that 
would need historical dates, I really think that we'd have to tackle other 
calendars in use today before looking at calendars that aren't in use 
anymore. So I'd rather punt this for now.


On Sat, 25 Oct 2008, Gerard Ashton wrote:
>
> The part of the spec at 
> http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#date-or-time-string 
> in the "2.4.4.2 Vaguer moments in time" section 
> contains a typographical error. In this phrase:
> 
> "If second is not a number in the range 0 ≤ minute < 60, then the string 
> is invalid, abort these steps. "
> 
> the word "minute" should be replaced by "second".

That text is gone now.


> Also, it is quite confusing to have times constantly being referred to as 
> UTC times, and yet find the following phrase in "2.4.4.1 Specific 
> moments in time":
> 
> "If second is not a number in the range 0 ≤ second < 60, then fail. (The 
> value 60 and 61 are not allowed: leap seconds cannot be represented by 
> datetime values.) "
> 
> Where the time element is described, at 
> http://www.whatwg.org/specs/web-apps/current-work/multipage/text-level-semantics.html#the-time-element 
> the description of the second used is unclear. I infer from the 
> description that the value of the time element is the number of UT 
> milliseconds since 1970-01-01 00:00 UTC. The distinction being that UTC 
> only applies to the initial epoch, 1970-01-01 00:00 UTC. After the 
> epoch, time is counted in universal time (UT), with no leap seconds 
> allowed (that is, a seconds value of 60.500 violates the spec).
> 
> Am I on the right track? Should the spec spell this out, or provide a 
> hyperlink to a location where it is spelled out?

The idea is that it is the UTC time, it's just that there are a few 
seconds of history here and there that can't be represented.

"UT" isn't a real time system as far as I can tell. Did you mean UT1? Or 
TAI? Both of these are slightly offset from UTC and so the same time would 
mean something different than most people would assume.

UTC is what is used by most people (knowingly or not), so it seems best to 
use that here too.


On Sun, 26 Oct 2008, Gerard Ashton wrote:
> 
> The three attributes of type DOMTimeStamp (the date, time, and timezone 
> attributes) seem the most troublesome, for the following reasons.
> 
> Since DOMTimeStamp is an unsigned long integer

For JS, DOMTimeStamp is bound to the "Date" object, so the problem you 
mention (being limited to time after the Unix epoch) is not an issue.
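
A JS Date stores a signed number of milliseconds relative to 
1970-01-01T00:00Z, so moments before the epoch simply come out negative. 
The Python below only mirrors that arithmetic as a sketch; the function 
name is hypothetical and none of this is the DOM binding itself.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_since_epoch(moment: datetime) -> int:
    """Signed whole milliseconds between the Unix epoch and `moment`."""
    delta = moment - EPOCH
    # timedelta normalizes so that only `days` can be negative.
    return (delta.days * 86_400_000
            + delta.seconds * 1000
            + delta.microseconds // 1000)
```

One second before the epoch yields -1000, and a date such as 1901-12-05 
yields a large negative value rather than an error.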


> Furthermore, the only words in the specification that would imply a 
> particular length for the second (and therefore the millisecond) are UTC 
> or Coordinated Universal Time. Since 1972 UTC has counted in the second 
> of the International System of Units, that is, the second of atomic 
> clocks. The actual mean solar second is slightly longer. From the time 
> of the proposed HTML 5 epoch until January 1, 2009, atomic time and UTC 
> diverge by 29.7683 seconds. This accounts for the leap second that 
> occurs at the end of 2008. The next leap second after that will cause 
> the difference between HTML 5 time and UTC to be more than 30 seconds, 
> which means that if rounded to the nearest minute, the minute value will 
> differ by 1.

No, HTML5 time _is_ UTC time, by definition. It just can't represent a few 
seconds here and there, because the syntax doesn't allow representing the 
leap seconds.
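
As a sketch of what "can't represent" means in practice, a time parser 
with the range check 0 ≤ second < 60 simply rejects a leap second such as 
23:59:60. The pattern and function name are assumptions for the example, 
not the spec's algorithm.

```python
import re

# Assumed shape: HH:MM with optional :SS and optional fractional part.
_TIME = re.compile(r"([0-9]{2}):([0-9]{2})(?::([0-9]{2})(\.[0-9]+)?)?")


def parse_time(text: str):
    """Return (hour, minute, second) or None if the string is rejected."""
    m = _TIME.fullmatch(text)
    if m is None:
        return None
    hour, minute = int(m.group(1)), int(m.group(2))
    second = float((m.group(3) or "0") + (m.group(4) or ""))
    # second must be strictly below 60, so 23:59:60 is rejected.
    if not (0 <= hour <= 23 and 0 <= minute <= 59 and 0 <= second < 60):
        return None
    return hour, minute, second
```

With this, "23:59:59.999" parses, while "23:59:60" and "23:59:60.5" 
return None.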


> The choice of epoch matches the epoch for Unix, but otherwise seems 
> inconvenient. The problem of the date attribute failing for some 
> information about living people would suggest choosing an epoch before 
> any living person was born, such as the adoption of the Gregorian 
> Calendar (1582-10-15).

Actually the Gregorian calendar started in different years in different 
countries, so picking that date would be even more problematic. :-)


> If a date near 1970 really is seen as a desirable date, 1973-01-01 00:00 
> UTC suggests itself, because the use of leap seconds began 1973-01-01 
> 00:00 UTC, so any algorithm that needed to account for the difference 
> between atomic time and UTC would only have to deal with integer 
> differences (even if the algorithm needs to work with values slightly 
> before the epoch).

I don't really see why any algorithm would have to worry about the 
difference between UTC time and TAI time in an HTML page.


> Considering the inability to represent leap seconds in the time element, 
> it appears to me the only way to write a specification that does not 
> contradict itself is to say that the epoch is 1973-01-01 00:00 UT1, not 
> UTC, and that the times represented are UT1 times, not UTC.

If we did that, then if someone gave an exact time, they would in fact be 
giving a time a few seconds away from what they thought they were giving. 
This is quite a problem, IMHO.


> Thus the unit of measure for the attributes would be seconds of mean 
> solar time as measured by UT1, not seconds as kept by atomic clocks. 
> Since UT1 does not observe leap seconds, the limitation of the time 
> element will not cause outright errors and contradictions. The absolute 
> value of the difference between UTC and UT1, which is always less than 
> 0.9 second, seems unlikely to cause trouble for the types of 
> applications envisioned for this element.

UT1 doesn't use SI seconds (it follows the rotation of the Earth, which is 
literally affected by the moon and other celestial effects, so its use 
could literally lead to bugs that depend on the phase of the moon), which 
makes its use in computers rather problematic.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 25 November 2008 03:22:31 UTC