
Re: ISSUE-3 (DTF): Date and Time Format

From: Phil Archer <phila@w3.org>
Date: Tue, 17 Jan 2012 12:43:54 +0000
Message-ID: <4F156D0A.50308@w3.org>
To: Richard Cyganiak <richard@cyganiak.de>
CC: Government Linked Data Working Group WG <public-gld-wg@w3.org>


On 17/01/2012 12:06, Richard Cyganiak wrote:
> Here's my take on this. I'll break it down into sub-questions.
>
>
> Q1: Should the range of dcterms:issued include plain literals for free-text descriptions (“sometime in the 70's or 80's”)?
>
> A: Yes, although the use of such free-text date descriptions should be discouraged.

Agreed.

>
>
> Q2: Should the date format allow placeholders such as “200?” for the previous decade or “2011-00-00” where month and date are unknown?
>
> A: No. This is not allowed in W3CDTF or XML Schema Datatypes or ISO 8601 or SQL or any other date spec I'm aware of. Existing date code such as Java's java.util.Date or PHP's strtotime will in the best case just barf, and in the worst case produce nonsense such as turning 2012-02-00 into 2012-01-31. I'm also not aware of any existing government data catalog that codes dates in this notation, or in any other way that can be automatically transformed into this notation. We should not recommend a notation that requires manual re-coding and is incompatible with everything.

It wouldn't be typed; it's a plain literal, so there is no effective 
difference between 198? and "Sometime in the 1980s". Therefore no 
software should barf, and no transformation of 2012-02-00 to 2012-01-31 
should take place.

The latter is something that IMO we should work hard to avoid. 
2012-01-31 is a real date. 2012-02-00 is a deliberate approximation and 
by changing it you degrade the data. You could write

ex:dateOfBirth "2012-02"^^<http://purl.org/dc/terms/W3CDTF>

and that unambiguously says "February 2012". Changing that to 31 Jan 
2012 is at best writing fiction and at worst erasing what might be an 
important distinction between an event that happened in January and one 
that happened in February.

It's this forcing of a square peg (xsd:date) into the round hole of 
reality that I don't like about using xsd:date. It's fine if you always 
know the full date, but it's bad when dealing with approximations, which 
are common in things like registers of people's dates and places of birth.
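
To illustrate the point about not inventing precision, here is a minimal 
sketch of how a client might read the date-only W3CDTF forms (yyyy, 
yyyy-mm, yyyy-mm-dd) and remember which parts were actually given. The 
names PartialDate and parse_partial are hypothetical, chosen only for this 
example, and time components are deliberately left out.

```python
import datetime
import re
from typing import NamedTuple, Optional

# Date-only W3CDTF forms: yyyy, yyyy-mm, yyyy-mm-dd (time parts are out of scope here).
_DATE_FORM = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


class PartialDate(NamedTuple):
    """Hypothetical container that records only the parts actually supplied."""

    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    @property
    def precision(self) -> str:
        if self.day is not None:
            return "day"
        return "month" if self.month is not None else "year"


def parse_partial(value: str) -> PartialDate:
    """Parse a date-only W3CDTF string without inventing missing parts."""
    match = _DATE_FORM.fullmatch(value)
    if match is None:
        raise ValueError(f"not a date-only W3CDTF value: {value!r}")
    year, month, day = (None if part is None else int(part) for part in match.groups())
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"month out of range: {value!r}")
    if day is not None:
        # datetime.date raises ValueError for impossible days such as 00.
        datetime.date(year, month, day)
    return PartialDate(year, month, day)


print(parse_partial("2012-02"))  # PartialDate(year=2012, month=2, day=None)
```

With this approach "2012-02" stays a month-precision value, and 
"2012-02-00" is rejected rather than being silently turned into a 
different, real date.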
>
>
> Q3: Should the date format be dcterms:W3CDTF instead of xsd:date in order to support less specific dates such as yyyy and yyyy-mm?
>
> A: No. If at all, then it should allow the W3C-recommended datatypes xsd:gYear and xsd:gYearMonth in addition to xsd:date. But I would prefer not to go there as it makes the creation of clients significantly harder (e.g., correct ordering and filtering of dates). The current approach of filling in 01 for unknown months and dates is a good compromise between simplicity and representational fidelity, IMO.

I agree that DCAT should not use xsd:gYear etc.

>
> Q4: Should the date format be xsd:dateTime instead of xsd:date to support higher precision than day?
>
> A: Most existing government catalogs seem to specify data release dates up to the day, without time component. Displaying and processing dateTimes is quite a bit more involved because now we have to deal with time zones, variable precision and so on, and make the problem of filling in 00 or 01 for unknown values even more prevalent. Again, I think that xsd:date strikes the right balance. (It is true that SPARQL has dedicated support for xsd:dateTime but not for xsd:date, but that's fine – unlike for xsd:dateTime, ordering xsd:dates doesn't require special code, lexical ordering will be correct.)
>

Agree.
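
The lexical-ordering point can be checked with a small sketch. It assumes 
every value is a plain yyyy-mm-dd string with a four-digit year and no 
time zone suffix; the sample dates are made up for illustration.

```python
from datetime import date

# Hypothetical release dates, all in the plain yyyy-mm-dd form (no time zone).
issued = ["2012-01-17", "2011-12-31", "2012-01-06", "2011-02-28"]

# Sort once as plain strings and once as real calendar dates.
by_string = sorted(issued)
by_date = sorted(issued, key=date.fromisoformat)

# The two orderings agree, so no date-aware comparison code is needed.
same_order = by_string == by_date
print(same_order)  # True
```

The same shortcut would not hold once time zones or mixed precision 
(yyyy next to yyyy-mm-dd) enter the picture.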

>
> Q5: Then how to deal with cases where the year is unknown, or where the time of day really matters, or where the ambiguity between “01” and “unknown” is really unacceptable?
>
> A: Use a plain literal (see Q1), or deviate from the recommendation.

Yes. In those cases 198? is no worse than a bit of free text, but unlike 
free text it has the potential to be machine processed.
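
As a rough sketch of what "machine processed" could mean here: a 
four-character string with trailing ? placeholders can be expanded into a 
range of years, while anything else is treated as free text. The function 
name year_range is hypothetical, and the ? convention is not part of 
W3CDTF, so this is an assumption about how a client might choose to read 
such values.

```python
import re
from typing import Optional, Tuple


def year_range(value: str) -> Optional[Tuple[int, int]]:
    """Return (first_year, last_year) for values like '198?', else None.

    Assumes a four-character value made of leading digits followed by
    zero or more '?' placeholders; this is not a W3CDTF rule.
    """
    if len(value) != 4 or re.fullmatch(r"\d+\?*", value) is None:
        return None  # free text such as "Sometime in the 1980s"
    first = int(value.replace("?", "0"))
    last = int(value.replace("?", "9"))
    return first, last


print(year_range("198?"))                   # (1980, 1989)
print(year_range("Sometime in the 1980s"))  # None
```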

>
>
> My conclusion is that dcat should stick with xsd:date (and allow but discourage plain literals) because it's the simplest approach that fulfils the use cases.

Given that xsd:date is a subtype of W3CDTF, this seems like an 
unnecessary restriction to me but, as ever, I am ready to be shouted 
down by the prevailing opinion.

Phil.

>
>
> On 6 Jan 2012, at 15:19, Government Linked Data Working Group Issue Tracker wrote:
>
>>
>> ISSUE-3 (DTF): Date and Time Format
>>
>> http://www.w3.org/2011/gld/track/issues/3
>>
>> Raised by: Phil Archer
>> On product:
>>
>> The current version of DCAT seems a little confused wrt date and time formats. We use dcterms:issued and repeat the DC range declaration of rdfs:Literal and then say it should be datatyped as xsd:date. So far so good. But then the text refers to the W3CDTF document. And they're not the same.
>>
>> xsd:date requires that values be present for yyyy-mm-dd
>>
>> W3CDTF is more flexible and allows any of:
>> yyyy
>> yyyy-mm
>> yyyy-mm-dd (and then times can be added)
>>
>> The DCAT spec says that if a day and/or month are not known then one should use the value 01. This assumes:
>>
>> - that the year is always known;
>> - that a date like 2012-01-06 is ambiguous since it includes '01'.
>>
>> There may be cases in which the year is not known. For example, 'the 1980s' might be written as 198?. That breaks W3CDTF but it's an approximation. As it happens this came up just yesterday in the EU work that Christophe and I are doing so it's fresh in my mind. Taking all that on board, my proposal is therefore that:
>>
>> 1. Rather than specify a datatype of xsd:date we specify W3CDTF (which is what DC recommends). We can use the URI http://purl.org/dc/terms/W3CDTF to give the data type.
>>
>> 2. We recommend using '00' not '01' for unknown dates.
>>
>> 3. We explain that just giving the year or the year and month is valid.
>>
>> 4. Where the year is uncertain, use the ? character to express this but recognise that this breaks the model and is not W3CDTF. Therefore the data should not be so typed.
>>
>> 5. Where even strings like 198? cannot be provided, plain text such as "sometime in the 1970s or '80s" may be used but this should be avoided if at all possible.
>>
>> Given DCAT's use cases the latter seems unlikely (it happens in public sector records for things like dates of birth) so maybe we could drop that bit, but 1 - 4 seem valid?
>>
>>
>>
>
>
>

-- 


Phil Archer
W3C eGovernment
http://www.w3.org/egov/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Tuesday, 17 January 2012 12:44:22 UTC
