[Bug 3659] Bugs in date/time regexes

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3659

           Summary: Bugs in date/time regexes
           Product: XML Schema
           Version: 1.1 only
          Platform: Macintosh
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Datatypes: XSD Part 2
        AssignedTo: cmsmcq@w3.org
        ReportedBy: cmsmcq@w3.org
         QAContact: www-xml-schema-comments@w3.org


In email to the public comments list, Laurens Holst (lholst@students.cs.uu.nl)
writes as follows.  I am copying this to the Bugzilla system for better
tracking.

The regular expressions for dates and times in the XML Schema 1.1 Datatypes
working draft are not correct: they do not match the grammar. Below you can
find fixed regular expressions.

Basically, I made eight modifications to the originally provided regular
expressions, to make the date/time regular expressions match the grammar:

1. Fix parentheses; --(0[1-9])|(1[0-2])- means that it will match e.g. --01 or
12-. Instead, it should be --(0[1-9]|1[0-2])-. Also, the time match had a lot
of needless parentheses.
2. Use (0[1-9]|[12]\d|3[01]) for days everywhere instead of
([0-2][0-9])|(3[01]). The latter would allow 00.
3. Except for ‘time’, all were missing the ‘Z’ in the time zone
4. Decimal did not accept values with a positive sign
5. Replaced [0-9] with \d (just like ‘digit’ is used in the grammar, and it’s
shorter)
6. Removed the backslashes before the - where not needed.
7. Added backslashes before all the + where needed (the browser complains if +
is used unescaped)
8. float has a nit where I changed (-|\+) into (\+|-) to match both the
production and the other regular expressions.
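
The effect of the first two changes can be checked with a short script. This
is only a sketch: it assumes Python's re module as the test harness (the draft
itself uses XSD regular expression syntax, where a pattern always has to match
the whole literal, which re.fullmatch imitates here). The names old_month,
new_month, old_day, new_day and matches are made up for the illustration.

```python
import re

# Month fragment as given in the draft: the alternation splits the whole
# pattern into "--(0[1-9])" OR "(1[0-2])-".
old_month = r"--(0[1-9])|(1[0-2])-"
# Corrected fragment: the alternation is confined to the two month ranges.
new_month = r"--(0[1-9]|1[0-2])-"

# Day fragments: the old one admits 00, the new one does not.
old_day = r"([0-2][0-9])|(3[01])"
new_day = r"(0[1-9]|[12]\d|3[01])"


def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    # re.ASCII keeps \d limited to 0-9, as 'digit' is in the grammar.
    return re.fullmatch(pattern, text, re.ASCII) is not None


# Each line prints: literal, result with old fragment, result with new one.
for text in ["--01", "12-", "--12-", "--13-"]:
    print(text, matches(old_month, text), matches(new_month, text))

for text in ["00", "09", "29", "31", "32"]:
    print(text, matches(old_day, text), matches(new_day, text))
```

With the old month fragment, --01 and 12- are accepted and --12- is rejected;
with the new one it is the other way around. Likewise 00 passes the old day
fragment but not the new one.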

Here are the new regular expressions:

decimal: (\+|-)?((\d+(\.\d*)?)|(\.\d+))
float: (\+|-)?((\d+(\.\d*)?)|(\.\d+))((e|E)(\+|-)?\d+)?|-?INF|NaN
dateTime:
-?([1-9]\d\d\d+|0\d\d\d)-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])T(([01]\d|2[0-3]):[0-5]\d:[0-5]\d(\.\d+)?|24:00:00(\.0+)?)(Z|(\+|-)(0\d|1[0-4]):[0-5]\d)?
time:
(([01]\d|2[0-3]):[0-5]\d:[0-5]\d(\.\d+)?|24:00:00(\.0+)?)(Z|(\+|-)(0\d|1[0-4]):[0-5]\d)?
date:
-?([1-9]\d\d\d+|0\d\d\d)-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])(Z|(\+|-)(0\d|1[0-4]):[0-5]\d)?
gYearMonth:
-?([1-9]\d\d\d+|0\d\d\d)-(0[1-9]|1[0-2])(Z|(\+|-)(0\d|1[0-4]):[0-5]\d)?
gYear: -?([1-9]\d\d\d+|0\d\d\d)(Z|(\+|-)(0\d|1[0-4]):[0-5]\d)?
gMonthDay:
--(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])(Z|(\+|-)(0\d|1[0-4]):[0-5]\d)?
gDay: ---(0[1-9]|[12]\d|3[01])(Z|(\+|-)(0\d|1[0-4]):[0-5]\d)?
gMonth: --(0[1-9]|1[0-2])(Z|(\+|-)(0\d|1[0-4]):[0-5]\d)?
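
A few of these can be tried out as follows. This is a minimal sketch, again
assuming Python's re module with re.fullmatch standing in for the implicit
anchoring of XSD regular expressions; the PATTERNS table and the
is_lexically_valid helper are hypothetical names, and the patterns are copied
unchanged from the list above. Note that the expressions only check the
lexical shape, so a date such as 2006-02-30 still passes.

```python
import re

# Patterns copied verbatim from the list above.
PATTERNS = {
    "dateTime": r"-?([1-9]\d\d\d+|0\d\d\d)-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"
                r"T(([01]\d|2[0-3]):[0-5]\d:[0-5]\d(\.\d+)?|24:00:00(\.0+)?)"
                r"(Z|(\+|-)(0\d|1[0-4]):[0-5]\d)?",
    "date": r"-?([1-9]\d\d\d+|0\d\d\d)-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"
            r"(Z|(\+|-)(0\d|1[0-4]):[0-5]\d)?",
    "gMonthDay": r"--(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"
                 r"(Z|(\+|-)(0\d|1[0-4]):[0-5]\d)?",
    "gDay": r"---(0[1-9]|[12]\d|3[01])(Z|(\+|-)(0\d|1[0-4]):[0-5]\d)?",
}


def is_lexically_valid(type_name, literal):
    """Return True if literal matches the regex for type_name as a whole."""
    # re.ASCII restricts \d to 0-9; Python's default would also accept
    # other Unicode digits, which the grammar does not allow.
    return re.fullmatch(PATTERNS[type_name], literal, re.ASCII) is not None


samples = [
    ("dateTime", "2006-09-06T02:15:49Z"),
    ("dateTime", "2006-09-06T24:00:00"),
    ("dateTime", "2006-09-06T24:00:01"),
    ("date", "2006-09-06+14:00"),
    ("date", "2006-09-00"),
    ("gMonthDay", "--02-29Z"),
    ("gDay", "---00"),
]
for type_name, literal in samples:
    print(type_name, literal, is_lexically_valid(type_name, literal))
```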

Also, I think I found an error in the grammar; in section 3.3.5.2 it says:

The ·lexical space· of float is the set of all decimal numerals with or without
a decimal point, numerals in scientific (exponential) notation, and the
·literals· 'INF', '-INF', and 'NaN'

However, the grammar doesn’t contain ‘INF’, ‘-INF’, and ‘NaN’:

floatRep ::= noDecimalPtNumeral | decimalPtNumeral | scientificNotationNumeral
| minimalNumericalSpecialRep

That should be:

floatRep ::= noDecimalPtNumeral | decimalPtNumeral | scientificNotationNumeral
| minimalNumericalSpecialRep | 'INF' | '-INF' | 'NaN'

The same applies to ‘double’.

Finally, I created a regular expression for base64Binary:

((([A-Za-z0-9+/] ?){4})*(([A-Za-z0-9+/] ?){3}[A-Za-z0-9+/]|([A-Za-z0-9+/] ?){2}[AEIMQUYcgkosw048] ?=|[A-Za-z0-9+/] ?[AQgw] ?= ?=))?

(note: spaces are significant)
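
The base64Binary expression can be exercised in the same way. The sketch below
assumes Python's re and base64 modules and assumes that every gap in the
expression is exactly one space; BASE64_BINARY and is_base64_binary are
hypothetical names. The pattern is split over several string literals only to
keep the lines short, so that no significant space is lost at a line break.

```python
import base64
import re

# The expression above, split into adjacent raw strings (concatenated by
# Python) so that every significant space stays visible.
BASE64_BINARY = (
    r"((([A-Za-z0-9+/] ?){4})*"                   # complete 4-character groups
    r"(([A-Za-z0-9+/] ?){3}[A-Za-z0-9+/]"         # final group, no padding
    r"|([A-Za-z0-9+/] ?){2}[AEIMQUYcgkosw048] ?="  # final group, one '='
    r"|[A-Za-z0-9+/] ?[AQgw] ?= ?="                # final group, two '='
    r"))?"
)


def is_base64_binary(literal):
    """Return True if literal matches the expression as a whole."""
    return re.fullmatch(BASE64_BINARY, literal) is not None


# Output of a real encoder should always be accepted.
for data in [b"", b"f", b"fo", b"foo", b"foob"]:
    encoded = base64.b64encode(data).decode("ascii")
    print(repr(encoded), is_base64_binary(encoded))

# Single spaces between characters are allowed; a wrong final character
# before the padding is not.
print(is_base64_binary("Zm9 vYg= ="))
print(is_base64_binary("Zh=="))
```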


~Grauw

-- 
Ushiko-san! Kimi wa doushite, Ushiko-san nan da!!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Laurens Holst, student, university of Utrecht, the Netherlands.

Received on Wednesday, 6 September 2006 02:15:49 UTC