Turtle parsing

From: Paul Gearon <pgearon@revelytix.com>
Date: Thu, 19 Jul 2012 10:47:06 -0400
Message-ID: <CAOQ8B2FSkiNJDry9E8-7YCaX69MU-BeT+O5iwysCF=SvtWCfkA@mail.gmail.com>
To: public-rdf-comments@w3.org

I have some questions and comments about the Turtle parsing grammar
and current tests. I'm looking at the Working Draft found at:
so please let me know if I am working from the wrong document.

- The document makes no statement as to whether numeric literals
should be represented canonically. Given that these can be written
as a raw number (e.g. 2.4 instead of
"2.4"^^<http://www.w3.org/2001/XMLSchema#decimal>), I would
expect the canonical form to be appropriate. I suggest documenting
whether or not canonicalization is required.
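
To make the question concrete, here is a minimal sketch of what
canonicalizing an xsd:decimal lexical form could look like. It assumes
the XSD 1.0 canonical representation (no leading "+", no redundant
leading or trailing zeros, and at least one digit on each side of a
mandatory decimal point). The function name canonical_decimal is
hypothetical and is not taken from the Turtle draft or its test suite.

```python
import re

# Lexical space of xsd:decimal: optional sign, digits, optional fraction.
_DECIMAL = re.compile(r"([+-])?(\d*)(?:\.(\d*))?")


def canonical_decimal(lexical: str) -> str:
    """Return the canonical lexical form of an xsd:decimal (XSD 1.0 rules assumed)."""
    m = _DECIMAL.fullmatch(lexical.strip())
    if m is None:
        raise ValueError(f"not an xsd:decimal: {lexical!r}")
    sign, whole, frac = m.group(1), m.group(2), m.group(3) or ""
    if not whole and not frac:
        # Rejects "", "+", "." and similar digit-free input.
        raise ValueError(f"not an xsd:decimal: {lexical!r}")
    whole = whole.lstrip("0") or "0"  # drop leading zeros, keep one digit
    frac = frac.rstrip("0") or "0"    # drop trailing zeros, keep one digit
    # Zero carries no sign in the canonical form.
    negative = sign == "-" and (whole != "0" or frac != "0")
    return ("-" if negative else "") + whole + "." + frac
```

Under these assumptions "2.30" and "2.3" both map to "2.3", so a
parser that canonicalizes would treat the two literals as the same
lexical form, while one that does not would keep them distinct.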

- The test case test-28 (decimal data type - serializing test) appears
to support the canonicalization of decimals. However,
"2.3"^^<http://www.w3.org/2001/XMLSchema#decimal>, which is in
canonical form, is being expanded to
"2.30"^^<http://www.w3.org/2001/XMLSchema#decimal>, which is not.

- The documentation for xsd:decimal requires processors to support a
minimum of 18 decimal digits. A processor may also set a maximum number
of digits (which must be documented). However, test-28 presumes a
limit of exactly 18 digits. This seems inappropriate, though testing up
to the 18-digit minimum is correct.

- Test case test-30 contains the following IRI:


This contains all of the characters that IRIREF explicitly disallows
(except the > character), thereby leading the test to fail:
  ([^#x00-#x20<>\"{}|^`\] | UCHAR)*

It also appears that UCHAR provides a back door for the characters
#x00-#x20. I expect that this cannot be avoided at the level of the
grammar, but perhaps it should be documented.
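
The back door can be illustrated with a small sketch. The regular
expression below is my own transcription of the IRIREF body quoted
above plus the UCHAR escape forms (\u with four hex digits, \U with
eight). The names IRIREF, UCHAR and decode_uchar, and the example.org
IRI, are hypothetical and chosen only for illustration.

```python
import re

# UCHAR: a backslash, then 'u' + 4 hex digits or 'U' + 8 hex digits.
UCHAR = r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}"

# IRIREF: '<', then any run of permitted characters or UCHAR escapes, then '>'.
IRIREF = re.compile(r'<((?:[^\x00-\x20<>"{}|^`\\]|' + UCHAR + r')*)>')


def decode_uchar(body: str) -> str:
    """Replace each UCHAR escape with the character it denotes."""
    return re.sub(UCHAR, lambda m: chr(int(m.group(0)[2:], 16)), body)


# A literal space is rejected by the character class...
assert IRIREF.fullmatch("<http://example.org/a b>") is None

# ...but the same space written as an escape is accepted by the grammar,
# and decoding it yields a character from the excluded #x00-#x20 range.
escaped = IRIREF.fullmatch(r"<http://example.org/a\u0020b>")
assert escaped is not None
assert decode_uchar(escaped.group(1)) == "http://example.org/a b"
```

If this reading of the grammar is right, rejecting such IRIs would need
a check after escape processing rather than in the tokenizer.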

- Production 160s (NIL) is not used. Is this still needed?

Paul Gearon
Received on Thursday, 19 July 2012 14:47:39 UTC
