- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Thu, 19 Jul 2012 16:15:41 +0100
- To: public-rdf-comments@w3.org
(personal reply)

On 19/07/12 15:47, Paul Gearon wrote:
> Hi,
>
> I have some questions and comments about the Turtle parsing grammar
> and current tests. I'm looking at the Working Draft found at:
> http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/index.html
> so please let me know if I have made a mistake with the appropriate document.
>
>
> - The document makes no statement as to whether numbers literals
> should be represented canonically. Given that these can be represented
> as a raw number (e.g. 2.4 instead of
> "2.4"^^<http://www.w3.org/2001/XMLSchema#decimal>), then I would
> expect the canonical form to be appropriate. I suggest that whether or
> not canonicalization is required be documented.

A parser generates RDF terms, and a literal is a lexical form and a datatype (and maybe a language tag). There is nothing about values, and a parser may not be aware of all datatypes.

While I think we ought to encourage a value-centric view of the world, and canonicalization is good, sometimes it is necessary to preserve non-canonical forms - so the spec should not force it.

> - The test case test-28 (decimal data type - serializing test) appears
> to support the canonicalization of decimals. However,
> "2.3"^^<http://www.w3.org/2001/XMLSchema#decimal> which is in the
> canonical form is being expanded to
> "2.30"^^<http://www.w3.org/2001/XMLSchema#decimal>, which is not
> canonical.

The tests are the old Turtle tests and haven't been updated.

((If anyone has a comprehensive set of tests for Turtle, I'm sure the WG will be delighted to incorporate it.))

> - The documentation for xsd:decimal requires a minimum of 18 digits.
> There is also the option of setting a maximum number of digits (this
> must be documented). However, test-28 is making a presumption of only
> 18 digits. This seems inappropriate, though testing up to the 18 digit
> minimum is correct.

Agreed. The test is wrong - if the lexical form is X chars long, then that is what it is.
(this has been mentioned before)

> - Test case test-30 contains the following IRI:
>
> <scheme:\u0001\u0002\u0003\u0004\u0005\u0006\u0007\u0008\t\n\u000B\u000C\r\u000E\u000F\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001A\u001B\u001C\u001D\u001E\u001F
> !"#$%&'()*+,-./0123456789:/<=\u003E?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u007F>
>
> This contains all of the characters that IRIREF explicitly disallows
> (except the > character), thereby leading the test to fail:
> ([^#x00-#x20<>\"{}|^`\] | UCHAR)*

Agreed. This is legacy though.

RIOT fails this test, and test-28. Both tests reflected assumptions about the parser setup; they are unpassable now.

> It also appears that UCHAR is allowing a back door for the characters
> #x00-#x20. I expect that this cannot be avoided at the level of the
> grammar, but perhaps it should be documented.

Yes and no :-) -- an IRI still has to be an IRI, so even if it passes the weak syntax restrictions, all the IRI rules (including scheme-specific ones) apply, and those can't be captured by a regex. And many systems choose not to do full IRI checking.

In SPARQL (1.0), there is a simple regex to allow parsing (no spaces!) but it is not intended to guarantee valid IRIs.

	Andy

>
> - Production 160s (NIL) is not used. Is this still needed?
>
> Regards,
> Paul Gearon
>
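
For illustration only, here is a minimal Python sketch of the UCHAR "back door" discussed above. The character class is transcribed from the IRIREF fragment quoted in the message. The UCHAR pattern (`\u` plus 4 hex digits, or `\U` plus 8) is an assumption based on the Turtle grammar, and the names `body_matches` and `unescape_uchar` are hypothetical helpers, not part of any parser. The sketch shows that a raw space is rejected at the grammar level while the escaped form `\u0020` is accepted, so the real IRI check has to happen after unescaping.

```python
import re

# Assumed UCHAR production: \uXXXX or \UXXXXXXXX (hex digits).
UCHAR = r'\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}'

# Body of IRIREF as quoted in the message:
#   ([^#x00-#x20<>\"{}|^`\] | UCHAR)*
IRIREF_BODY = re.compile(r'(?:[^\x00-\x20<>"{}|^`\\]|' + UCHAR + r')*')
UCHAR_RE = re.compile(UCHAR)


def body_matches(body: str) -> bool:
    """True if the text between < and > passes the weak grammar check."""
    return IRIREF_BODY.fullmatch(body) is not None


def unescape_uchar(body: str) -> str:
    """Replace each UCHAR escape with the character it denotes."""
    return UCHAR_RE.sub(lambda m: chr(int(m.group(0)[2:], 16)), body)


raw = 'scheme:a b'           # contains a literal space (#x20)
escaped = r'scheme:a\u0020b'  # same space, written as a UCHAR escape

print(body_matches(raw))        # rejected by the character class
print(body_matches(escaped))    # accepted via UCHAR
print(unescape_uchar(escaped) == raw)
```

A grammar-level match therefore says nothing about IRI validity; a system that wants that guarantee would need a separate IRI check on the unescaped string.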
Received on Thursday, 19 July 2012 15:16:31 UTC