Turtle Bad IRI syntax tests from Gregg Kellogg on 2012-11-03 (public-rdf-wg@w3.org from November 2012)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Sat, 3 Nov 2012 19:33:56 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
CC: RDF WG <public-rdf-wg@w3.org>
Message-ID: <70332670-9C8D-4A20-B33C-222715DC5245@greggkellogg.net>

The following tests in the Turtle Syntax Tests look for a parser error, but I think they're actually correct syntax:

syn-bad-uri-02 [1]
# Bad IRI : bad escape
<http://example/\u0020> <http://example/p> <http://example/o> .

syn-bad-uri-05 [2]
# Bad IRI : hex 3C is <
<http://example/\u003C> <http://example/p> <http://example/o> .

syn-bad-uri-06 [3]
# Bad IRI : hex 3E is >
<http://example/\u003E> <http://example/p> <http://example/o> .

The Turtle grammar allows any Unicode escape to be part of an IRI; it does not exclude escapes that decode to characters which would be illegal if written unescaped.

[19]	IRIREF	::=	'<' ([^#x00-#x20<>\"{}|^`\] | UCHAR)* '>'
[27]	UCHAR	::=	'\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
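
As a rough illustration of the point, the two productions above can be transcribed into a regular expression and run against the three test subjects. This is only a sketch of the grammar as quoted in this message, not of any particular parser; the names IRIREF, ALLOWED, UCHAR and is_iriref are made up for the example. It checks the token syntax only and says nothing about whether the decoded IRI is itself valid.

```python
import re

# Hypothetical transcription of productions [19] and [27] as quoted above.
HEX = "[0-9A-Fa-f]"
# UCHAR: a literal backslash, then 'u' + 4 hex digits or 'U' + 8 hex digits.
UCHAR = r"\\u" + HEX + "{4}" + r"|\\U" + HEX + "{8}"
# Any character except #x00-#x20 and < > " { } | ^ ` \
ALLOWED = r'[^\x00-\x20<>"{}|^`\\]'
IRIREF = re.compile("<(?:" + ALLOWED + "|" + UCHAR + ")*>")


def is_iriref(token: str) -> bool:
    """Return True if the whole token matches the IRIREF production."""
    return IRIREF.fullmatch(token) is not None


# Raw strings keep the backslash escapes exactly as they appear in the .ttl files.
for subject in (r"<http://example/\u0020>",
                r"<http://example/\u003C>",
                r"<http://example/\u003E>"):
    print(subject, is_iriref(subject))
```

Under this reading all three escaped forms match the production, while a raw space or a raw `{` between the angle brackets does not.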

I think these should be good syntax tests. If that is the case, my processor now passes all of the RIOT Turtle and TurtleSubm tests except the following:

test-29.ttl [4] includes illegal characters in IRIs: ", {, |, and }

Tests 14-16 either take too long to run to be useful or are simply too stressful for my implementation. I would be happy if they were excluded.

Gregg Kellogg
gregg@greggkellogg.net

[1] http://svn.apache.org/repos/asf/jena/Experimental/riot-reader/testing/RIOT/Lang/Turtle/syn-bad-uri-02.ttl
[2] http://svn.apache.org/repos/asf/jena/Experimental/riot-reader/testing/RIOT/Lang/Turtle/syn-bad-uri-05.ttl
[3] http://svn.apache.org/repos/asf/jena/Experimental/riot-reader/testing/RIOT/Lang/Turtle/syn-bad-uri-06.ttl
[4] http://svn.apache.org/repos/asf/jena/Experimental/riot-reader/testing/RIOT/Lang/TurtleSubm/test-29.ttl

Received on Saturday, 3 November 2012 23:34:39 UTC