Re: Turtle Bad IRI syntax tests

* Gregg Kellogg <gregg@greggkellogg.net> [2012-11-03 19:33-0400]
> The following tests in the Turtle Syntax Tests look for a parser error, but I think they're actually correct syntax:
> 
> syn-bad-uri-02 [1]
> # Bad IRI : bad escape
> <http://example/\u0020> <http://example/p> <http://example/o> .
> 
> syn-bad-uri-05 [2]
> # Bad IRI : hex 3C is <
> <http://example/\u003C> <http://example/p> <http://example/o> .
> 
> syn-bad-uri-06 [3]
> # Bad IRI : hex 3E is >
> <http://example/\u003E> <http://example/p> <http://example/o> .
> 
> The Turtle grammar allows any Unicode escape to be part of the IRI, and does not restrict escapes for characters that would be illegal if written unescaped.

+1
SPARQL used to substitute \u escapes before parsing, which meant that <http://example/\u003C> would look like "<http://example/<>" to the parser. I suspect the Jena Turtle parser did the same.
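
To make the difference concrete, here is a rough Python sketch (not taken from any of the parsers mentioned) that matches the quoted IRIREF production either against the raw token or against the token after \u substitution. The names unescape, matches_raw and matches_after_substitution are made up for illustration, and the regex is my transcription of productions [19] and [27].

```python
import re

# UCHAR and IRIREF transcribed from productions [27] and [19] quoted in this thread.
UCHAR = r'(?:\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})'
IRIREF = re.compile(r'<(?:[^\x00-\x20<>"{}|^`\\]|' + UCHAR + r')*>')


def unescape(text):
    """Replace each \\uXXXX or \\UXXXXXXXX escape with the character it names."""
    return re.sub(UCHAR, lambda m: chr(int(m.group(0)[2:], 16)), text)


def matches_raw(token):
    """Match the grammar against the token exactly as written in the file."""
    return IRIREF.fullmatch(token) is not None


def matches_after_substitution(token):
    """Substitute escapes first, then match (the older behaviour described above)."""
    return IRIREF.fullmatch(unescape(token)) is not None


for token in (r'<http://example/\u0020>',
              r'<http://example/\u003C>',
              r'<http://example/\u003E>'):
    print(token, matches_raw(token), matches_after_substitution(token))
```

Under these assumptions all three test IRIs match when the grammar is applied to the raw text, and all three are rejected when the escapes are substituted first, which would explain why the tests were written as negative syntax tests.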


> [19]	IRIREF	::=	'<' ([^#x00-#x20<>\"{}|^`\] | UCHAR)* '>'
> [27]	UCHAR	::=	'\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
> 
> I think these should be good syntax tests. If that is the case, my processor now passes all of the RIOT Turtle and TurtleSubm tests except the following:
> 
> test-29.ttl [4] includes illegal characters in IRIs: ", {, |, and }
> 
> Tests 14-16 either take too long to run to be useful, or are just too stressful for my implementation. I would be happy if they were excluded.
> 
> Gregg Kellogg
> gregg@greggkellogg.net
> 
> [1] http://svn.apache.org/repos/asf/jena/Experimental/riot-reader/testing/RIOT/Lang/Turtle/syn-bad-uri-02.ttl
> [2] http://svn.apache.org/repos/asf/jena/Experimental/riot-reader/testing/RIOT/Lang/Turtle/syn-bad-uri-05.ttl
> [3] http://svn.apache.org/repos/asf/jena/Experimental/riot-reader/testing/RIOT/Lang/Turtle/syn-bad-uri-06.ttl
> [4] http://svn.apache.org/repos/asf/jena/Experimental/riot-reader/testing/RIOT/Lang/TurtleSubm/test-29.ttl
> 
> 

-- 
-ericP

Received on Sunday, 4 November 2012 11:46:43 UTC