- From: Henry Story <henry.story@bblfish.net>
- Date: Fri, 2 Mar 2012 17:50:49 +0100
- To: Alex Hall <alexhall@revelytix.com>
- Cc: public-rdf-comments@w3.org
- Message-Id: <BF5EECB0-EFE6-46F5-9EA0-9713E5CBEA52@bblfish.net>
On 2 Mar 2012, at 15:23, Alex Hall wrote:

> On Fri, Mar 2, 2012 at 2:19 AM, Henry Story <henry.story@bblfish.net> wrote:
> > pretty much the only positive test that fails for me at present consistently across Jena, Sesame and my
> > implementation is Test-29.ttl [1] which contains the following statement
> >
> > <http://example.org/node> <http://example.org/prop> <scheme:\u0001\u0002\u0003\u0004\u0005\u0006\u0007\u0008\t\n\u000B\u000C\r\u000E\u000F\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001A\u001B\u001C\u001D\u001E\u001F !"#$%&'()*+,-./0123456789:/<=\u003E?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u007F> .
> >
> > This is causing the apache abdera IRI [2] library to barf. It looks like they put a lot of energy into this library, and so that's made me wonder where the error lies. This can be reproduced like this on the scala console
> >
> > scala> import org.apache.abdera.i18n.iri._
> > scala> val iriStr = "scheme:\u0001\u0002\u0003\u0004\u0005\u0006\u0007\u0008\t\n\u000B\u000C\r\u000E\u000F\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019"
> > scala> val iriStr2 = "\u001A\u001B\u001C\u001D\u001E\u001F !\"#$%&'()*+,-./0123456789:/<=\u003E?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u007F"
> > scala> val iri = iriStr + iriStr2
> > scala> val i = new IRI(iri)
> > org.apache.abdera.i18n.iri.IRISyntaxException: org.apache.abdera.i18n.text.InvalidCharacterException: Invalid Character 0x1(?)
> >     at org.apache.abdera.i18n.iri.IRI.parse(IRI.java:577)
> >     at org.apache.abdera.i18n.iri.IRI.<init>(IRI.java:64)
> >     ...
> >
> > I looked at http://tools.ietf.org/html/rfc3987 to see what the spec said there, but I don't think those characters are
> > allowed. Can I remove this from the examples? What should I replace it with that would test the spec? Should we move this
> > one to a bad-test?
>
> This is another situation of syntactically valid Turtle that is not valid RDF. The IRI in question has Unicode-escaped control characters. All Unicode escape sequences are allowed in Turtle, but when the sequence is unescaped as part of the parsing process it becomes a syntactically invalid IRI (and therefore not valid for RDF). Any IRI parser will choke on this particular IRI.
>
> I think it's a perfectly reasonable thing to do to incorporate an IRI parser into a Turtle parser, for validation as well as resolving relative IRIs against @base. For this reason, I think it's good practice to keep the positive parser tests in the realm of valid RDF, not just syntactically valid Turtle.

I fixed the examples in my test suite, following the lead of Jena's fix for this:
https://github.com/betehess/pimp-my-rdf/commit/460d16ac3829dbd963e500c1367b5e45edf3428c

The problem was also discussed on the Jena issue tracker:
https://issues.apache.org/jira/browse/JENA-216

Henry

> Regards,
> Alex
>
> > Henry
> >
> > [1] http://www.w3.org/TR/turtle/tests/test-29.ttl
> > [2] http://grepcode.com/file/repo1.maven.org/maven2/org.apache.abdera/abdera-i18n/1.1.2/org/apache/abdera/i18n/iri/IRI.java
> >     http://abdera.apache.org/

Social Web Architect
http://bblfish.net/
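Alex's suggestion of validating IRIs inside the Turtle parser (unescape the IRI reference first, then reject it if the result is not a valid IRI, and only then resolve it against @base) can be sketched in Scala, the language of the console session quoted above. This is a minimal, hypothetical sketch, not code from Jena, Sesame or Abdera: TurtleIriCheck, unescapeUchar, invalidChars and checkAndResolve are illustrative names. It assumes the input is the body of an IRI reference already accepted by a Turtle lexer, it checks only the ASCII characters that RFC 3987 never permits (C0 controls, space, DEL and a handful of delimiters such as < > { } |) rather than the full RFC 3987 grammar, and it resolves with java.net.URI, which follows RFC 2396 and therefore only approximates RFC 3986/3987 resolution.

```scala
import java.net.URI
import scala.util.{Failure, Success, Try}

// Hypothetical sketch: object and method names are illustrative and are not
// taken from Jena, Sesame or Abdera.
object TurtleIriCheck {

  /** Replaces Turtle UCHAR escapes (a backslash, then 'u' and 4 hex digits or
    * 'U' and 8 hex digits) with the code points they denote. Assumes the input
    * was already accepted by a Turtle lexer, so the hex digits are well formed. */
  def unescapeUchar(s: String): String = {
    val sb = new java.lang.StringBuilder
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      val next = if (i + 1 < s.length) s.charAt(i + 1) else ' '
      val width = if (c == '\\' && next == 'u') 4 else if (c == '\\' && next == 'U') 8 else 0
      if (width > 0 && i + 2 + width <= s.length) {
        sb.appendCodePoint(Integer.parseInt(s.substring(i + 2, i + 2 + width), 16))
        i += 2 + width
      } else {
        sb.append(c)
        i += 1
      }
    }
    sb.toString
  }

  // ASCII delimiters that RFC 3987 never allows literally in an IRI; C0 controls,
  // space and DEL are checked separately below. The last entry is the backslash.
  private val forbiddenDelims: Set[Char] = Set('<', '>', '"', '{', '}', '|', '^', '`', '\\')

  private def isForbiddenAscii(c: Char): Boolean =
    c <= 0x20 || c.toInt == 0x7F || forbiddenDelims(c)

  /** (UTF-16 index, code point) of every forbidden ASCII character. Non-ASCII
    * characters are NOT checked against the RFC 3987 ucschar ranges. */
  def invalidChars(iri: String): List[(Int, Int)] =
    iri.toList.zipWithIndex.collect { case (c, i) if isForbiddenAscii(c) => (i, c.toInt) }

  /** Unescapes an IRI reference, rejects it if invalid, otherwise resolves it
    * against the base. java.net.URI follows RFC 2396, so this resolution only
    * approximates RFC 3986/3987. */
  def checkAndResolve(base: String, iriRef: String): Either[String, String] = {
    val iri = unescapeUchar(iriRef)
    invalidChars(iri) match {
      case (idx, cp) :: _ => Left(f"invalid character 0x$cp%X at index $idx")
      case Nil =>
        Try(new URI(base).resolve(iri).toString) match {
          case Success(resolved) => Right(resolved)
          case Failure(e)        => Left("cannot resolve against base: " + e.getMessage)
        }
    }
  }

  def main(args: Array[String]): Unit = {
    val base = "http://example.org/base/"
    println(checkAndResolve(base, "../other#frag"))
    // Shortened form of the test-29 IRI: just the escapes for 0x01 and 0x02.
    println(checkAndResolve(base, "scheme:" + "\\" + "u0001" + "\\" + "u0002"))
  }
}
```

Run as a program, main prints Right(http://example.org/other#frag) for the relative reference and Left(invalid character 0x1 at index 7) for the shortened test-29 IRI, which lines up with Abdera's "Invalid Character 0x1". Checking after unescaping, rather than on the raw Turtle text, is the point Alex makes: the escaped form is lexically fine, and only the unescaped IRI is invalid.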
Received on Friday, 2 March 2012 16:51:26 UTC