Re: TriG test suite issues from Gregg Kellogg on 2013-09-24 (public-rdf-comments@w3.org from September 2013)

From: Gregg Kellogg <gregg@greggkellogg.com>
Date: Tue, 24 Sep 2013 13:07:01 -0700
To: Gregory Williams <greg@evilfunhouse.com>
Cc: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-Id: <B58811E0-35F7-47F6-9425-301CF79C799E@greggkellogg.com>

Thanks for your comments. The TriG tests were forked before the updates were made to the Turtle tests. I made the updates you requested and a couple more.

Please let us know if this resolves your issue.

Gregg Kellogg
gregg@greggkellogg.net

On Sep 20, 2013, at 1:41 PM, Gregory Williams <greg@evilfunhouse.com> wrote:

> In working with the latest TriG test suite, I ran across two issues which I hope can be addressed.
> 
> 1)
> 
> Several of the N-Quads files used in the test suite use new-style N-Triples \u escapes (with lowercase hex chars). I believe this was fixed in the Turtle test suite to allow existing (old-style) N-Triples parsers to be used to test implementations of new-Turtle systems, and I think the same reasoning should apply to N-Quads and TriG. The files with the new-style escapes are:
> 
> localName_with_assigned_nfc_bmp_PN_CHARS_BASE_character_boundaries.nq
> localName_with_assigned_nfc_PN_CHARS_BASE_character_boundaries.nq
> localName_with_nfc_PN_CHARS_BASE_character_boundaries.nq
> 
> Can they please be changed to use all-caps hex characters in escapes?
> 
> 
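
For anyone who wants to check or normalize their own copies of such files, the change requested above can be sketched roughly as follows. This is only an illustration in Python, not the script used to update the test suite; the function name uppercase_hex_escapes is made up for this example, and it assumes the input is one line of N-Triples or N-Quads text.

```python
import re

# Matches, in order of preference:
#   1. an escaped backslash (two backslashes), which must be left untouched
#      so that the text "\\u00e9" is not mistaken for a \u escape;
#   2. a \uXXXX escape (4 hex digits);
#   3. a \UXXXXXXXX escape (8 hex digits).
_ESCAPE = re.compile(r'\\\\|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})')


def uppercase_hex_escapes(line: str) -> str:
    """Return `line` with the hex digits of \\u and \\U escapes upper-cased.

    Hypothetical helper for illustration; everything outside the numeric
    escapes is returned unchanged.
    """
    def fix(match: re.Match) -> str:
        if match.group(1) is not None:
            return '\\u' + match.group(1).upper()
        if match.group(2) is not None:
            return '\\U' + match.group(2).upper()
        # Escaped backslash: keep exactly as written.
        return match.group(0)

    return _ESCAPE.sub(fix, line)
```

Applied line by line to a .nq file, this would turn "\u00e9" into "\u00E9" while leaving IRIs, literals, and already upper-case escapes as they are.
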
> 2)
> 
> Two (UTF-8 encoded) TriG files in the test suite contain the U+EFFFF codepoint. While the TriG (and Turtle) grammars allow this codepoint in their character ranges, this codepoint is not a valid Unicode character. This causes problems for me in testing my TriG code because I can't easily change the behavior of Perl or the low-level libraries being used to handle Unicode and file I/O (which I believe are doing the correct thing in throwing errors when they see this codepoint). The relevant Unicode code table[1] says of this codepoint range:
> 
> "These codes are intended for process-internal uses, but are not permitted for interchange."
> 
> I see that this issue has been discussed on the mailing list with respect to the range being used in the grammar, but given this Unicode text, I can't see how this codepoint can reasonably be used in a test suite and expected not to cause problems. The two files I see containing this codepoint are:
> 
> prefix_with_PN_CHARS_BASE_character_boundaries.trig
> labeled_blank_node_with_PN_CHARS_BASE_character_boundaries.trig
> 
> thanks,
> .greg
> 
> 
> [1] http://www.unicode.org/charts/PDF/UEFF80.pdf
> 
>
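
To find out whether a given test file contains code points like the U+EFFFF discussed above, something along these lines may help. It is a minimal sketch in Python, not part of the test suite tooling; the names is_noncharacter and find_noncharacters are invented for this example. It relies on the Unicode definition of noncharacters: U+FDD0 through U+FDEF, plus the last two code points of every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so on up to U+10FFFF).

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if code point `cp` is a Unicode noncharacter."""
    # U+FDD0..U+FDEF is a contiguous block of 32 noncharacters.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of each plane end in FFFE or FFFF;
    # masking off the lowest bit makes both compare equal to 0xFFFE.
    return (cp & 0xFFFE) == 0xFFFE


def find_noncharacters(text: str) -> list:
    """Return (index, 'U+XXXX') pairs for each noncharacter in `text`.

    Hypothetical helper for illustration; `text` is assumed to be the
    already-decoded content of a test file.
    """
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if is_noncharacter(ord(ch))
    ]
```

For example, find_noncharacters("a" + chr(0xEFFFF) + "b") reports the single offending code point at index 1. Whether the decoding step before this check succeeds at all depends on the language and I/O layer in use, which is the problem described in the report.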

Received on Tuesday, 24 September 2013 20:07:31 UTC