Re: PROPOSED to RESOLVE ISSUE-127 with Canonical N-Triples from Gavin Carothers on 2013-10-18 (public-rdf-comments@w3.org from October 2013)

From: Gavin Carothers <gavin@carothers.name>
Date: Fri, 18 Oct 2013 07:50:09 -0700
To: Gregory Williams <greg@evilfunhouse.com>
Cc: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-ID: <CAPqY83z0DjkBUgw06=9Qz0M+Au3=tHQtaeW_kt4dshjB_uO6ag@mail.gmail.com>
On Fri, Jul 5, 2013 at 2:34 PM, Gregory Williams <greg@evilfunhouse.com> wrote:

>
> On Jul 5, 2013, at 9:26 PM, Gavin Carothers <gavin@carothers.name> wrote:
>
> > https://www.w3.org/2011/rdf-wg/track/issues/127 states that the new
> N-Triples specification doesn't provide for the old functionality of a
> given triple having one and only one way to write it down. The current
> draft of N-Triples has added a Canonical N-Triples definition to the
> conformance section.
> >
> > A canonical N-Triple document is a N-Triple document with additional
> constraints:
> >
> >
> >       • Space between terms (WS+) SHOULD be a single space, (U+0020).
> >       • Space after or before terms (WS*) SHOULD be empty.
> >       • HEX SHOULD use only uppercase letters ([A-F]).
> >       • Characters not allowed directly in STRING_LITERAL_QUOTE (U+0022,
> U+005C, U+000A, U+000D) SHOULD use ECHAR not UCHAR.
> >       • Characters SHOULD be represented directly and not by UCHAR.
> > This is NOT the same as the current definition in RDF Test Cases as it
> prefers the direct representation of characters over the use of escape
> sequences. It also specifies the white space rules.
>
> I am not satisfied with this as a resolution to my issue. Having this
> "canonical N-Triples" variant does nothing to address my comment that the
> new draft has made significant changes to N-Triples, and many of these
> introduce multiple ways to encode a given N-Triples graph. As N-Triples is
> an established format in widespread use, I consider these changes
> ill-advised and see no actual value in changing the format to support them.
>
> I would like to see the N-Triples grammar reverted to its previous form
> where there were essentially no choices left to implementations in how to
> serialize a graph. If anything, I would think a "canonical N-Triples"
> constraint on the original (RDF Test Cases) grammar that tightened the
> allowable use of whitespace would be better.
>

Gregory,

Thank you again for your comments on N-Triples.

This is the second formal response to issue
http://www.w3.org/2011/rdf-wg/track/issues/127

N-Triples was originally created as part of the RDF Test Cases. As such it
included:

N-Triples is an RDF syntax for expressing RDF test cases and defining the
correspondence between RDF/XML and the RDF abstract syntax. RDF/XML
[RDF-SYNTAX] is the recommended syntax for applications to exchange RDF
information.

It also did not have a distinct media type, and was recommended only for
test cases. As such it did not have any internationalization requirements
placed on it. Also the world has changed since 2001 when it was decided
that N-Triples should be ASCII and not UTF-8.

RDF Test Cases N-Triples requires the following:

<http://example.org/> <http://example.org/property>
"I\u00F1t\u00EBrn\u00E2ti\u00F4n\u00E0liz\u00E6ti\u00F8n" .

N-Triples REC track allows and recommends:

<http://example.org/> <http://example.org/property> "Iñtërnâtiônàlizætiøn".

While the first was totally acceptable for a test case format, it is not
acceptable for use as a widespread data exchange format. In order to
address internationalization concerns and adopt the practice of existing
implementations in the wild, N-Triples is now allowed and recommended to be
UTF-8 while continuing to support data using \u and \U escapes. In modern
systems "Iñtërnâtiônàlizætiøn" is greatly preferred by users for
interoperability and ease of use over
"I\u00F1t\u00EBrn\u00E2ti\u00F4n\u00E0liz\u00E6ti\u00F8n".

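To make the relationship between the two forms concrete, here is a minimal
Python sketch that rewrites \u and \U escapes as the characters they stand
for. The function name decode_uchar is hypothetical and not part of any
specification or library. It deliberately leaves ECHAR sequences such as
\n and \\ untouched, and it does not check whether a decoded character
would itself need an ECHAR escape.

```python
import re

# Matches any backslash escape: \uXXXX, \UXXXXXXXX, or a one-character ECHAR.
# Consuming ECHAR pairs here keeps "\\u00F1" (an escaped backslash followed
# by plain text) from being misread as a UCHAR.
_ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))", re.DOTALL)


def decode_uchar(text: str) -> str:
    """Replace \\u and \\U escapes with the characters they encode."""

    def repl(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is None:
            # An ECHAR such as \n or \\ is returned exactly as written.
            return match.group(0)
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(repl, text)


escaped = r"I\u00F1t\u00EBrn\u00E2ti\u00F4n\u00E0liz\u00E6ti\u00F8n"
print(decode_uchar(escaped))
```

Applied to the Test Cases form of the literal above, this should yield the
direct UTF-8 form.
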
Your comment also touches on requirements for serializers. The N-Triples REC
track document places no conformance constraints on a serializer; instead it
defines two classes of documents: a "canonical N-Triples document" and an
"N-Triple document". Canonical was added specifically to address your
comment regarding the need for a recommended way to write down a given
triple while also meeting the new requirements around internationalization.
At the same time, a serializer that produces Test Cases N-Triples will
produce a conforming N-Triple document.
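
As an illustration only, the string-literal part of the canonical
constraints quoted earlier in this thread could be sketched in Python as
follows. The name canonical_literal is hypothetical. The sketch covers only
the two character rules (ECHAR for U+0022, U+005C, U+000A and U+000D,
direct representation for everything else) and says nothing about
whitespace, IRIs, language tags or datatypes.

```python
# The four characters that cannot appear directly in STRING_LITERAL_QUOTE,
# mapped to their ECHAR forms.
_ECHAR = {
    '"': '\\"',    # U+0022
    "\\": "\\\\",  # U+005C
    "\n": "\\n",   # U+000A
    "\r": "\\r",   # U+000D
}


def canonical_literal(value: str) -> str:
    """Quote a string, using ECHAR where required and direct characters otherwise."""
    body = "".join(_ECHAR.get(ch, ch) for ch in value)
    return '"' + body + '"'


print(canonical_literal("Iñtërnâtiônàlizætiøn"))
```

Under these assumptions each string has exactly one written form, since
every character is either copied through or replaced by its single ECHAR
spelling.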

Please reply to public-rdf-comments@w3.org indicating whether this rationale
explains the Working Group's decision to allow and recommend the use of
UTF-8 for N-Triples.

Sincerely,
Gavin Carothers
Received on Friday, 18 October 2013 14:50:40 UTC