Are spaces allowed between terms in N-Triples 1.1?

Hi Semantic Web community,

Is [1] a valid N-Triples 1.1 statement (note that there are no spaces
between the terms)?  I can find evidence that says "yes" and evidence
that says "no".  I think "yes" is correct here, but the wording of the
specification can lead a casual reader to believe "no", several
state-of-the-art tools currently implement "no", and the test case is
ambiguous (see below).

[1] <x:y><x:y><x:y>.

No, [1] is not valid:

  * The N-Triples 1.1 specification says that "The simplest triple
    statement is a sequence of (subject, predicate, object) terms,
    separated by whitespace and terminated by '.' after each triple."
    Under a certain interpretation of the word 'simplest' this means
    that whitespace is required: if a statement S that is expressed by
    a triple T could also be expressed by a triple T' that differs
    from T only in that it contains no whitespace, then T would not be
    the simplest triple expressing S.

  * The N-Triples 1.1 specification requires of canonical N-Triples
    that "The whitespace following subject, predicate, and object must
    be a single space, (U+0020)."  This presupposes that whitespace
    does follow the subject, predicate, and object terms ("the
    whitespace" cannot refer to the empty string).

  * rdflib 4.2.2 gives an error when parsing [1].

  * SWI-Prolog Semweb library 7.5.10 gives an error when parsing [1].

  * There is a test case called `#minimal_whitespace' which is
    annotated with `rdft:approval rdft:Proposed'.  According to the
    test suite vocabulary this means that the test is "proposed but
    not approved", so on this reading minimal whitespace is a proposal
    that is not yet part of N-Triples 1.1.

  * It should always be possible to easily split an N-Triple statement
    based on whitespace characters.
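
To make the last point concrete, here is a minimal Python sketch of
what "splitting on whitespace" does to the two spellings.  The helper
name `split_on_whitespace` is hypothetical and not taken from any of
the tools mentioned here.

```python
def split_on_whitespace(statement):
    """Naively tokenize a statement by splitting on runs of whitespace."""
    # str.split() without arguments splits on any run of whitespace
    return statement.split()


spaced = '<x:y> <x:y> <x:y> .'
compact = '<x:y><x:y><x:y>.'       # statement [1]
literal = '<x:y> <x:y> "a b" .'    # object is a literal containing a space

print(split_on_whitespace(spaced))   # three terms plus the final '.'
print(split_on_whitespace(compact))  # a single, unsplit token
print(split_on_whitespace(literal))  # the literal is cut in two
```

With the spaced form the split yields the three terms and the dot;
with [1] it yields one token.  Note that the naive split also breaks
on a literal that contains a space, so it is at best a convenience
for IRI-only statements.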

Yes, [1] is valid:

  * The N-Triples 1.1 specification clearly states that "triples are a
    sequence of RDF terms representing the subject, predicate and
    object of an RDF Triple. These may be separated by white space
    (spaces U+0020 or tabs U+0009)."

  * Serd 0.26.0 correctly parses [1].

  * Jena 3.0 correctly parses [1].

  * Raptor 2.0.15 correctly parses [1].

  * There is a test case called `#minimal_whitespace' which is of type
    `rdft:TestNTriplesPositiveSyntax'.

  * The N-Triples 1.1 specification says that "White space (tab U+0009
    or space U+0020) is used to separate two terminals which would
    otherwise be (mis-)recognized as one terminal."  This may imply
    that whitespace is not required for separating non-terminals.
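
The "yes" reading can be sketched as a small Python recognizer in
which whitespace between terms is optional.  This is only an
illustration under simplifying assumptions: it handles IRI terms
only, it leaves out UCHAR escapes, blank nodes, literals and
comments, and the names `IRIREF`, `WS`, `TRIPLE` and
`parse_iri_triple` are my own.

```python
import re

# Simplified IRIREF: '<', any characters except controls, space and
# <>"{}|^`\ , then '>'.  UCHAR escapes are omitted (assumption).
IRIREF = r'<[^\x00-\x20<>"{}|^`\\]*>'

# Optional whitespace (space or tab), following the "may be separated
# by white space" reading of the specification.
WS = r'[ \t]*'

TRIPLE = re.compile(
    rf'{WS}({IRIREF}){WS}({IRIREF}){WS}({IRIREF}){WS}\.{WS}'
)


def parse_iri_triple(line):
    """Return (subject, predicate, object), or None if there is no match."""
    m = TRIPLE.fullmatch(line)
    return m.groups() if m else None


print(parse_iri_triple('<x:y><x:y><x:y>.'))     # statement [1]
print(parse_iri_triple('<x:y> <x:y> <x:y> .'))  # spaced spelling
print(parse_iri_triple('<x:y><x:y>.'))          # only two terms
```

Because each IRI term is delimited by its own angle brackets, the
recognizer finds the same three terms in [1] and in the spaced
spelling; whitespace is not needed to tell the terms apart.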

What can we do to clear up the situation regarding the use of
whitespace in N-Triples?  I can file bugs for rdflib and SWI-Prolog
semweb.  Can someone improve the specification and/or test case?
Or... am I wrong, and is "no" the correct answer after all?

---
Best regards,
Wouter Beek.

Received on Wednesday, 28 June 2017 15:15:50 UTC