Re: Are spaces allowed between terms in N-Triples 1.1?

Is there any place in N-Triples where inter-terminal white space is
required?  I don't think so.

As triples are self-terminating, white space is not required after a triple.

Comments consume the remainder of a line but leave any EOL in place.  They
can therefore occur only where an EOL is allowed, and so cannot occur inside
a triple.

IRIREFs and STRING_LITERAL_QUOTEs are self-terminating so white space is
never required after them.

Literals with a datatype end with an IRIREF so white space is never required
after them.

Language tags are not self-terminating but the next token after a language
tag has to be a '.' which cannot be part of the language tag.  So white
space is never required after a language tag and thus white space is never
required after a literal with a language tag.

All variants and pieces of literals are covered above so white space is
never required after a literal.

BLANK_NODE_LABELs are not self-terminating but the next token after a
blank node is either an IRIREF or a '.'.  BLANK_NODE_LABELs cannot end with
a '.'.  IRIREFs start with a '<' which cannot be part of a
BLANK_NODE_LABEL.  So white space is never required after a
BLANK_NODE_LABEL.

That covers all the cases, I think, so there is no place where
inter-terminal white space is required in N-Triples.
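
As a rough check of the case analysis above, here is a minimal sketch of a
tokenizer in Python.  The terminal patterns are simplified, ASCII-only
approximations of the N-Triples grammar (no UCHAR escapes in IRIs, no
non-ASCII label characters), and the names TERMINALS, TOKEN and tokenize are
made up for this illustration.  It only splits a line into terminals; it
does not check that the terminals form a valid triple.

```python
import re

# Simplified, ASCII-only approximations of the N-Triples terminals.
# Assumption: UCHAR escapes in IRIs and non-ASCII PN_CHARS are left out.
TERMINALS = [
    ("IRIREF", r'<[^\x00-\x20<>"{}|^`\\]*>'),
    ("STRING", r'"(?:[^"\\\n\r]|\\.)*"'),
    ("LANGTAG", r'@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*'),
    # A blank node label may contain '.' but must not end with one.
    ("BNODE", r'_:[A-Za-z0-9_:](?:[A-Za-z0-9_:.-]*[A-Za-z0-9_:-])?'),
    ("DTSEP", r'\^\^'),
    ("DOT", r'\.'),
    ("WS", r'[ \t]+'),
]
TOKEN = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TERMINALS)
)


def tokenize(line):
    """Split one line into (terminal name, text) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(line):
        match = TOKEN.match(line, pos)
        if match is None:
            raise ValueError(f"no terminal matches at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    # Each terminal is recognized without any separating white space.
    for line in ('<x:y><x:y><x:y>.',
                 '_:b1<x:y>"a"@en.',
                 '<x:s><x:p>"1"^^<x:int>.'):
        print(tokenize(line))
```

Under these assumptions, a line with no white space and the same line with
spaces between the terms produce the same sequence of terminals, which is
the point of the argument above.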


Peter F. Patel-Schneider
Nuance Communications





On 06/28/2017 08:14 AM, Wouter Beek wrote:
> Hi Semantic Web community,
> 
> Is [1] a valid N-Triples 1.1 statement (notice there are no spaces
> between the terms)?  I can find evidence that says "yes" and
> evidence that says "no".  I think "yes" is correct here, but the
> specification can mislead a casual reader into believing "no",
> several state-of-the-art tools currently implement "no", and the
> test case is ambiguous (see below).
> 
> [1] <x:y><x:y><x:y>.
> 
> No, [1] is not valid:
> 
>   * The N-Triples 1.1 specification says that "The simplest triple
>     statement is a sequence of (subject, predicate, object) terms,
>     separated by whitespace and terminated by '.' after each triple."
>     Under a certain interpretation of the word 'simplest' this means
>     that whitespace is required, because if a statement S that is
>     expressed by a triple T can also be expressed by a triple T' which
>     only differs from T in that it contains no whitespace, it follows
>     that T is not the simplest triple expressing S.
> 
>   * The N-Triples specification defines canonical N-Triples as "The
>     whitespace following subject, predicate, and object must be a
>     single space, (U+0020)."  This implies that whitespace indeed
>     follows the subject, predicate, and object terms ("the whitespace"
>     does not refer to the empty string).
> 
>   * rdflib 4.2.2 gives an error when parsing [1].
> 
>   * SWI-Prolog Semweb library 7.5.10 gives an error when parsing [1].
> 
>   * There is a test case called `#minimal_whitespace' which has value
>     `rdft:approval rdft:Proposed', and according to the test suite
>     vocabulary this means that the test is "proposed but not
>     approved", so minimal whitespace is a proposal that is not yet
>     part of N-Triples 1.1.
> 
>   * It should always be possible to easily split an N-Triples
>     statement based on whitespace characters.
> 
> Yes, [1] is valid:
> 
>   * The N-Triples 1.1 specification clearly states that "triples are a
>     sequence of RDF terms representing the subject, predicate and
>     object of an RDF Triple. These may be separated by white space
>     (spaces U+0020 or tabs U+0009)."
> 
>   * Serd 0.26.0 correctly parses [1].
> 
>   * Jena 3.0 correctly parses [1].
> 
>   * Raptor 2.0.15 correctly parses [1].
> 
>   * There is a test case called `#minimal_whitespace' which is of type
>     `rdft:TestNTriplesPositiveSyntax'.
> 
>   * The N-Triples 1.1 specification says that "White space (tab U+0009
>     or space U+0020) is used to separate two terminals which would
>     otherwise be (mis-)recognized as one terminal."  This may imply
>     that whitespace is not required for separating non-terminals?
> 
> What can we do to clear up the situation regarding the use of
> whitespace in N-Triples?  I can file bugs for rdflib and SWI-Prolog
> semweb.  Can someone improve the specification and/or test case?
> Or... am I wrong and is "no" the correct answer after all?
> 
> ---
> Best regards,
> Wouter Beek.
> 

Received on Thursday, 29 June 2017 01:39:23 UTC