
Re: Are spaces allowed between terms in N-Triples 1.1?

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Sat, 1 Jul 2017 05:33:10 -0700
To: Sampo Syreeni <decoy@iki.fi>, Jörn Hees <j_hees@cs.uni-kl.de>
Cc: SW-forum Web <semantic-web@w3.org>
Message-ID: <2d41485f-28b8-ef06-a763-a87dcd0f906d@gmail.com>

On 07/01/2017 04:11 AM, Sampo Syreeni wrote:
> On 2017-06-29, Jörn Hees wrote:
> 

[...]

> 
>>> It might have been better at the beginning to require white space after the
>>> subject, predicate, and object of a triple in N-Triples, but given that
>>> that wasn't required I don't see that the costs of requiring them now are
>>> worth any minor benefits in human readability that might ensue.
> 
> That ain't the problem at all. The problem is that the vocabulary suddenly
> isn't well defined for machine consumption. We suddenly don't know whether
> with-spaces and without-spaces versions of the same precise N-triples
> documents must be accepted by conforming processors; i.e. we suddenly have
> two different views of what Truth is, based on how separate people with their
> separate parsers happen to want to see the truth.

I completely agree that the biggest problem by far is that there is no current
shared view of valid N-Triples documents.  This is caused, or at least
exacerbated, by the N-Triples spec saying different things in different
normative places and by the problems in the grammar in the spec.
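
To make the disagreement concrete, here is a minimal sketch of two line
acceptors that embody the two readings. The names STRICT, LENIENT and accepts
are hypothetical, the term pattern is heavily simplified (IRIs only, no
literals or blank nodes), and neither pattern is taken from the spec.

```python
import re

# Simplified IRI term: anything between angle brackets.
# (The real IRIREF production is stricter; this is an assumption for brevity.)
IRI = r"<[^<>\s]*>"

# Hypothetical reading A: white space is required after subject and predicate.
STRICT = re.compile(rf"[ \t]*{IRI}[ \t]+{IRI}[ \t]+{IRI}[ \t]*\.[ \t]*")

# Hypothetical reading B: white space between terms is optional.
LENIENT = re.compile(rf"[ \t]*{IRI}[ \t]*{IRI}[ \t]*{IRI}[ \t]*\.[ \t]*")


def accepts(pattern, line):
    """Return True if the whole line matches the given reading."""
    return pattern.fullmatch(line) is not None


spaced = "<http://example.org/s> <http://example.org/p> <http://example.org/o> ."
packed = "<http://example.org/s><http://example.org/p><http://example.org/o>."

for name, pattern in (("strict", STRICT), ("lenient", LENIENT)):
    print(name, accepts(pattern, spaced), accepts(pattern, packed))
```

Both readings accept the spaced line, but only the lenient one accepts the
packed line, so two processors built on these readings would disagree about
whether the second document asserts anything at all.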

>>> Note that nothing (except the badly written grammar for N-Triples) prevents
>>> tools from putting single spaces after subjects, predicates, and objects.
>>
>> I agree with the cost consideration, but i never actually saw nt/ttl
>> serialized without whitespace. Can you point me to a serializer that does this?
> 
> I'd argue the problem is never with the serializer. It's always with how the
> deserializer interprets its data, and how that affects what/how is accepted as
> part of the Ground Truth represented by data ingested by any and all software
> systems. If your deserializer deems N-Triples data without spaces to be
> malformed, then as far as your whole software framework sees the world,
> anything seen using without-spaces formatting suddenly became deasserted.
> 
>> IMO this whole discussion is mostly a theoretical one, doesn't really lead
>> us anywhere and could easily be concluded: [...]
> 
> I'd argue to the contrary, at least on security grounds.

Again, I completely agree that security issues provide a clear and present
practical significance here.  Well, at least as long as some system is
accepting N-Triples documents that come from an external source.

>> [...] I wouldn't update the spec prohibiting parsing no whitespace nt, as
>> this would be backwards incompatible.
> 
> I'd update the spec, allowing white space. That's what errata are for, and the
> result would be backwards compatible. At least intuition-compatible, whatever
> that means.

The spec says several different things about white space in normative
sections.  The first erratum should be to make the expository sections of the
document non-normative. The second erratum should be to update the grammar,
but it is currently unclear just where white space is allowed or required, and
this needs to be cleared up first.

>> I'd however slightly update the grammar prelude to more explicitly allow
>> (and encourage) whitespace after each of the terms (even if not totally
>> necessary) by replacing the following sentence: [...]
> 
> I'd rewrite the whole thing to actually make it at the same time both reflect
> what was meant, as it does now, and to also be (preferably obviously) part of
> a well-known formal grammar family
> --
> Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
> +358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

I agree that the grammar should be rewritten, and I would like it to be
rewritten in some formal grammar family.  However, that's not exactly how most
languages are currently specified.  Instead, a two- or three-phase
specification is used, with the first phase being greedy left-to-right
tokenization with some post-processing and the second phase being BNF plus
priority.  The third phase, where there is one, applies some simple
context-sensitive processing.  The current grammar hints that the two-phase
version of this is the actual specification methodology, but doesn't come
right out and say so.
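
A rough sketch of the two-phase reading, under loud assumptions: the token
patterns below are simplified stand-ins rather than the spec's terminals, and
the names TOKEN_SPECS, tokenize and parse_triple are hypothetical. Phase one
tokenizes greedily from left to right and drops white space; phase two checks
the token sequence against a subject, predicate, object, dot shape.

```python
import re

# Phase 1: simplified token patterns (assumptions, not the spec's productions).
TOKEN_SPECS = [
    ("IRIREF", r"<[^<>\s]*>"),
    ("BNODE", r"_:[A-Za-z0-9]+"),
    ("STRING", r'"[^"\\\n]*"'),
    ("DOT", r"\."),
    ("WS", r"[ \t]+"),
]
TOKEN = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPECS))


def tokenize(line):
    """Greedy left-to-right tokenization; white space tokens are discarded."""
    tokens, pos = [], 0
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if m is None:
            raise ValueError(f"unexpected character at column {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_triple(line):
    """Phase 2: check the token sequence against subject predicate object '.'."""
    tokens = tokenize(line)
    kinds = [kind for kind, _ in tokens]
    if len(kinds) != 4 or kinds[3] != "DOT":
        raise ValueError("expected three terms followed by '.'")
    if kinds[0] not in ("IRIREF", "BNODE"):
        raise ValueError("subject must be an IRI or a blank node")
    if kinds[1] != "IRIREF":
        raise ValueError("predicate must be an IRI")
    if kinds[2] not in ("IRIREF", "BNODE", "STRING"):
        raise ValueError("object must be an IRI, blank node, or literal")
    return tuple(text for _, text in tokens[:3])


spaced = '<http://example.org/s> <http://example.org/p> "v" .'
packed = '<http://example.org/s><http://example.org/p>"v".'
print(parse_triple(spaced) == parse_triple(packed))
```

Under this reading the terms delimit themselves, so white space between them
is simply discarded by the tokenizer and the spaced and packed lines yield the
same triple. Whether that is what the spec intends is exactly the open
question.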

Peter F. Patel-Schneider
Nuance Communications
Received on Saturday, 1 July 2017 12:33:59 UTC
