- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Sat, 1 Jul 2017 05:33:10 -0700
- To: Sampo Syreeni <decoy@iki.fi>, Jörn Hees <j_hees@cs.uni-kl.de>
- Cc: SW-forum Web <semantic-web@w3.org>
On 07/01/2017 04:11 AM, Sampo Syreeni wrote:
> On 2017-06-29, Jörn Hees wrote:
> [...]
>
>>> It might have been better at the beginning to require white space after the
>>> subject, predicate, and object of a triple in N-Triples, but given that
>>> that wasn't required I don't see that the costs of requiring them now are
>>> worth any minor benefits in human readability that might ensue.
>
> That ain't the problem at all. The problem is that the vocabulary suddenly
> isn't well defined for machine consumption. We suddenly don't know whether
> with-spaces and without-spaces versions of the same precise N-triples
> documents must be accepted by conforming processors; i.e. we suddenly have a
> two different views of what Truth is, based on how separate people with their
> separate parsers happen to want to see the truth.

I completely agree that the biggest problem by far is that there is no
current shared view of valid N-Triples documents. This is caused, or at
least exacerbated, by the N-Triples spec saying different things in
different normative places and by the problems in the grammar in the spec.

>>> Note that nothing (except the badly written grammar for N-Triples) prevents
>>> tools from putting single spaces after subjects, predicates, and objects.
>>
>> I agree with the cost consideration, but i never actually saw nt/ttl
>> serialized without whitespace. Can you point me to a serializer that does this?
>
> I'd argue the problem is never with the serializer. It's always with how the
> deserializer interprets its data, and how that affects what/how is accepted as
> part of the Ground Truth represented by data ingested by and and all software
> systems. If your deserializer deems N-Triples data without spaces to be
> malformed, then as far as your whole software framework sees the world,
> anything seen using without-spaces formatting suddenly became deasserted.
>
>> IMO this whole discussion is mostly a theoretical one, doesn't really lead
>> us anywhere and could easily be concluded: [...]
>
> I'd argue to the contrary, at least on security grounds.

Again, I completely agree that security issues provide a clear and present
practical significance here. Well, at least as long as some system is
accepting N-Triples documents that come from an external source.

>> [...] I wouldn't update the spec prohibiting parsing no whitespace nt, as
>> this would be backwards incompatible.
>
> I'd update the spec, allowing white space. That's what errata are for, and the
> result would be backwards compatible. At least intuition-compatible, whatever
> that means.

The spec says several different things about white space in normative
sections. The first erratum should be to make the expository sections of
the document non-normative. The second erratum should be to update the
grammar, but it is currently unclear just where white space is allowed or
required, and this needs to be cleared up first.

>> I'd however slightly update the grammar prelude to more explicitly allow
>> (and encourage) whitespace after each of the terms (even if not totally
>> necessary) by replacing the following sentence: [...]
>
> I'd rewrite the whole thing to actually make it at the same time both reflect
> what was meant, as it does now, and to also be (preferably obviously) part of
> a well-known formal grammar family
> --
> Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
> +358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

I agree that the grammar should be rewritten, and I would like it to be
rewritten in some formal grammar family. However, that's not exactly how
most languages are currently specified. Instead a two- or three-phase
specification is used, with the first phase being greedy left-to-right
tokenization with some post-processing and the second phase being BNF plus
priority. The third phase applies some simple context-sensitive processing.
The current grammar hints that the two-phase version of this is the actual
specification methodology, but doesn't come right out and say so.

Peter F. Patel-Schneider
Nuance Communications
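
For illustration only, the two-phase reading described above (greedy left-to-right tokenization, then a grammar over the resulting tokens) can be sketched as follows. This is a rough sketch, not the specification's grammar: the terminal set is a simplified subset, the names TOKEN_RE, tokenize, and parse_triple are hypothetical, and treating white space as optional between terms is an assumption made here, which is exactly the point the spec leaves unclear. The third, context-sensitive phase is omitted.

```python
import re

# Phase 1 terminals: a simplified subset of N-Triples tokens (assumption:
# no escapes in IRIs, blank node labels restricted to letters and digits).
TOKEN_RE = re.compile(r"""
    (?P<IRIREF>   <[^<>"{}|^`\\\x00-\x20]*> )
  | (?P<BNODE>    _:[A-Za-z0-9]+ )
  | (?P<STRING>   "(?:[^"\\\n\r]|\\.)*" )
  | (?P<LANGTAG>  @[A-Za-z]+(?:-[A-Za-z0-9]+)* )
  | (?P<DTYPE>    \^\^ )
  | (?P<DOT>      \. )
  | (?P<WS>       [ \t]+ )
""", re.VERBOSE)


def tokenize(line):
    """Phase 1: greedy left-to-right tokenization; white space is dropped."""
    tokens, pos = [], 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            raise ValueError(f"unexpected character at column {pos}: {line[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# Token-kind sequences accepted as an object term in this sketch.
OBJECT_SHAPES = (
    ["IRIREF"],
    ["BNODE"],
    ["STRING"],
    ["STRING", "LANGTAG"],
    ["STRING", "DTYPE", "IRIREF"],
)


def parse_triple(line):
    """Phase 2: check the token sequence against subject predicate object '.'."""
    toks = tokenize(line)
    if len(toks) < 4 or toks[-1][0] != "DOT":
        raise ValueError("a triple needs three terms followed by '.'")
    subj, pred, obj = toks[0], toks[1], toks[2:-1]
    if subj[0] not in ("IRIREF", "BNODE"):
        raise ValueError("subject must be an IRI or a blank node")
    if pred[0] != "IRIREF":
        raise ValueError("predicate must be an IRI")
    if [kind for kind, _ in obj] not in OBJECT_SHAPES:
        raise ValueError("malformed object")
    return subj[1], pred[1], "".join(text for _, text in obj)


if __name__ == "__main__":
    # Both spellings yield the same triple under this reading.
    print(parse_triple('<http://example.org/s> <http://example.org/p> "o"@en .'))
    print(parse_triple('<http://example.org/s><http://example.org/p>"o"@en.'))
```

Because white space is consumed in the tokenizer and never reaches the grammar, the with-spaces and without-spaces documents are indistinguishable in phase 2. A parser that instead required a white space token between terms would reject the second line, which is the divergence between processors discussed in this thread.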
Received on Saturday, 1 July 2017 12:33:59 UTC