- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Thu, 29 Jun 2017 13:17:21 -0700
- To: Gregg Kellogg <gregg@greggkellogg.net>
- Cc: Dan Brickley <danbri@google.com>, Ivan Herman <ivan@w3.org>, public-rdf-comments Comments <public-rdf-comments@w3.org>
On 06/29/2017 11:23 AM, Gregg Kellogg wrote:
>> On Jun 29, 2017, at 5:48 AM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>
>> One problem with providing these test cases is that it is not really possible
>> to determine exactly what the current grammar allows.
>
> Turtle adds some clarity:
>
>> White space (production WS) is used to separate two terminals which would otherwise be (mis-)recognized as one terminal. Rule names below in capitals indicate where white space is significant; these form a possible choice of terminals for constructing a Turtle parser.

I don't view this wording as clear at all. For example, :::. can be
recognized as a Turtle triple in only one way, so white space shouldn't be
needed here. The intent is probably something like using white space to
separate tokens for a left-to-right greedy tokenizer for the terminal tokens
that appear in the rules for non-terminals. However, there are lots of other
ways of writing Turtle parsers.

> This language comes from the SPARQL 1.1 grammar, so any change needed for N-Triples/Quads/Turtle/TriG would also need to be applied there.
>
> What it doesn’t do is say that white space may be composed of one or more WS or comments, which it probably should. Comments are treated as white space.
>
> There is no discussion of white space within terminals, except for the String production and ANON terminal.
>
> Personally, I’m not a fan of adding explicit WS to the grammar.
>
> My interpretation for all grammars is that any amount of white space is allowed between any terminals, not required at all in N-Triples/Quads, and necessary in SPARQL/Turtle/TriG to keep two terminals from being confused.
>
>> For example, there are multiple readings of "terminal". Is it any terminal,
>> in which case white space might be allowed within blank node labels? Is it
>> any terminal mentioned in the productions for non-terminals, in which case
>> blank space might be allowed before language tags? Is it only named terminals
>> mentioned in the productions for non-terminals?
>
> The quote above qualifies the use of white space to be significant in rule names in capitals, which are the "Productions for terminals". Thus, WS is significant in the STRING_LITERAL_* terminals, and allowed within ANON. Perhaps it should be made clear that white space within other terminals is specifically not allowed.
>
> It is somewhat problematic that note 6 specifically says that no white space is allowed between the sign and the number, which might imply that white space is allowed, say, between numbers, or following '@'. My parser implements terminals as regular expressions, where white space is explicit within the regular expression, if necessary. The tokenizer eats any amount of whitespace between matched terminals; I think this is probably a fairly common pattern.
>
> Gregg

peter
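
To make the :::. example concrete, here is a rough sketch of the kind of tokenizer discussed above: terminals as regular expressions, longest match wins, and any amount of white space or comments skipped between matched terminals. The regular expressions are simplified ASCII-only stand-ins for PNAME_LN and PNAME_NS, and the names TERMINALS, SKIP, and tokenize are made up for this illustration; it is not taken from any actual parser.

```python
import re

# Simplified, ASCII-only approximation of PN_LOCAL (assumption: the real
# production also allows Unicode ranges and escape sequences).
PN_LOCAL = r"[A-Za-z0-9_:](?:[A-Za-z0-9_.:-]*[A-Za-z0-9_:-])?"

# Hypothetical terminal table: only the three terminals the example needs.
TERMINALS = [
    ("PNAME_LN", re.compile(r"[A-Za-z]*:" + PN_LOCAL)),
    ("PNAME_NS", re.compile(r"[A-Za-z]*:")),
    ("DOT", re.compile(r"\.")),
]

# White space and comments are skipped between terminals; may match nothing.
SKIP = re.compile(r"(?:[ \t\r\n]+|#[^\r\n]*)*")


def tokenize(text):
    """Left-to-right tokenizer that always takes the longest matching terminal."""
    tokens = []
    pos = SKIP.match(text, 0).end()
    while pos < len(text):
        best = None
        for name, pattern in TERMINALS:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError("no terminal matches at offset %d" % pos)
        tokens.append((best[0], best[1].group()))
        # Eat any amount of white space or comments before the next terminal.
        pos = SKIP.match(text, best[1].end()).end()
    return tokens


print(tokenize(":::."))     # [('PNAME_LN', ':::'), ('DOT', '.')]
print(tokenize(": : : ."))  # three PNAME_NS tokens, then DOT
```

Under these assumptions the greedy tokenizer reads :::. as a single prefixed name followed by a dot, which cannot be parsed as a triple, even though the string has exactly one reading as a triple. Only the white-space-separated form yields the three terms. A parser that is not built around greedy tokenization could accept both.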
Received on Thursday, 29 June 2017 20:17:59 UTC