Fixing N-Quads and Turtle from Peter F. Patel-Schneider on 2017-06-29 (public-rdf-comments@w3.org from June 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Thu, 29 Jun 2017 06:25:03 -0700
To: Ivan Herman <ivan@w3.org>
Cc: public-rdf-comments@w3.org
Message-ID: <11739d01-e76c-ff58-71c6-651353239d9b@gmail.com>

Fixing the N-Quads and Turtle grammars is harder, or at least requires a
different approach, because white space is required in some places in these
languages.

I think that the grammar has to be stated something like:


A Turtle document is a Unicode[UNICODE] character string encoded in UTF-8
that can be recognized using the standard two-stage process of left-to-right
greedy tokenization followed by context-free parsing augmented with some
context-sensitive constraints.

The first stage turns the sequence of Unicode code points into a sequence of
tokens using left-to-right greedy tokenization with the following regular
expressions:

....

Note: Because the tokenization is left-to-right and greedy, 0.0 is turned
into a single DECIMAL token, not an INTEGER token followed by a DECIMAL
token.
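
A minimal sketch of the greedy (longest-match) rule, to make the 0.0 case
concrete. The token list is an assumption: only INTEGER, DECIMAL, a DOT
token and WS are included, with patterns modeled on the Turtle terminals of
the same names. The function name tokenize is hypothetical and this is not
the full token set elided above.

```python
import re

# Simplified, assumed token patterns (not the complete Turtle token set).
TOKEN_PATTERNS = [
    ("INTEGER", re.compile(r"[+-]?[0-9]+")),
    ("DECIMAL", re.compile(r"[+-]?[0-9]*\.[0-9]+")),
    ("DOT", re.compile(r"\.")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Left-to-right greedy tokenization: at each position, take the
    pattern that matches the longest run of characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in TOKEN_PATTERNS:
            match = pattern.match(text, pos)
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no token matches at offset {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("0.0"))   # one DECIMAL token
print(tokenize("0 .0"))  # INTEGER, WS, DECIMAL
```

With white space between the two parts the result changes, which is why the
grammar cannot simply ignore white space before tokenization.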

Note: Language tags are not limited to the recognized language tags of ???.
As a consequence, this stage treats strings like "hi"@prefix as a
language-tagged string and not a simple string followed by the start of a
directive.
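
A rough sketch of how this can fall out of greedy tokenization. It assumes,
purely for illustration, that the token list contains a single token for a
string together with its language tag (called LANG_STRING here); the message
does not say how the elided token list is organized, and the names
LANG_STRING, PREFIX_KEYWORD and tokenize are hypothetical.

```python
import re

# Assumed building blocks, modeled on Turtle's quoted string and LANGTAG.
STRING = r'"(?:[^"\\\n\r]|\\.)*"'
LANGTAG = r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"

TOKEN_PATTERNS = [
    # Hypothetical combined token: a string immediately followed by a tag.
    ("LANG_STRING", re.compile(STRING + LANGTAG)),
    ("STRING", re.compile(STRING)),
    ("PREFIX_KEYWORD", re.compile(r"@prefix")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Left-to-right greedy tokenization (longest match wins)."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in TOKEN_PATTERNS:
            match = pattern.match(text, pos)
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no token matches at offset {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize('"hi"@prefix'))   # a single language-tagged string
print(tokenize('"hi" @prefix'))  # string, white space, directive keyword
```

Because the tag pattern accepts any alphabetic run, nothing at this stage
distinguishes "prefix" from a real language tag; only the intervening white
space separates the two readings.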

The second stage takes the token sequence with all WS tokens removed and
attempts to parse it using the following BNF grammar:

.....

During this stage, the prefix of any prefixed name must have been declared
by a preceding prefixID or sparqlPrefix directive.
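
A small sketch of this context-sensitive check, assuming the parser reports
what it sees as a flat list of (kind, value) pairs in document order. The
event kinds "prefix_decl" and "pname" and the function check_prefixes are
hypothetical names chosen for the example, not part of any grammar.

```python
def check_prefixes(events):
    """Return the prefixed names whose prefix was not declared earlier.

    events is a list of (kind, value) pairs in document order:
      ("prefix_decl", "ex")    for a prefixID or sparqlPrefix directive
      ("pname", "ex:thing")    for a prefixed name
    """
    declared = set()
    undeclared_uses = []
    for kind, value in events:
        if kind == "prefix_decl":
            declared.add(value)
        elif kind == "pname":
            # Everything before the first colon is the prefix (may be empty).
            prefix = value.split(":", 1)[0]
            if prefix not in declared:
                undeclared_uses.append(value)
    return undeclared_uses


ok = [("prefix_decl", "ex"), ("pname", "ex:a"), ("pname", "ex:b")]
bad = [("pname", "ex:a"), ("prefix_decl", "ex"), ("pname", "ex:b")]
print(check_prefixes(ok))   # no violations
print(check_prefixes(bad))  # the use before the directive is reported
```

The check depends on order, which is what puts it outside a context-free
grammar: the same prefixed name is accepted or rejected depending on what
came before it.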



N-Quads can use a slightly simpler setup as it doesn't have a
context-sensitive aspect.


peter

Received on Thursday, 29 June 2017 13:25:45 UTC