- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Thu, 29 Jun 2017 06:25:03 -0700
- To: Ivan Herman <ivan@w3.org>
- Cc: public-rdf-comments@w3.org
Fixing the N-Quads and Turtle grammars is harder, or at least requires a different approach, because white space is required in some places in these languages.

I think that the grammar has to be stated something like:

A Turtle document is a Unicode [UNICODE] character string encoded in UTF-8 that can be recognized using the standard two-stage process of left-to-right greedy tokenization followed by context-free parsing augmented with some context-sensitive constraints.

The first stage turns the sequence of Unicode code points into a sequence of tokens using left-to-right greedy tokenization with the following regular expressions: ....

Note: Because the tokenization is left-to-right and greedy, 0.0 is turned into a single DECIMAL token, not an INTEGER token followed by a DECIMAL token.

Note: Language tags are not limited to the recognized language tags of ???. As a consequence, this stage treats strings like "hi"@prefix as a language-tagged string and not as a simple string followed by the start of a directive.

The second stage takes the token sequence with the WS tokens removed and attempts to parse it using the following BNF grammar: .....

During this stage, the prefix of any prefixed name must be the prefix of a previous prefixID or sparqlPrefix directive.

N-Quads can use a slightly simpler setup, as it doesn't have a context-sensitive aspect.

peter
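
For illustration only, the two-stage process described above might be sketched in Python roughly as follows. The token patterns are simplified stand-ins, not the real Turtle terminals. The names `TOKEN_PATTERNS`, `tokenize`, and `check_prefixes` are hypothetical. Only the SPARQL-style PREFIX directive is handled, and the second function checks just the prefix constraint rather than doing a full context-free parse.

```python
import re

# Simplified, hypothetical token patterns (assumption: NOT the real Turtle terminals).
TOKEN_PATTERNS = [
    ("WS", r"[ \t\r\n]+"),
    ("PREFIX_KW", r"PREFIX"),
    ("IRIREF", r"<[^<>\s]*>"),
    ("DECIMAL", r"[+-]?[0-9]*\.[0-9]+"),
    ("INTEGER", r"[+-]?[0-9]+"),
    ("PNAME_LN", r"[A-Za-z][A-Za-z0-9]*:[A-Za-z0-9_]+"),
    ("PNAME_NS", r"(?:[A-Za-z][A-Za-z0-9]*)?:"),
    ("DOT", r"\."),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_PATTERNS]


def tokenize(text):
    """Stage 1: left-to-right greedy (longest-match) tokenization."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # Keep the longest non-empty match; on a tie the earlier pattern wins.
            if m and m.end() > pos and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


def check_prefixes(tokens):
    """Stage 2 (constraint only): drop WS, then require that every prefixed
    name uses a prefix declared by an earlier PREFIX directive."""
    tokens = [t for t in tokens if t[0] != "WS"]
    declared = set()
    for i, (kind, text) in enumerate(tokens):
        if kind == "PREFIX_KW":
            if i + 1 < len(tokens) and tokens[i + 1][0] == "PNAME_NS":
                declared.add(tokens[i + 1][1][:-1])  # strip the trailing ':'
        elif kind in ("PNAME_LN", "PNAME_NS"):
            if i > 0 and tokens[i - 1][0] == "PREFIX_KW":
                continue  # this is the declaration itself, not a use
            prefix = text.split(":", 1)[0]
            if prefix not in declared:
                raise ValueError(f"undeclared prefix {prefix!r}")
    return declared


if __name__ == "__main__":
    print(tokenize("0.0"))
    doc = "PREFIX ex: <http://example.org/>\nex:a ex:b 0.0 ."
    print(check_prefixes(tokenize(doc)))
```

Because the longest match wins at each position, the DECIMAL pattern consumes all of `0.0` before the INTEGER pattern gets a chance to claim the leading `0`.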
Received on Thursday, 29 June 2017 13:25:45 UTC