- From: Peter F. Patel-Schneider <peter.patel-schneider@nuance.com>
- Date: Thu, 29 Jun 2017 02:15:55 -0700
- To: public-rdf-comments@w3.org
A message to semantic-web@w3.org
https://lists.w3.org/Archives/Public/semantic-web/2017Jun/0065.html
inspired me to take a closer look at the grammar for N-Triples. I found a number of problems in the grammar for N-Triples there. I propose the following fixed version of the grammar section.

Problems addressed:
1/ White space permitted but not required between any two terminals and at beginning and end of document.
2/ Comments can only occur in specific places.
3/ Lines consisting entirely of white space and/or a comment are permitted.
4/ Confusing statement about Unicode code points removed.

Remaining issue:
1/ The grammar in the TR mentions white space in the context of any two terminals, which includes between the parts of a literal. However, there is no example or test case that has white space there. This grammar permits white space there.

7. Grammar

An N-Triples document is a Unicode [UNICODE] character string encoded in UTF-8.
[[Remove: Unicode code points only in the range U+0 to U+10FFFF inclusive are allowed. Rationale: These are the only Unicode code points.]]

White space (tab U+0009 or space U+0020) is allowed but not required between any two terminals.
[[Replace: White space (tab U+0009 or space U+0020) is used to separate two terminals which would otherwise be (mis-)recognized as one terminal. Rationale: In N-Triples there is no possibility of such mis-recognition.]]
White space is significant in the production STRING_LITERAL_QUOTE.

Comments in N-Triples take the form of '#', outside an IRIREF or STRING_LITERAL_QUOTE, and continue up-to, and excluding, the end of line (EOL), or end of file if there is no end of line after the comment marker. Comments are treated as white space.

The EBNF used here is defined in XML 1.0 [EBNF-NOTATION].
[[White space and comments are now explicit in the grammar, similar to the situation in early versions of the N-Triples grammar. Rationale: Makes it clear where white space and comments are permitted.
]]

Escape sequence rules are the same as Turtle [TURTLE]. However, as only the STRING_LITERAL_QUOTE production is allowed, new lines in literals MUST be escaped.

[1] ntriplesDoc ::= triple? (EOL triple)* END
[2] triple ::= WS? subject WS? predicate WS? object WS? '.'
[3] subject ::= IRIREF | BLANK_NODE_LABEL
[4] predicate ::= IRIREF
[5] object ::= IRIREF | BLANK_NODE_LABEL | literal
[6] literal ::= STRING_LITERAL_QUOTE (WS? '^^' WS? IRIREF | WS? LANGTAG)?

Productions for terminals

[144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
[[Lines consisting entirely of white space and/or a comment are now permitted.]]
[7] EOL ::= ( WS? ('#' [^#xD#xA]* )? [#xD#xA] )+
[7a] END ::= EOL? WS? ('#' [^#xD#xA]* )?
[8] IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
[9] STRING_LITERAL_QUOTE ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
[141s] BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
[10] UCHAR ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
[153s] ECHAR ::= '\' [tbnrf"'\]
[157s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[158s] PN_CHARS_U ::= PN_CHARS_BASE | '_' | ':'
[160s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
[162s] HEX ::= [0-9] | [A-F] | [a-f]
[[White space is included in grammar.]]
WS ::= [#x9#x20]+
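
To make the effect of the proposed productions easier to try out, here is a rough Python sketch that transcribes them into regular expressions and checks a document one line at a time. It is only an approximation under stated assumptions: it takes the comment marker to be '#', as the prose above describes; it treats every line as "optional triple, optional white space, optional comment", which is intended to mirror ntriplesDoc, EOL and END; and the names check_document and LINE are made up for this illustration and are not part of any specification or library.

```python
import re

# Terminals transcribed from the proposed grammar into Python regex syntax.
WS = r"[\t ]+"
OPT_WS = rf"(?:{WS})?"
HEX = r"[0-9A-Fa-f]"
UCHAR = rf"(?:\\u{HEX}{{4}}|\\U{HEX}{{8}})"
ECHAR = r"\\[tbnrf\"'\\]"
IRIREF = rf'<(?:[^\x00-\x20<>"{{}}|^`\\]|{UCHAR})*>'
STRING_LITERAL_QUOTE = rf'"(?:[^"\\\n\r]|{ECHAR}|{UCHAR})*"'
LANGTAG = r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"

# Character-class bodies for the PN_CHARS* productions.
PN_CHARS_BASE = (
    r"A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
PN_CHARS_U = PN_CHARS_BASE + "_:"
PN_CHARS = PN_CHARS_U + r"\-0-9\u00B7\u0300-\u036F\u203F-\u2040"
BLANK_NODE_LABEL = rf"_:[{PN_CHARS_U}0-9](?:[{PN_CHARS}.]*[{PN_CHARS}])?"

# Non-terminals [2] to [6].
LITERAL = (
    rf"{STRING_LITERAL_QUOTE}"
    rf"(?:{OPT_WS}\^\^{OPT_WS}{IRIREF}|{OPT_WS}{LANGTAG})?"
)
SUBJECT = rf"(?:{IRIREF}|{BLANK_NODE_LABEL})"
OBJECT = rf"(?:{IRIREF}|{BLANK_NODE_LABEL}|{LITERAL})"
TRIPLE = rf"{OPT_WS}{SUBJECT}{OPT_WS}{IRIREF}{OPT_WS}{OBJECT}{OPT_WS}\."

# Assumption: one line is an optional triple, optional white space,
# and an optional '#' comment running to the end of the line.
LINE = re.compile(rf"(?:{TRIPLE})?{OPT_WS}(?:#[^\r\n]*)?")


def check_document(text):
    """Return the 1-based numbers of lines that do not fit the sketch."""
    lines = re.split(r"\r\n|\r|\n", text)
    return [n for n, line in enumerate(lines, 1) if not LINE.fullmatch(line)]


doc = (
    '<http://example/s> <http://example/p> "chat"@fr . # trailing comment\n'
    "   # comment-only line\n"
    "\n"
    '<http://example/s><http://example/p>"1" ^^ <http://example/int>.\n'
    '_:b0 <http://example/p> "no dot"\n'
)
print(check_document(doc))
```

With this sample input the only line reported should be the last triple, which lacks its terminating '.'. The comment-only line, the blank line, the triple with no white space between terms, and the literal with white space around '^^' are all accepted, which is the behaviour the revised productions are meant to allow.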
Received on Thursday, 29 June 2017 09:19:15 UTC