Proposed fixed version of N-Triples https://www.w3.org/TR/n-triples/ Section 7

A message to semantic-web@w3.org
https://lists.w3.org/Archives/Public/semantic-web/2017Jun/0065.html inspired
me to take a closer look at the grammar for N-Triples.  I found a number of
problems in that grammar, and I propose the following fixed version of the
grammar section.


Problems addressed:
1/ White space permitted but not required between any two terminals and at
beginning and end of document.
2/ Comments can only occur in specific places.
3/ Lines consisting entirely of white space and/or a comment are permitted.
4/ Confusing statement about Unicode code points removed.

Remaining issue:
1/ The grammar in the TR mentions white space in the context of any two
terminals, which includes between the parts of a literal.  However, there
is no example or test case that has white space there.  This grammar
permits white space there.
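
To make the remaining issue concrete, here is a minimal Python sketch of
production [6] as proposed below.  The regular expressions are my own
transcription of the terminals, and the helper name is_literal is
hypothetical; it is an illustration, not a reference implementation.

```python
import re

# Terminals transcribed from the grammar; \x22 is '"' and \x27 is "'".
HEX = r"[0-9A-Fa-f]"
UCHAR = rf"(?:\\u{HEX}{{4}}|\\U{HEX}{{8}})"
ECHAR = r"\\[tbnrf\x22\x27\\]"
IRIREF = rf"<(?:[^\x00-\x20<>\x22{{}}|^`\\]|{UCHAR})*>"
STRING_LITERAL_QUOTE = rf"\x22(?:[^\x22\x5C\x0A\x0D]|{ECHAR}|{UCHAR})*\x22"
LANGTAG = r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"
WS = r"[\x09\x20]+"

# [6] literal ::= STRING_LITERAL_QUOTE (WS? '^^' WS? IRIREF | WS? LANGTAG)?
LITERAL = re.compile(
    rf"{STRING_LITERAL_QUOTE}"
    rf"(?:(?:{WS})?\^\^(?:{WS})?{IRIREF}|(?:{WS})?{LANGTAG})?"
)


def is_literal(text: str) -> bool:
    """Hypothetical helper: True if text matches production [6] in full."""
    return LITERAL.fullmatch(text) is not None
```

Under this sketch both '"chat"@fr' and '"chat" @fr' are accepted, as are
'"1"^^<http://www.w3.org/2001/XMLSchema#integer>' and the same literal
written with white space around '^^'.  White space inside the '^^' token
itself is not accepted.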


7. Grammar

An N-Triples document is a Unicode [UNICODE] character string encoded in
UTF-8.
[[Remove: Unicode code points only in the range U+0 to U+10FFFF inclusive are
allowed.  Rationale: These are the only Unicode code points.]]

White space (tab U+0009 or space U+0020) is allowed but not required between
any two terminals.
[[Replace: White space (tab U+0009 or space U+0020) is used to separate two
terminals which would otherwise be (mis-)recognized as one terminal.
Rationale: In N-Triples there is no possibility of such mis-recognition.]]
White space is significant in the production STRING_LITERAL_QUOTE.

Comments in N-Triples take the form of '#', outside an IRIREF or
STRING_LITERAL_QUOTE, and continue up-to, and excluding, the end of line
(EOL), or end of file if there is no end of line after the comment
marker. Comments are treated as white space.
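
The "outside an IRIREF or STRING_LITERAL_QUOTE" condition can be
illustrated with a small scanner.  The following Python sketch is only an
illustration: the function name comment_start is hypothetical, and it
assumes the input is a single, otherwise well-formed line with no end of
line characters in it.

```python
def comment_start(line: str) -> int:
    """Return the index of the '#' that opens a comment, or -1 if none."""
    closer = None  # '>' while inside an IRIREF, '"' while inside a string
    i = 0
    while i < len(line):
        ch = line[i]
        if closer is None:
            if ch == "#":
                return i  # a '#' outside any IRIREF or string
            if ch == "<":
                closer = ">"
            elif ch == '"':
                closer = '"'
        elif ch == "\\":
            i += 1  # skip the character after a backslash (ECHAR or UCHAR)
        elif ch == closer:
            closer = None
        i += 1
    return -1
```

A '#' that is part of an IRI fragment or that appears inside a quoted
string is skipped, so for a line such as
'<http://a/b#c> <http://a/p> "x # y" . # note' only the last '#' is
reported.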

The EBNF used here is defined in XML 1.0 [EBNF-NOTATION].

[[White space and comments are now explicit in the grammar, similar to the
situation in early versions of the N-Triples grammar.  Rationale: Makes it
clear where white space and comments are permitted.]]

Escape sequence rules are the same as Turtle [TURTLE]. However, as only the
STRING_LITERAL_QUOTE production is allowed, new lines in literals MUST be
escaped.

[1]  ntriplesDoc  ::=  triple? (EOL triple)* END
[2]  triple   ::=  WS? subject WS? predicate WS? object WS? '.'
[3]  subject  ::=  IRIREF | BLANK_NODE_LABEL
[4]  predicate  ::=  IRIREF
[5]  object   ::=  IRIREF | BLANK_NODE_LABEL | literal
[6]  literal  ::=  STRING_LITERAL_QUOTE (WS? '^^' WS? IRIREF | WS? LANGTAG)?

Productions for terminals
[144s]  LANGTAG  ::=  '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
[[Lines consisting entirely of white space and/or a comment are now permitted.]]
[7]  EOL  ::=  ( WS? ('#' [^#xD#xA]* )? [#xD#xA] )+
[7a] END ::=  EOL? WS? ('#' [^#xD#xA]* )?
[8]  IRIREF  ::=  '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
[9]  STRING_LITERAL_QUOTE  ::=  '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
[141s]  BLANK_NODE_LABEL  ::=  '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')*
PN_CHARS)?
[10]  UCHAR  ::=  '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
[153s]  ECHAR  ::=  '\' [tbnrf"'\]
[157s]  PN_CHARS_BASE  ::=  [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6]
| [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] |
[#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] |
[#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[158s]  PN_CHARS_U  ::=  PN_CHARS_BASE | '_' | ':'
[160s]  PN_CHARS  ::=  PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] |
[#x203F-#x2040]
[162s]  HEX  ::=  [0-9] | [A-F] | [a-f]

[[White space is included in grammar.]]
 WS ::= [#x9#x20]+
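
As a sanity check on the productions above, the whole grammar can be
transcribed into one Python regular expression.  This is a rough sketch
and not a conforming parser: the names below are mine, the comment marker
is assumed to be the '#' character described in the prose, and the input
is assumed to be already decoded from UTF-8 into a Python str.

```python
import re

HEX = r"[0-9A-Fa-f]"
UCHAR = rf"(?:\\u{HEX}{{4}}|\\U{HEX}{{8}})"
ECHAR = r"\\[tbnrf\x22\x27\\]"  # \x22 is '"' and \x27 is "'"
# Character-class bodies (without the enclosing brackets).
PN_CHARS_BASE = (
    "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
PN_CHARS_U = PN_CHARS_BASE + "_:"
PN_CHARS = PN_CHARS_U + r"\-0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

WS = r"[\x09\x20]+"
IRIREF = rf"<(?:[^\x00-\x20<>\x22{{}}|^`\\]|{UCHAR})*>"
STRING_LITERAL_QUOTE = rf"\x22(?:[^\x22\x5C\x0A\x0D]|{ECHAR}|{UCHAR})*\x22"
LANGTAG = r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"
BLANK_NODE_LABEL = rf"_:[{PN_CHARS_U}0-9](?:[{PN_CHARS}.]*[{PN_CHARS}])?"

LITERAL = (
    rf"{STRING_LITERAL_QUOTE}"
    rf"(?:(?:{WS})?\^\^(?:{WS})?{IRIREF}|(?:{WS})?{LANGTAG})?"
)
SUBJECT = rf"(?:{IRIREF}|{BLANK_NODE_LABEL})"
OBJECT = rf"(?:{IRIREF}|{BLANK_NODE_LABEL}|{LITERAL})"
TRIPLE = rf"(?:{WS})?{SUBJECT}(?:{WS})?{IRIREF}(?:{WS})?{OBJECT}(?:{WS})?\."

COMMENT = r"#[^\x0D\x0A]*"  # assumption: '#' is the comment marker
EOL = rf"(?:(?:{WS})?(?:{COMMENT})?[\x0D\x0A])+"
END = rf"(?:{EOL})?(?:{WS})?(?:{COMMENT})?"
NTRIPLES_DOC = re.compile(rf"(?:{TRIPLE})?(?:{EOL}{TRIPLE})*{END}")


def is_ntriples_doc(text: str) -> bool:
    """Hypothetical helper: True if text matches ntriplesDoc in full."""
    return NTRIPLES_DOC.fullmatch(text) is not None
```

With this sketch, a triple written with no white space at all, such as
'<http://a/s><http://a/p>"o"@en.', is accepted, as is a document that
starts with comment-only and blank lines.  Two triples on one line, or a
comment placed between the subject and the predicate, are rejected.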

Received on Thursday, 29 June 2017 09:19:15 UTC