- From: David Booth <david@dbooth.org>
- Date: Wed, 06 Nov 2013 10:51:09 -0500
- To: Eric Prud'hommeaux <eric@w3.org>
- CC: public-rdf-comments <public-rdf-comments@w3.org>
On 11/02/2013 07:46 PM, Eric Prud'hommeaux wrote:
> * David Booth <david@dbooth.org> [2013-05-08 14:45-0400]
>> Regarding
>> http://www.w3.org/TR/2013/CR-turtle-20130219/
>>
>> As an RDF author I frequently refer to the EBNF syntax rules in
>> section 6.5 to check a detail of the Turtle syntax, such as figuring
>> out whether a particular character is permitted in a local name.
>> For the most part the rules are easy to read. But several of the
>> rules specify unicode characters using hexadecimal, such as:
>>
>> [161s] WS ::= #x20 | #x9 | #xD | #xA
>> [163s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] |
>> [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] |
>> [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
>> [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] |
>> [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
>> [166s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 |
>> [#x0300-#x036F] | [#x203F-#x2040]
>>
>> Clearly it is necessary for clarity to use the hexadecimal notation
>> in the production rules, so I certainly don't object to their use.
>> But as a reader, it drives me bananas trying to figure out what
>> those hexadecimal characters are -- searching the web, etc.
>>
>> Please add some simple comments to the production rules, indicating
>> what the hexadecimal-encoded characters are, so that readers don't
>> have to go searching to figure it out. Something like the following
>> would be a big help:
>>
>> /* See @@ add link to unicode table @@ */
>> /* #x20 = SPACE, #x9 = TAB, #xD = Carriage return, #xA = Line feed */
>> [161s] WS ::= #x20 | #x9 | #xD | #xA
>>
>> /* #x00B7 = Middle dot, #x0300 = ??? (couldn't find that one) */
>> [166s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 |
>> [#x0300-#x036F] | [#x203F-#x2040]
>
> To really help people wondering if some character is permitted in some
> terminal, such a listing would have to include the character ranges,
> not just the boundaries. As a compromise, I provisionally included
> comments for ascii characters, specifically:
>
> [18] IRIREF ::= '<' ([^#x00-#x20<>\"{}|^`\] | UCHAR)* '>' /* #x00=NULL #01-#x1F=control codes #x20=space */
> [22] STRING_LITERAL_QUOTE ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"' /* #x22=" #x5C=\ #xA=new line #xD=carriage return */
> [23] STRING_LITERAL_SINGLE_QUOTE ::= "'" ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* "'" EXPONENT) /* #x27=' #x5C=\ #xA=new line #xD=carriage return */

Excellent! That's a big help. But there seems to be a rendering
problem, because when I view the spec draft
https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/index.html#sec-grammar
in my browser (Firefox 25.0), all the "x"s have disappeared from the
hex character codes in the comments, so "#x20" has become "#20", for
example:

[[
[22] STRING_LITERAL_QUOTE ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"' /* #22=" #5C=\ #A=new line #D=carriage return */
[23] STRING_LITERAL_SINGLE_QUOTE ::= "'" ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* "'" EXPONENT) /* #27=' #5C=\ #A=new line #D=carriage return */
]]

Maybe this is a respec problem?

David

> [161s] WS ::= #x20 | #x9 | #xD | #xA /* #x20=space #x9=character tabulation #xD=carriage return #xA=new line */
>
> and will ask if the WG considers this extra text to be a net help.
>
> If this comment addresses your comment, please reply with the subject
> prefixed by "[RESOLVED]".
>
>
>> Thanks,
>> David
>>
>
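
For readers who want to look up the boundary characters themselves, here is a minimal sketch in Python that builds a comment of the kind discussed above from the `#x...` codes in a rule. It is only an illustration, not how the spec comments were produced. The function names `code_point_name` and `annotate` and the `CONTROL_NAMES` table are hypothetical. The table is needed because `unicodedata.name` has no name for control characters such as #x9.

```python
import re
import unicodedata

# Assumption: hand-picked labels for the control characters used in the
# grammar; the Unicode database assigns them no formal character name.
CONTROL_NAMES = {
    0x9: "CHARACTER TABULATION",
    0xA: "LINE FEED",
    0xD: "CARRIAGE RETURN",
}


def code_point_name(cp):
    """Return a readable name for a code point, or a placeholder."""
    if cp in CONTROL_NAMES:
        return CONTROL_NAMES[cp]
    return unicodedata.name(chr(cp), "<unnamed>")


def annotate(rule):
    """Build a /* ... */ comment naming each distinct #x code in a rule."""
    seen = []
    for hex_digits in re.findall(r"#x([0-9A-Fa-f]+)", rule):
        cp = int(hex_digits, 16)
        if cp not in seen:
            seen.append(cp)
    parts = ["#x%X=%s" % (cp, code_point_name(cp)) for cp in seen]
    return "/* " + " ".join(parts) + " */"


print(annotate("WS ::= #x20 | #x9 | #xD | #xA"))
print(code_point_name(0x00B7))  # MIDDLE DOT
print(code_point_name(0x0300))  # COMBINING GRAVE ACCENT
```

Note that this names only the boundaries of each range, which is the limitation Eric points out above: it does not say which characters fall inside a range.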
Received on Wednesday, 6 November 2013 15:51:37 UTC