Re: [Turtle] Please show/explain hexadecimal-encoded characters in comments

On 11/02/2013 07:46 PM, Eric Prud'hommeaux wrote:
> * David Booth <david@dbooth.org> [2013-05-08 14:45-0400]
>> Regarding
>> http://www.w3.org/TR/2013/CR-turtle-20130219/
>>
>> As an RDF author I frequently refer to the EBNF syntax rules in
>> section 6.5 to check a detail of the Turtle syntax, such as figuring
>> out whether a particular character is permitted in a local name.
>> For the most part the rules are easy to read.  But several of the
>> rules specify unicode characters using hexadecimal, such as:
>>
>> [161s] 	WS 	::= 	#x20 | #x9 | #xD | #xA
>> [163s] 	PN_CHARS_BASE 	::= 	[A-Z] | [a-z] | [#x00C0-#x00D6] |
>> [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] |
>> [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
>> [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] |
>> [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
>> [166s] 	PN_CHARS 	::= 	PN_CHARS_U | '-' | [0-9] | #x00B7 |
>> [#x0300-#x036F] | [#x203F-#x2040]
>>
>> Clearly it is necessary for clarity to use the hexadecimal notation
>> in the production rules, so I certainly don't object to their use.
>> But as a reader, it drives me bananas trying to figure out what
>> those hexadecimal characters are -- searching the web, etc.
>>
>> Please add some simple comments to the production rules, indicating
>> what the hexadecimal-encoded characters are, so that readers don't
>> have to go searching to figure it out.  Something like the following
>> would be a big help:
>>
>> /* See @@ add link to unicode table @@ */
>> /*  #x20 = SPACE, #x9 = TAB, #xD = Carriage return, #xA = Line feed */
>> [161s] 	WS 	::= 	#x20 | #x9 | #xD | #xA
>>
>> /* #x00B7 = Middle dot, #x0300 = ??? (couldn't find that one) */
>> [166s] 	PN_CHARS 	::= 	PN_CHARS_U | '-' | [0-9] | #x00B7 |
>> [#x0300-#x036F] | [#x203F-#x2040]
>
> To really help people wondering if some character is permitted in some
> terminal, such a listing would have to include the character ranges,
> not just the boundaries. As a compromise, I provisionally included
> comments for ascii characters, specifically:
>
> [18] 	IRIREF 	::= 	'<' ([^#x00-#x20<>\"{}|^`\] | UCHAR)* '>' /* #x00=NULL #x01-#x1F=control codes #x20=space */
> [22] 	STRING_LITERAL_QUOTE 	::= 	'"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"' /* #x22=" #x5C=\ #xA=new line #xD=carriage return */
> [23] 	STRING_LITERAL_SINGLE_QUOTE 	::= 	"'" ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* "'" EXPONENT) /* #x27=' #x5C=\ #xA=new line #xD=carriage return */

Excellent!  That's a big help.  But there seems to be a rendering 
problem, because when I view the spec draft
https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/index.html#sec-grammar
in my browser (Firefox 25.0), all the "x"s have disappeared from the hex 
character codes in the comments, so "#x20" has become "#20", for example:
[[
[22] 	STRING_LITERAL_QUOTE 	::= 	'"' ([^#x22#x5C#xA#xD] | ECHAR | 
UCHAR)* '"' /* #22=" #5C=\ #A=new line #D=carriage return */
[23] 	STRING_LITERAL_SINGLE_QUOTE 	::= 	"'" ([^#x27#x5C#xA#xD] | ECHAR | 
UCHAR)* "'" EXPONENT) /* #27=' #5C=\ #A=new line #D=carriage return */
]]

Maybe this is a ReSpec problem?
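
For anyone who wants to decode the hexadecimal escapes by hand in the meantime, here is a minimal Python sketch. It is only an illustration and not part of the spec or its tooling. The function name describe and the CONTROL_NAMES table are made up for this example; the table is needed because Python's unicodedata.name() has no names for control characters such as #x9.

```python
import re
import unicodedata

# Labels for control characters, which unicodedata.name() cannot name.
# This table is an assumption of the sketch, not taken from the spec.
CONTROL_NAMES = {
    0x0: "NULL",
    0x9: "CHARACTER TABULATION",
    0xA: "LINE FEED",
    0xD: "CARRIAGE RETURN",
}


def describe(escape):
    """Return a readable name for an EBNF hex escape such as '#x20'."""
    match = re.fullmatch(r"#x([0-9A-Fa-f]+)", escape)
    if match is None:
        raise ValueError(f"not a #x escape: {escape!r}")
    code_point = int(match.group(1), 16)
    if code_point in CONTROL_NAMES:
        return CONTROL_NAMES[code_point]
    # Fall back to a placeholder for code points without a Unicode name.
    return unicodedata.name(chr(code_point), "<unnamed>")


if __name__ == "__main__":
    # Escapes taken from the WS and PN_CHARS productions quoted above.
    for escape in ["#x20", "#x9", "#xD", "#xA", "#x00B7", "#x0300"]:
        print(escape, "=", describe(escape))
```

Run against the escapes quoted in this thread, it reports #x00B7 as MIDDLE DOT and #x0300 as COMBINING GRAVE ACCENT.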

David


> [161s] 	WS 	::= 	#x20 | #x9 | #xD | #xA /* #x20=space #x9=character tabulation #xD=carriage return #xA=new line */
>
> and will ask if the WG considers this extra text to be a net help.
>
> If this comment addresses your comment, please reply with the subject
> prefixed by "[RESOLVED]".
>
>
>> Thanks,
>> David
>>
>

Received on Wednesday, 6 November 2013 15:51:37 UTC